# A GaAs 32-bit Adder

Andrew Beaumont-Smith, Neil Burgess

The University of Adelaide, Department of Electrical and Electronic Engineering, Centre for High Performance Integrated Technologies and Systems, Adelaide SA 5005, Australia

abeaumon@eleceng.adelaide.edu.au, neilb@eleceng.adelaide.edu.au

## Abstract

This paper presents a new parallel GaAs 32-bit adder based on a combination of the Han-Carlson and Kowalczuk parallel adders. GaAs is particularly sensitive to loading, so our aim was to reduce the wire lengths and the fan-out of each gate. Our architecture achieves this by significantly reducing the number of cells in the carry tree without significantly reducing its speed. The delay of the adder, fabricated in 0.6µm MESFET GaAs technology, was measured at 1.27ns with a power dissipation of 114mW at 0.9V. The area is 0.3mm² with a maximum density of 8000 transistors/mm². The figure of merit is 0.21µW/MHz.gate.

## 1. Introduction

VLSI adders in both silicon and gallium arsenide (GaAs) technologies are critically important elements in most digital integrated circuits. An adder is required to be primarily fast and secondarily efficient in terms of transistor usage, power consumption and chip area.

GaAs MESFET technology does not permit dynamic logic techniques such as the Manchester carry chain, which are widely used to accelerate silicon VLSI adders. The main logic classes used are direct coupled FET logic (DCFL) and source-follower DCFL (SDCFL), from which simple cells are constructed to build the carry trees of the adder. These logic classes provide the lowest power and smallest area implementations of static logic in GaAs MESFET technology.

Sarmiento et al. [1] examined a variety of adder architectures from the perspective of GaAs VLSI implementation and concluded that the Brent-Kung adder [2] is well suited to GaAs owing to its low fan-out demands.
However, the Brent-Kung adder has a long critical path, so it is not capable of very high speed addition. Very high performance is available from the Kogge-Stone adder [3], which uses more than twice as many cells as the Brent-Kung adder to achieve virtually half the addition delay. Besides its high hardware usage, the Kogge-Stone adder has the drawback of containing many long interconnecting wires.

This paper presents the design and test of a 32-bit adder which uses a relatively small number of MESFETs to achieve 32-bit additions in 1.27ns with a power dissipation of 114mW at 0.9V and an area of 0.3mm<sup>2</sup>. The adder architecture is a combination of the Han-Carlson and the Kowalczuk-Tudor-Mlynek adders, which are in turn both based on the Kogge-Stone adder.

## 2. Fast adder architectures

A 32-bit Kogge-Stone adder is shown in Figure 1. It consists of four cell types, with adder inputs $a$ and $b$ and sum $s$:

- input: $p_o = a_i \lor b_i$, $g_o = a_i \land b_i$, $h_i = a_i \oplus b_i$
- black: $g_o = g_i \lor (p_i \land g_{i-j})$, $p_o = p_i \land p_{i-j}$
- grey: $g_o = g_i \lor (p_i \land g_{i-j})$
- output: $s_i = a_i \oplus b_i \oplus g_i$

The black cell corresponds to Brent and Kung's 'o' operator. Each carry bit is determined by a binary tree of black cells in which the fan-out is limited to 2 cells. The structure is regular; however, many wires span half the adder width, which represents a significant output loading.

The Han-Carlson carry tree [4], shown in Figure 2, reduces the number of cells in the Kogge-Stone adder for a small increase in delay by adding a final row of cells. It also halves the width of the adder, and hence halves the length of the interconnects relative to the Kogge-Stone adder, by employing a radix-4 scheme. The adder's width is determined by the cell width and not by the wiring pitch, owing to the relatively large transistor widths employed in the design.
Hence the longest wires in the Han-Carlson adder are one quarter of the width of the Kogge-Stone adder.

The Kowalczuk-Tudor-Mlynek carry tree [5], shown in Figure 3, replaces cells driving long wires by a series of rows with shorter wires, which are also one quarter of the width of the Kogge-Stone adder.

Our new 32-bit adder with carry-in was constructed by merging the Han-Carlson and Kowalczuk-Tudor-Mlynek approaches, as shown in Figure 4. Its delay is one grey cell more than that of the Han-Carlson and Kowalczuk-Tudor-Mlynek adders. The delay can be made the same as that of the Han-Carlson and Kowalczuk-Tudor-Mlynek adders by adding 4 extra black cells at the high-order end of the adder, as shown in Figure 5. This new 32-bit adder has 33% fewer cells than the Kowalczuk-Tudor-Mlynek adder and shorter interconnects than the Han-Carlson adder for the same critical path length.

| Adder | Kogge-Stone | Han-Carlson | Kowalczuk et al. | New adder |
|--------------------|-------------|-------------|------------------|-----------|
| Number of cells | 129 | 80 | 113 | 75 |
| Delay (cells) | 5 | 6 | 6 | 6 |
| A x T | 645 | 480 | 678 | 450 |
| A x T<sup>2</sup> | 3225 | 2880 | 4068 | 2700 |

TABLE 1. Comparison of metrics for different 32-bit adder architectures.

Table 1 compares the proposed adder with the other 32-bit adder architectures. The new adder has better area-time and area-time<sup>2</sup> characteristics than the other three; however, it is still one cell delay slower than the Kogge-Stone adder.

Although we have compared the delays through the adders on the basis of the number of cells in the critical path, this does not take into account the extra delay due to long wires in the carry path. The pitch of the black and grey cells across the adder's width is 30 microns which, according to simulations, corresponds to an extra delay of around 10ps per bit of carry propagation for an SDCFL gate.
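
The area-time figures in Table 1 can be reproduced from the cell counts and critical-path lengths alone. The short Python sketch below does so, assuming (as the table appears to) that area A is counted in carry-tree cells and time T in cell delays. The dictionary and function names are illustrative and not part of the original design flow.

```python
# Cell counts and critical-path lengths (in cells), copied from Table 1.
ADDERS = {
    "Kogge-Stone": (129, 5),
    "Han-Carlson": (80, 6),
    "Kowalczuk et al.": (113, 6),
    "New adder": (75, 6),
}


def area_time_metrics(cells, delay):
    """Return (A x T, A x T^2) with area in cells and time in cell delays."""
    return cells * delay, cells * delay ** 2


metrics = {name: area_time_metrics(*row) for name, row in ADDERS.items()}

for name, (at, at2) in metrics.items():
    print(f"{name:18s} AxT = {at:4d}   AxT^2 = {at2:5d}")

# Fraction of cells saved by the new adder relative to the Kowalczuk et al. tree.
saving = 1 - ADDERS["New adder"][0] / ADDERS["Kowalczuk et al."][0]
print(f"cell saving vs Kowalczuk et al.: {saving:.1%}")
```

These metrics ignore wire delay, so they rank the architectures only to first order.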
In the Kogge-Stone and the Kowalczuk-Tudor-Mlynek adders, the layout cannot be halved in width as it can with the Han-Carlson adder and our adder, because those adders have one black cell per bit in the first few rows. Therefore, interconnects along the carry paths in these much larger adders are twice as long for the same number of bits traversed. For example, the signals from the fourth to the fifth row of grey cells in the Kogge-Stone adder must traverse 16 column pitches, whereas in the new adder they need to traverse only 4 column pitches, a difference of 120ps in delay (12 pitches at around 10ps each). After accounting for the differences in wire delay between each row and for the extra last stage in the new adder, the new adder is slower than the Kogge-Stone adder by around 50ps in this technology.

## 3. Adder design issues in GaAs

GaAs MESFET technology places major restrictions on the logic structures available in the DCFL and SDCFL logic classes: only inverters, two- and three-input NOR gates, and OR gates (source-follower buffers) can be used. DCFL is particularly sensitive to fan-out, with gate delay increasing by 60ps per unit of fan-out. The gates in the four cell types were sized for high speed and low power operation, but some nodes have high fan-out (more than 3) and require extra buffers.

The logic design aimed to minimise the fan-out and the wire lengths, as discussed earlier. The longest wires run across the width of the adder, so the layout was made more square to reduce the width. In Figure 5, cells with buffered outputs are represented by concentric circles around the cells, and extra line buffers (two DCFL inverters, the second being double width) are indicated by triangles. These are strategically placed to reduce the fan-out and/or wire loading on a cell's output and so decrease the delay along the critical path.

Schematics for the black and grey cells are shown in Figure 6.

Figure 6. Black cell (a) & (b) and grey cell (b).
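
As a behavioural cross-check of the black and grey cell functions defined in Section 2, the following Python sketch builds a plain Kogge-Stone carry tree from them and compares the result with integer addition. It models the reference tree of Figure 1, not the new carry tree, whose wiring is given only in Figures 4 and 5. Two points are assumptions made for this sketch: the carry-in is folded into the generate signal of bit 0, and the output cell XORs the half-sum with the carry into each bit. All names are hypothetical.

```python
import random


def black_cell(upper, lower):
    """Black cell: merge (g, p) of an upper group with the adjacent lower group."""
    g_i, p_i = upper
    g_j, p_j = lower
    return g_i | (p_i & g_j), p_i & p_j


def kogge_stone_add(a, b, carry_in=0, width=32):
    """Return (sum, carry_out, cell_count) using a Kogge-Stone prefix tree."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    # Input cells: g = a AND b, p = a OR b, h = a XOR b.
    gp = [(x & y, x | y) for x, y in zip(a_bits, b_bits)]
    h = [x ^ y for x, y in zip(a_bits, b_bits)]
    # Assumption: carry-in is folded into the generate signal of bit 0.
    gp[0] = (gp[0][0] | (gp[0][1] & carry_in), gp[0][1])
    cells = 0
    span = 1
    while span < width:
        # Each bit i >= span combines with the group ending at bit i - span.
        # In hardware, cells whose lower group reaches bit 0 need only the g
        # output (grey cells); here the unused p output is simply ignored.
        gp = [black_cell(gp[i], gp[i - span]) if i >= span else gp[i]
              for i in range(width)]
        cells += width - span
        span *= 2
    # carries[i] is the carry into bit i; carries[width] is the carry-out.
    carries = [carry_in] + [g for g, _ in gp]
    total = sum((h[i] ^ carries[i]) << i for i in range(width))
    return total, carries[width], cells


random.seed(1)
for _ in range(1000):
    a, b, c = random.getrandbits(32), random.getrandbits(32), random.getrandbits(1)
    s, cout, _ = kogge_stone_add(a, b, c)
    assert (cout << 32) | s == a + b + c

# Speed-test operands from Section 4: carry-in = 1, b = 0, a = all ones.
print(kogge_stone_add(0xFFFFFFFF, 0, 1))  # (0, 1, 129)
```

The cell count returned for a 32-bit tree, 31 + 30 + 28 + 24 + 16 = 129, agrees with the Kogge-Stone entry in Table 1, and the five rows agree with its delay of 5 cells.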
Input cells are needed to generate the "and", "or", and "ex-or" functions of the two input bits at each position in the adder. An output cell forms the exclusive-or of the input bits with the result of the carry tree. Figure 7 shows the schematics of the input and output cells.

Figure 7. Input cell (a) and output cell (b).

## 4. Implementation

The 32-bit adder was implemented in the 0.6µm HGaAs-III technology, with 4 layers of metal interconnect, supplied by Vitesse Semiconductor [6]. Ring notation [7] was used to lay out the cells to minimise the area and reduce noise coupling to signal wires. To increase the density of the layout, NOR gates have stacked enhancement FETs with merged source and drain connections, which differs somewhat from previous layout styles.

The completed layout of the adder is shown in Figure 8. The floorplan closely follows the schematic of Figure 5. The width of the adder was reduced by placing the logic for pairs of bits into a single supply rail pitch, making the layout more square and reducing the lengths of the long carry paths. The layout of the adder measures 0.5mm x 0.6mm, giving an area of 0.3mm<sup>2</sup> with 1809 devices and 700 gates. This corresponds to an average device density of 6022 transistors/mm<sup>2</sup>, with a maximum density of 8000 transistors/mm<sup>2</sup>.

To facilitate functional testing, the two input operands are loaded into two 32-bit shift registers and the 32-bit output is multiplexed to 8 output pads. To test the speed of the adder, carry-in is set to one, b<sub>0</sub>-b<sub>31</sub> are reset to zero and a<sub>0</sub>-a<sub>31</sub> are set to one. An external control signal multiplexes an inverted value of the output of the critical path (s<sub>30</sub>) back to the input of the critical path (a<sub>1</sub>), causing s<sub>1</sub>-s<sub>31</sub> to oscillate. The oscillation frequency can be measured to determine the round trip delay; the adder delay through the critical path was found by simulation to be 74% of the round trip delay.
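
The conversion from an oscillation measurement to an adder delay can be sketched in a few lines of Python. The sketch assumes that, because the feedback is inverting, one oscillation period contains two trips around the loop; this is the usual ring-oscillator relation and is not stated explicitly in the text. The function names are illustrative.

```python
ADDER_SHARE = 0.74  # fraction of the round trip spent in the adder (from simulation)


def adder_delay_from_period(period_ns, adder_share=ADDER_SHARE):
    """Estimate the adder critical-path delay (ns) from the oscillation period."""
    # Assumption: an inverting loop needs two round trips per full period.
    round_trip_ns = period_ns / 2.0
    return adder_share * round_trip_ns


def adder_delay_from_frequency(freq_mhz, adder_share=ADDER_SHARE):
    """Same estimate, starting from a measured oscillation frequency in MHz."""
    period_ns = 1e3 / freq_mhz
    return adder_delay_from_period(period_ns, adder_share)


# Simulated oscillation period reported in Section 4.1.
simulated = adder_delay_from_period(2.65)
print(f"{simulated:.4f} ns")  # 0.9805 ns
```

Under this assumption the simulated 2.65ns period corresponds to an adder delay of about 980ps, which agrees with the simulated critical path delay quoted in Section 4.1.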
The test chip measures 2mm x 2mm, is packaged in a 28-pin LDCC and has ECL-compatible pads.

### 4.1. Design verification

The adder layout was functionally verified with 4000 pseudo-random test vectors. The critical path through the adder was found by measuring the delay through each path in the carry tree with HSPICE to determine the longest delay. The simulated critical path delay was 980ps from a<sub>1</sub> to s<sub>30</sub> using typical device parameters at 75 degrees C. The delay from a<sub>0</sub> to carry-out was 970ps. Figure 9 shows a simulation of the adder in oscillation mode, where the period of oscillation is 2.65ns assuming typical process parameters at a temperature of 75 degrees C. The simulated power dissipation of the adder was 280mW at a supply voltage of 1.5V.

## 5. Testing

The critical path was measured with the test chip operating in the critical path feedback mode. The circuit power supply was varied and the chip was found to operate down to a supply voltage of 0.8V. The smallest adder delay was 1.27ns at 0.9V. All four chips tested had their smallest delay at a supply voltage in the range 0.8-1.0V. The delay through the adder of the four test chips is plotted as a function of supply voltage in Figure 10.

The measured power dissipation of the adder at a supply voltage of 1.5V was 282mW, which closely matches the simulated power (280mW). The power dissipation for chip 4 was 87mW at a supply voltage of 0.8V. The power dissipation of the four test chips is plotted in Figure 11 as a function of supply voltage.

Figure 10. Measured delay through the adder versus supply voltage for chips #1-4.

Figure 11. Measured power dissipation of the adder versus supply voltage for chips #1-4.

Figure 12 shows the power-delay/gate figure of merit for the adder. The figure of merit is 0.21µW/MHz.gate at a 0.9V power supply voltage.

Functional testing of the chip was done using a Tektronix DAS 9200 Digital Acquisition System, which confirmed the simulated results.
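
The quoted figure of merit can be checked against the other measurements. The Python sketch below assumes that the operating frequency is taken as the reciprocal of the measured adder delay and that the gate count is the 700 gates given in Section 4; neither assumption is stated explicitly in the text, and the function name is illustrative.

```python
def figure_of_merit(power_mw, delay_ns, gates):
    """Power-delay product per gate, in uW/MHz.gate."""
    power_uw = power_mw * 1e3
    # Assumption: the adder is cycled once per critical-path delay.
    freq_mhz = 1e3 / delay_ns
    return power_uw / (freq_mhz * gates)


# Measured values at 0.9V: 114mW and 1.27ns, with 700 gates.
fom = figure_of_merit(power_mw=114, delay_ns=1.27, gates=700)
print(f"{fom:.2f} uW/MHz.gate")  # 0.21 uW/MHz.gate
```

With these inputs the result is about 0.207, which rounds to the 0.21µW/MHz.gate reported above.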
Figure 12. Power-Delay/Gate figure of merit versus supply voltage for adder chips #1-4.

## 6. Conclusion

A new adder architecture, based on a combination of the Han-Carlson and Kowalczuk-Tudor-Mlynek adders, which is fast, small and suitable for GaAs implementation, has been designed, fabricated and tested. The new 32-bit adder has smaller area-time and area-time<sup>2</sup> metrics than the three previous adder architectures. Testing showed the adder delay to be 1.27ns at 0.9V (29% slower than the 980ps predicted by simulation), but at a substantially reduced power dissipation of 87mW when operated at a supply voltage of 0.8V. Figure 13 shows a micrograph of the fabricated chip.

| Adder | ALU [8] | [9] | Ours |
|------------------------|----------------|---------------------|--------------------|
| Technology | CMOS 0.25µm | CMOS 0.4µm | GaAs 0.6µm |
| Logic | static | dynamic | static |
| Delay (precharge time) | 1.5ns | 1.4ns (-) | 1.27ns |
| Power (cycle time) | 107mW (660MHz) | 140mW (100MHz) | 114mW (580MHz) |
| Area | 0.6mm<sup>2</sup> | 0.32mm<sup>2</sup> | 0.3mm<sup>2</sup> |

TABLE 2. Comparison of 32-bit adders.

Table 2 shows a comparison with a 32-bit ALU [8] and a 32-bit adder [9]. Our adder, with a figure of merit of 0.21µW/MHz.gate at 0.9V, is comparable in power and speed to the ALU design [8]. The adder described in [9] would have a significantly higher power dissipation than our adder (around 700mW) at a 1.5ns cycle time.

### 6.1. Extension to 64-bit addition

Based on the test results of the 32-bit adder, we estimate that a 64-bit GaAs adder will have a delay of 1.5ns with a power dissipation of 255mW at 0.9V and an area of 0.7mm<sup>2</sup>.

Recently published adders include a 0.93ns 0.5µm dynamic CMOS adder [10] and a two-stage carry look-ahead adder [11] in both a 3.3V 0.5µm BiCMOS process and a 0.5µm CMOS process.
The precharge time for [10] is 1.12ns and the power dissipation is 300mW at 200MHz [12], but that circuit also includes clocking circuits and I/O latches. Table 3 shows a comparison of these adder architectures. A GaAs adder based on this new architecture may have a cycle time 2-3 times shorter than the static adders but only about 25% shorter than the dynamic CMOS adder, and it utilises 2-3 times the area. The power dissipation is approximately the same as that of the other adders at a cycle rate of 170MHz.

| Adder | [11] | [11] | [10] | Ours |
|------------------------|---------------------|--------------------|---------------------|--------------------|
| Technology | BiCMOS 0.5µm | CMOS 0.5µm | CMOS 0.5µm | GaAs 0.6µm |
| Logic | static | static | dynamic | static |
| Delay (precharge time) | 3.5ns | 4.7ns | 0.93ns (1.12ns) | 1.5ns |
| Power (cycle time) | 80mW (50MHz) | 70mW (50MHz) | 375mW (250MHz) | 255mW (500MHz) |
| Area | 0.45mm<sup>2</sup> | 0.4mm<sup>2</sup> | 0.25mm<sup>2</sup> | 0.7mm<sup>2</sup> |

TABLE 3. Comparison of 64-bit adders.

## References

- [1] R. Sarmiento, P. P. Carballo and A. Nunez, "High-speed primitives of hardware accelerators for DSP in GaAs technology", *IEE Proceedings Part G*, vol. 139, pp. 205-216, April 1992.
- [2] R. P. Brent and H. T. Kung, "A Regular Layout for Parallel Adders", *IEEE Transactions on Computers*, vol. 31, pp. 260-264, March 1982.
- [3] P. M. Kogge and H. S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence relations", *IEEE Transactions on Computers*, vol. 22, pp. 786-793, August 1973.
- [4] T. Han and D. A. Carlson, "Fast area-efficient VLSI adders", *Proc. 8th Symposium on Computer Arithmetic*, pp. 49-56, Como, September 1987.
- [5] J. Kowalczuk, S. Tudor and D. Mlynek, "A new architecture for an automatic generation of fast pipeline adders", *Proc. ESSCIRC*, pp. 101-104, Milano, September 1991.
- [6] Vitesse Semiconductor Inc., Foundry Design Manual V6. 1993.
- [7] R. Sarmiento, K. Eshraghian and A. Nunez, "Speed-area-power optimisation for DCFL and SDCFL class of logic using ring notation", *Microprocessing and Microprogramming*, vol. 32, pp. 75-82, August 1991.
- [8] M. Suzuki, et al., "A 1.5-ns 32-b CMOS ALU in Double Pass-Transistor Logic", *IEEE Journal of Solid State Circuits*, vol. 28, no. 11, pp. 1145-1151, November 1993.
- [9] A. Inoue, et al., "A 0.4µm 1.4ns 32b dynamic adder using non-precharge multiplexers and reduced precharge voltage technique", *IEEE Symposium on VLSI Circuits Digest of Tech. Papers*, pp. 9-10, June 1995.
- [10] S. Naffziger, "A Sub-nanosecond 0.5µm 64b Adder Design", *ISSCC96, 1996 IEEE International Solid-State Circuits Conference*, pp. 362-363, February 1996.
- [11] K. Ueda, H. Suzuki, K. Suda, H. Shinohara and K. Mashiko, "A 64-bit Carry Look Ahead Adder Using Pass Transistor BiCMOS Gates", *IEEE Journal of Solid State Circuits*, vol. 31, no. 6, pp. 810-818, June 1996.
- [12] A. Beaumont-Smith, private communication with Samuel Naffziger, June 1997.

Figure 1. Kogge-Stone adder.

Figure 2. Han-Carlson adder carry tree.

Figure 3. Kowalczuk-Tudor-Mlynek carry tree.

Figure 4. New combined Han-Carlson/Kowalczuk-Tudor-Mlynek carry tree.

Figure 5. Modified adder to reduce delay (input and output cells are also shown).

Figure 8. GaAs 32-bit adder layout.

Figure 9. SPICE simulation of the adder in oscillation mode.

Figure 13. Micrograph of the fabricated GaAs 32-bit adder.