Analysis of Clocked Timing Elements for Dynamic Voltage Scaling Effects over Process Parameter Variation

Hoang Q. Dao
ACSEL Lab
University of California, Davis
1 Shields Avenue, Davis, CA 95616
+1-530-754-6827
hqdao@ece.ucdavis.edu

Kevin Nowka
IBM Austin Research Lab
11400 Burnet Road MS9460,
Austin, TX 78758
+1-512-838-3350
nowka@us.ibm.com

Vojin G. Oklobdzija
ACSEL Lab
University of California, Davis
1 Shields Avenue, Davis, CA 95616
+1-530-754-6827
vojin@ece.ucdavis.edu

ABSTRACT
In power-constrained systems, the power efficiency of latches and flip-flops (FFs) is pivotal. Three selected latches and FFs were analyzed for their behavior under voltage scaling and at different process corners in a 0.18um CMOS technology. The relative performance among the latches/FFs was consistent across the supply voltages. At low voltage, the power-delay product was degraded by about 25%. The energy-delay product was approximately doubled at low voltage for all latches/FFs over all process corners. This result was smaller than ideal voltage scaling would predict, mainly because the effects of velocity saturation were less severe at low voltage. All three designs suffered more from process variation under low-voltage conditions.

Categories and Subject Descriptors
Digital circuit: clocked-timing elements

General Terms
Measurement, Performance, Reliability

Keywords
Clocked timing elements, voltage scaling, process variation

1. INTRODUCTION
Latches and flip-flops occupy a large portion of the area and consume substantial power in VLSI systems. With increasing frequencies, the overhead due to the timing elements is becoming a larger fraction of the cycle time. Because of the demands of battery-powered applications, devices are being operated over larger supply ranges. With dynamic voltage scaling [4], the power supply voltage can be changed as the performance demands on the system change.

Our interest was to analyze the effect of large supply voltage variation, as well as typical process and environmental variation, in bulk CMOS circuits. In assessing the quality of low-power designs, the designs themselves and the evaluation methodology and criteria are equally important. We focus specifically on three latches and flip-flops: an improved semi-dynamic flip-flop (SDFF) [1], the sense-amplifier flip-flop (SAFF) [2] and the master-slave PowerPC latch [5]. Section 2 presents the effects of voltage scaling. Section 3 discusses modifications to the test bench of Stojanovic and Oklobdzija [3]. Section 4 presents the choice of PMOS-to-NMOS ratio. The simulated results are analyzed in Section 5.

Figure 1. Measured power-delay vs. supply voltage for master-slave FFs

2. VOLTAGE SCALING
In voltage scaling, the supply voltage can be reduced when the performance demand on the VLSI system is low as shown in [4]. This results in increased energy efficiency. Thus, latches and flip-flops used in voltage scalable designs must tolerate a wide range of supply voltages. Figure 1 presents measured results of the effect of voltage scaling on the power and delay of four master-slave latch configurations with a variety of output loads. Approximately 500 measurements of the data-in to data-out delay and power consumption are shown.

From this figure, it is apparent that lowering the supply voltage improves power consumption at a rate much greater than the rate at which delay is increased. The aim of this analysis is to extend this to alternative latches and fairly assess their performance with voltage scaling.
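The trend can be illustrated with a first-order model. The sketch below is not the method used for the measurements: it assumes the alpha-power-law delay model and quadratic dynamic switching energy, and the threshold voltage `vt` and velocity-saturation index `alpha` are hypothetical values chosen only for illustration.

```python
def relative_delay(vdd, vt=0.45, alpha=1.3):
    """Alpha-power-law gate delay, up to a constant factor: Vdd / (Vdd - Vt)**alpha.

    vt and alpha are illustrative assumptions, not values from this paper.
    """
    if vdd <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return vdd / (vdd - vt) ** alpha


def relative_energy(vdd):
    """Dynamic switching energy per transition, up to a constant factor: C * Vdd**2."""
    return vdd ** 2


def scaling_ratios(v_high, v_low, vt=0.45, alpha=1.3):
    """Return (delay ratio, energy ratio) when the supply drops from v_high to v_low."""
    delay_ratio = relative_delay(v_low, vt, alpha) / relative_delay(v_high, vt, alpha)
    energy_ratio = relative_energy(v_low) / relative_energy(v_high)
    return delay_ratio, energy_ratio


if __name__ == "__main__":
    d, e = scaling_ratios(1.8, 1.0)
    print(f"delay x{d:.2f}, energy x{e:.2f}")
```

In this model, an `alpha` closer to 1 (stronger velocity saturation) makes the delay grow more slowly as the supply is lowered, while the switching energy still falls quadratically.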

ACM 1-58113-371-5/01/0008/ $5.00.

3. MODIFIED TEST BENCH
The test bench presented by Stojanovic and Oklobdzija [3] tests FFs and latches under worst-case loading. The most significant contributions of that work are the optimization process, the inclusion of the power consumed by the clock and data drivers in the total power consumption, and the use of Data-Q delay as the relevant delay measure. However, the excessive driving load has negative effects. A large portion of the power consumed by the timing element is used to drive the load. Therefore, the total power consumption is overstated compared to what is expected in the average case. In addition, the loading of the clock and data inputs could make the transistors unnecessarily large in the optimization procedure.

In our modified test bench (shown in Figure 2), the drivers are loaded only with the test unit, and the test unit drives another stage four times its size. In this case, the drivers are sensitive to the sizing of the test unit, and their sizes are adjusted to reflect the needs of the test unit. A smaller test unit is therefore favored. In addition, the drivers are sized for an electrical gain (or \( C_{out}/C_{in} \)) of 4. Such sizing sets a reasonable load on CMOS circuits in practice. (In actual sizing, the electrical gain of the driver is approximated by the width ratio between the driven transistors of the test unit and the total width of the corresponding driver.)

![Figure 2. Modified test bench for latches/flip-flops](image)

A similar loading is applied to the test unit. The load consists of 4 identical copies of the test unit. In terms of data input, the electrical gain of the test unit is 4.
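The sizing rule described above can be sketched as follows. This is a minimal illustration, assuming the width-ratio approximation of electrical gain; the function names and the example widths are hypothetical and do not come from the simulation decks.

```python
ELECTRICAL_GAIN = 4  # C_out / C_in used for drivers and loads in the modified test bench


def driver_width(driven_width_um, gain=ELECTRICAL_GAIN):
    """Total driver width such that (driven width) / (driver width) equals the gain."""
    if driven_width_um <= 0 or gain <= 0:
        raise ValueError("widths and gain must be positive")
    return driven_width_um / gain


def load_width(test_unit_input_width_um, copies=ELECTRICAL_GAIN):
    """Total input width presented by identical copies of the test unit."""
    if test_unit_input_width_um <= 0 or copies <= 0:
        raise ValueError("width and number of copies must be positive")
    return test_unit_input_width_um * copies


if __name__ == "__main__":
    # Hypothetical example: 8.0 um of clocked transistor width in the test unit.
    print(driver_width(8.0))  # total width of the clock driver, in um
    print(load_width(1.5))    # load seen by a test unit with 1.5 um of data input width
```

Because the driver width follows the test unit, any upsizing of the test unit is charged back to the driver power, which is why a smaller test unit is favored.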

4. OPTIMUM PMOS/NMOS RATIO
The data and clock drivers are sized to achieve equal driving capability for rising and falling output transitions. This occurs when the signal rise time and fall time are equal. The corresponding optimal PMOS-to-NMOS (P/N) ratio of the drivers is found from the analysis of a 5-inverter ring oscillator. Each inverter in the ring has a load of 4 identical inverters to achieve an electrical gain of 4. The total width of each inverter is kept constant to maintain the same loading when the P/N ratio is changed. The simulation was run at a 1.8V supply in 0.18um CMOS. The simulation result in Figure 6 shows that the optimal P/N ratio is 2.6, and this value is used to size the drivers.
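The two steps of this procedure, splitting a constant total width for a given P/N ratio and locating the ratio where rise and fall times match, can be sketched as below. This is an illustrative post-processing helper, not the script used for Figure 6; the function names are hypothetical, and the rise/fall times are expected to come from the user's own simulation sweep.

```python
def split_width(total_width_um, pn_ratio):
    """Split a fixed total inverter width into (PMOS, NMOS) widths with Wp / Wn = pn_ratio."""
    if total_width_um <= 0 or pn_ratio <= 0:
        raise ValueError("total width and P/N ratio must be positive")
    wn = total_width_um / (1.0 + pn_ratio)
    wp = total_width_um - wn
    return wp, wn


def balanced_ratio(ratios, rise_times, fall_times):
    """Linearly interpolate the P/N ratio at which rise time equals fall time.

    ratios must be sorted in ascending order. Returns None if the two
    curves do not cross inside the swept range.
    """
    diffs = [r - f for r, f in zip(rise_times, fall_times)]
    for i in range(len(diffs) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return ratios[i]
        if d0 * d1 < 0:  # sign change: the crossing lies between the two points
            t = d0 / (d0 - d1)
            return ratios[i] + t * (ratios[i + 1] - ratios[i])
    if diffs and diffs[-1] == 0:
        return ratios[-1]
    return None
```

Keeping the total width fixed in `split_width` mirrors the constant-loading condition in the text: only the balance between pull-up and pull-down strength changes across the sweep.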

![Figure 3. Modified semi-dynamic flip-flop, Nedovic & Oklobdzija [1]](image)

![Figure 4. Sense-amplifier flip-flop circuit, Nikolic & Oklobdzija [2]](image)

![Figure 5. PowerPC master-slave latch, Gerosa [5]](image)
5. RESULTS
The selected latch and flip-flops were simulated in the 0.18um CMOS technology with a clock period of 3ns and 100% data activity. They were run at different process corners (typical, fast, slow) and at the high and low limits of supply voltage scaling, 1.8V and 1.0V, respectively. A Monte-Carlo simulation of 100 runs accounted for the process variation. The parameters of interest are clock-to-output delay, total power consumption, power-delay product (PDP) and energy-delay product (EDP).

5.1 Results of Voltage Scaling
Figure 7 shows the clock-to-output delay at setup time over supply voltage and process variation. At high supply voltage, SDFF has the shortest delay; the larger delay of SAFF is due to the chosen single-input dual-output configuration. The master-slave PowerPC latch has a slightly larger delay than SDFF. At low supply voltage, the nominal delay is increased by 125% in all test units, slightly smaller than the voltage reduction ratio.

However, as shown in Figure 8, the power is reduced to about 33% of its value at high supply voltage, and so is its variation. This improvement is between the squared and cubic voltage reduction ratios. The overall effect on PDP is shown in Figure 9. The PDP is slightly better (~ 20%) at low supply voltage because the improvement in power dissipation outweighs the increase in delay. In addition, the relative performance among the test units is preserved in all considered parameters. This indicates that the supply voltage can still be scaled down to 1.0V without the circuits failing.
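The combined effect of the quoted delay and power changes can be checked with a short calculation. The sketch below assumes PDP = power x delay and EDP = PDP x delay, and it reads "increased by 125%" as a delay factor of 2.25; both the definitions and that reading are assumptions made here for illustration.

```python
def pdp(power, delay):
    """Power-delay product (assumed definition: power * delay)."""
    return power * delay


def edp(power, delay):
    """Energy-delay product (assumed definition: PDP * delay)."""
    return pdp(power, delay) * delay


def low_voltage_ratios(delay_factor, power_factor):
    """Return (PDP ratio, EDP ratio) of low supply relative to high supply."""
    return pdp(power_factor, delay_factor), edp(power_factor, delay_factor)


if __name__ == "__main__":
    # Factors taken from the text: delay x2.25 (assumed reading), power x0.33.
    pdp_ratio, edp_ratio = low_voltage_ratios(2.25, 0.33)
    print(f"PDP ratio ~ {pdp_ratio:.2f}, EDP ratio ~ {edp_ratio:.2f}")
```

Under these assumptions the PDP ratio comes out at about 0.74 and the EDP ratio at about 1.67, so the PDP changes only modestly while the EDP grows by a clearly larger factor, since the delay increase enters the EDP twice.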

5.2 Results of Device Variation
The effects of device size variation are also found in Figures 7–9. The 3-sigma value around the mean (represented by a vertical bar) is shown. Process variation has a larger effect at low voltage. It is most significant in the delay at setup time, where the variation increases from 44% to 145% at low supply voltage (Figure 7). When the data is stable, the variation in power consumption is reduced in the PowerPC latch and SDFF, by 19% and 5% respectively, but increased in SAFF, by 8% (Figure 8). However, in terms of PDP, the device variation effect increases: by 100% in the PowerPC latch, 78% in SDFF, and 57% in SAFF (Figure 9). This result means that at low voltage the delay is affected more by device variation than the power is.
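For readers reproducing this kind of analysis, a 3-sigma spread can be extracted from a set of Monte-Carlo samples as sketched below. Expressing the spread as a percentage of the mean is an assumption about how such figures are normalized, and the function name is hypothetical.

```python
import statistics


def three_sigma_percent(samples):
    """3-sigma spread of the samples as a percentage of their mean."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    if mean == 0:
        raise ValueError("mean is zero; relative spread is undefined")
    sigma = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)
    return 100.0 * 3.0 * sigma / mean
```

The function would be applied separately to the delay, power and PDP samples of each test unit at each supply voltage, so that the relative spreads can be compared between 1.8V and 1.0V.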

5.3 Results of Process Corner Variation
The effect of process corners on the characteristics of the test units was analyzed. The measurements were made with stable data at the input of the test units (not at setup time). Figure 10 presents the delay and power at the typical, slow and fast process corners and at high and low supply voltages. Besides the similar relative performance among the test units seen previously, the relative performance of each test unit across process corners is preserved at both high and low supply voltages. Notice that the PowerPC latch starts to fail at low supply voltage in the slow process corner.

Figure 6. Rise/fall time of the ring oscillator
Figure 7. Clock-to-output delay at setup time
Figure 8. Total power consumption at stable input
Figure 9. Power-delay product at stable input
boundary of the minority carrier saturation velocity. Due to the smaller $V_{DS}$ at lower supply voltage, the effect of velocity saturation is less severe, which results in higher driving capability of the transistors and hence better delay.

For completeness, the power consumption of the test units for different data activities is shown in Figure 12. The relative performance of the test units is similar to the previous results. The advantage of the PowerPC latch at low data activity (less than 50%) is consistent with its circuit topology and is retained as the supply is reduced.

6. CONCLUSION
The performance of selected latches and flip-flops was analyzed for the effects of voltage scaling. They were simulated at different process corners, under process variation, and at different supply voltages. It was found that lowering the power supply voltage still preserves the functionality of the latches/FFs. Similar relative performance degradation due to the low supply was observed for each design. Relative performance was consistent among the latches/FFs across the power supply voltage range for the different process corners. Because the transistors operate in velocity saturation, a lower voltage reduces this effect and causes the delay to degrade at a slower rate. However, the effect of process variation became more severe at low power supply voltages, which has a significant negative impact on circuit performance.

7. ACKNOWLEDGEMENT
The authors would like to thank the other members of the IBM Austin Research Lab for their support, especially Peter Hofstee for valuable discussions.

8. REFERENCES