### **Digital System Clocking:** High-Performance and Low-Power Aspects

Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic

#### **Chapter 3: Timing and Energy Parameters**


Wiley-Interscience and IEEE Press, January 2003

### Basic timing diagram in flip-flops



Definitions:

- Clock-to-Q delay $t_{CQ}$: low-to-high $t_{CQ,LH}$, high-to-low $t_{CQ,HL}$
- Setup time $U$
- Hold time $H$

Nov. 14, 2003

#### Setup and hold time behavior as a function of clock-to-output delay



Neither setup nor hold time is a fixed constant parameter. Both are functions of the data-to-clock time "distance".

#### Setup time behavior as a function of data-to-output delay



When the D-to-Q delay is observed, we can see that data can come closer to the triggering event than we thought. Data-to-Q is the RELEVANT parameter, NOT Clock-to-Q as many think!

#### D-Q and Clk-Q delay as a function of D-Clk offset



Determining the optimal setup  $U_{opt}$  and hold time  $H_{opt}$ .
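Because the D-Q delay is the D-Clk offset plus the Clk-Q delay at that offset, one common way to pick the optimal setup point is to take the offset that minimizes D-Q. The sketch below assumes that minimum-D-Q criterion; the function name and the sample data are hypothetical and not taken from the slides.

```python
def optimal_setup(offsets_ps, clk_q_ps):
    """Return (U_opt, minimum D-Q delay) from sampled Clk-Q delays.

    offsets_ps: D-Clk offsets (how long before the clock edge data arrives)
    clk_q_ps:   Clk-Q delay measured at each offset
    Assumption: U_opt is the offset at which D-Q delay is smallest.
    """
    # D-Q delay = D-Clk offset + Clk-Q delay at that offset.
    d_q = [u + cq for u, cq in zip(offsets_ps, clk_q_ps)]
    best = min(range(len(d_q)), key=d_q.__getitem__)
    return offsets_ps[best], d_q[best]


# Hypothetical characterization data in picoseconds (illustration only).
offsets = [100, 80, 60, 40, 30, 20, 10]
clk_q = [150, 150, 152, 158, 170, 195, 260]

u_opt, dq_min = optimal_setup(offsets, clk_q)
print(u_opt, dq_min)
```

With this made-up data, Clk-Q starts rising well before the minimum D-Q point, which is the behavior the slide describes: data can arrive closer to the clock edge than a Clk-Q-only view suggests.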


### Setup time, hold time, sampling window and clock width in a flip-flop




### Latch: setup and hold time



(a) early data D1 arrival; (b) late data D2 arrival


#### Illustration of a data path



# Late Data Arrival and Time Borrowing in a pipelined design



The Data-to-Output (Q) time window moves along the time axis, becoming larger or smaller.

### Early Data Arrival and Internal Race Immunity

- The maximum clock skew that a system can tolerate is determined by the clocked storage elements:
  - If the Clk-Q delay of a CSE is shorter than $H$, a race can occur if there is no logic in between.
  - If Clk-Q is greater than $H$ plus the possible skew, there is no problem:

$$t_{Clk-Q} > H + t_{skew}$$

- Internal race immunity:

$$R = t_{Clk-Q} - H$$
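The race condition above can be checked numerically. This is a minimal sketch: the function names are hypothetical, and the optional `t_logic_min` term (minimum delay of any logic between the two CSEs) is an added assumption, with 0 modeling back-to-back storage elements.

```python
def race_immunity(t_clk_q, hold):
    """Internal race immunity R = t_Clk-Q - H (same time unit for both)."""
    return t_clk_q - hold


def is_race_free(t_clk_q, hold, t_skew, t_logic_min=0.0):
    """True if early data cannot violate hold time at the receiving CSE.

    Implements t_Clk-Q > H + t_skew; t_logic_min is an assumed extension
    for logic placed between the two storage elements.
    """
    return t_clk_q + t_logic_min > hold + t_skew


# Hypothetical numbers in picoseconds.
R = race_immunity(120, 40)          # largest skew this CSE tolerates by itself
ok = is_race_free(120, 40, 50)      # skew below R
bad = is_race_free(120, 40, 90)     # skew above R, no logic in between
print(R, ok, bad)
```

In this reading, $R$ is the skew budget a CSE provides on its own; any skew beyond it has to be covered by minimum logic delay.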


### Impact of supply voltage on the sampling window



The sampling window determines the minimum required duration of the data signal.

### **Energy Parameters**

# Components of Energy Consumption

- Switching Energy
- Short-Circuit Energy
- Leakage Energy
- Static Energy


$$E = \int_{t}^{t+T} V_{DD} \cdot i_{V_{DD}}(\tau) \cdot d\tau$$

This is the energy consumed by a CSE during one clock period $T$, where $t$ is chosen to include all relevant transitions: arrival of new data, clock pulse, and output transition.
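When the supply current is available as sampled simulation data, the integral can be approximated numerically. The following is a minimal sketch using the trapezoidal rule; the function name and the current waveform are hypothetical.

```python
def supply_energy(times_s, currents_a, v_dd):
    """Approximate E = V_DD * integral of i_VDD(t) dt (trapezoidal rule).

    times_s:    sample times in seconds, covering one clock period
    currents_a: supply current samples in amperes
    """
    if len(times_s) != len(currents_a):
        raise ValueError("times and currents must have the same length")
    charge = 0.0
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        charge += 0.5 * (currents_a[k] + currents_a[k - 1]) * dt
    return v_dd * charge


# Hypothetical triangular current pulse: 0 -> 1 mA -> 0 over 100 ps, V_DD = 1.8 V.
t = [0.0, 50e-12, 100e-12]
i = [0.0, 1e-3, 0.0]
energy = supply_energy(t, i, 1.8)
print(energy)  # joules
```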

This energy has four components:

$$E = E_{switching} + E_{short-circuit} + E_{leakage} + E_{static}$$

Switching Energy:

$$E_{switching} = \sum_{i=1}^{N} \alpha_{0-1}(i) \cdot C_i \cdot V_{swing}(i) \cdot V_{DD}$$

- $N$ is the number of nodes
- $C_i$ is the capacitance of node $i$
- $\alpha_{0-1}(i)$ is the probability that a transition occurs at node $i$
- $V_{swing}(i)$ is the voltage swing of node $i$
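The switching-energy sum can be evaluated directly from a per-node table. This is a small sketch; the function name and the node values are hypothetical and chosen only for illustration.

```python
def switching_energy(nodes, v_dd):
    """Sum alpha_0-1(i) * C_i * V_swing(i) * V_DD over all nodes.

    nodes: iterable of (alpha_01, capacitance_F, v_swing_V) tuples.
    """
    return sum(alpha * c * v_swing * v_dd for alpha, c, v_swing in nodes)


# Hypothetical nodes: (transition probability, capacitance, swing).
nodes = [
    (1.0, 2e-15, 1.8),    # node that makes a 0-1 transition every cycle
    (0.25, 4e-15, 1.8),   # full-swing internal node
    (0.25, 10e-15, 0.9),  # reduced-swing node
]
e_sw = switching_energy(nodes, v_dd=1.8)
print(e_sw)  # joules per clock cycle
```

Note how the reduced-swing node contributes in proportion to $V_{swing} \cdot V_{DD}$ rather than $V_{DD}^2$.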


### Short-circuit current in an inverter



(a) pull-up; (b) pull-down operation



### Projected leakage currents

Leakage power will soon become a significant portion of the total power consumption in modern microprocessors.

Assuming a doubling of the transistor count per generation, the leakage current will increase about 7.5 times, corresponding to a 5 times increase in total leakage power.

# Where does the energy go in a CSE?

1. Internal clocked nodes in storage elements
2. Internal non-clocked nodes in storage elements
3. Data and clock input load
4. Output load

# Energy Breakdown

1. Internal clocking energy
2. Data and clock input energy
3. Energy in internal non-clocked nodes
4. Energy in output load
5. Energy per transition
6. Glitching energy

### *Energy Breakdown in Clocked-Storage Elements during one of the possible input data transitions*

|           | $E_{0-0}$ | $E_{0-1}$ | $E_{1-0}$ | $E_{1-1}$ |
|-----------|-----------|-----------|-----------|-----------|
| $E_{Clk}$ | Y/N       | Y         | Y         | Y/N       |
| $E_{int}$ | Y/N       | Y         | Y         | Y/N       |
| $E_{ext}$ | N         | Y/N       | Y/N       | N         |

Two cases:

- Storage elements without pre-charge nodes
- Storage elements with pre-charge nodes


# CSE characterization and Test setup



- Clock energy
- Internal energy: $E_{int} = \frac{E_{0-1} + E_{1-0}}{2} - E_{Clk} - E_{Load}$
- Energy in output load
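The internal-energy expression above can be applied directly to measured per-transition energies. A minimal sketch follows; the function name and the numbers (in femtojoules) are hypothetical.

```python
def internal_energy(e_01, e_10, e_clk, e_load):
    """E_int = (E_0-1 + E_1-0) / 2 - E_Clk - E_Load (all in the same unit)."""
    return (e_01 + e_10) / 2 - e_clk - e_load


# Hypothetical measurements in fJ.
e_int = internal_energy(e_01=100.0, e_10=80.0, e_clk=30.0, e_load=20.0)
print(e_int)
```

Averaging the two output-changing transitions and subtracting the clock and load contributions leaves the energy dissipated in the internal nodes of the storage element.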


# Energy per transition

- The energy-per-transition is the total energy consumed in a CSE during one clock cycle for a specified input data transition: 0-0, 0-1, 1-0, or 1-1.
- This metric is crucial in that it yields significant insight into circuit energy.
- By inspecting the node activity in a CSE for different input data transitions, the energy-per-transition can be used to obtain the energy breakdown between clocked nodes, internal nodes, and the external output load.
- This forms a good basis for the study of alternative circuit techniques that deal with internal clock gating.
- The energy breakdown also offers valuable information about the tradeoff between reduced clocking energy and the energy penalty incurred by the clock-gating logic, thus providing a better understanding of the optimization goals for the overall design.


$$E_{average} = p_{0-0} \cdot E_{0-0} + p_{0-1} \cdot E_{0-1} + p_{1-0} \cdot E_{1-0} + p_{1-1} \cdot E_{1-1}$$
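The average energy is a probability-weighted sum of the four energy-per-transition values. The sketch below assumes the four probabilities sum to 1; the function name and the energies (in femtojoules) are hypothetical.

```python
def average_energy(energy, prob):
    """E_average = sum over transitions of p * E.

    energy, prob: dicts keyed by "0-0", "0-1", "1-0", "1-1".
    """
    if abs(sum(prob.values()) - 1.0) > 1e-9:
        raise ValueError("transition probabilities must sum to 1")
    return sum(prob[k] * energy[k] for k in energy)


# Hypothetical energy-per-transition values in fJ.
energy = {"0-0": 20.0, "0-1": 100.0, "1-0": 80.0, "1-1": 24.0}
# Equal probabilities, as for uncorrelated random input data.
prob = {"0-0": 0.25, "0-1": 0.25, "1-0": 0.25, "1-1": 0.25}

e_avg = average_energy(energy, prob)
print(e_avg)
```

Changing the probabilities models different data activity; for example, raising `p["0-0"]` and `p["1-1"]` shows how much a mostly idle input benefits from low energy in the non-switching cases.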

### Glitching Energy in CSEs



- *Propagating glitches* are unintended transitions that propagate from the fan-in gates.
- *Generated glitches* are produced by non-glitch transitions at the inputs.

$$E_{avg-glitch} = \sum_{i=1}^{4} \beta_i \cdot E_{g_i}$$

- We assumed that the data and clock inputs were supplied by drivers with sufficient drive strength.
- The input clock and data capacitances are important interface parameters for the clock network and logic design.
- The clock network designer and logic designer need to be aware of these capacitances in order to design circuits that drive storage elements.

#### Interface with Combinational Logic:

The parameters relevant to the combinational logic designer are:

- CSE input data slope
- Input data capacitance

The data slope affects performance and energy consumption of both driving logic and storage elements.

Clock and data slopes are generally not equal.

#### Interface with Clock Network:

- CSEs are affected by *clock skew* and *clock slope*.
- The total load of the clock distribution network is defined by the input capacitance of the clock node and the number of CSEs on a chip.
- An increase in *clock slope* degrades CSE performance, so the clock network designer has to know what slopes the CSEs can tolerate.
- This is especially important if flip-flops are used.
- The clock slope also affects the energy consumption of the clock distribution network.
  - If larger clock drivers with smaller fanout are used, the clock edges are sharper and the storage element performance is better, at the expense of an increase in energy consumption of the clock network.
  - The optimal tradeoff is the minimal energy consumption that still delivers the desired storage element performance.

- To evaluate the *total clocking energy* per clock cycle in the entire clock subsystem, one needs to add the energy consumed in the clock distribution network.
- The energy consumed in the clock distribution network depends on the total switched capacitance, which is determined by the total number of clocked storage elements on a chip and the input capacitance of their clock inputs, the total wiring capacitance, and the total switched capacitance of the clock drivers, as given by:

$$C_{distrib-net} = N_{FF} \cdot C_{in-Clk,FF} + C_{wire} + C_{sw-buff}$$

The last two terms depend on buffer insertion/placement strategy and should be minimized.
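The capacitance sum above can be turned into a rough per-cycle energy estimate. This is a sketch under stated assumptions: the function names and all numbers are hypothetical, and the energy step assumes a full-rail clock swing so that $E = C \cdot V_{DD}^2$ per cycle (the switching-energy expression with $\alpha_{0-1} = 1$ and $V_{swing} = V_{DD}$).

```python
def clock_network_capacitance(n_ff, c_in_clk_ff, c_wire, c_sw_buff):
    """C_distrib-net = N_FF * C_in-Clk,FF + C_wire + C_sw-buff (farads)."""
    return n_ff * c_in_clk_ff + c_wire + c_sw_buff


def clock_energy_per_cycle(c_total, v_dd):
    """Assumed full-rail swing: one 0-1 transition per cycle costs C * V_DD^2."""
    return c_total * v_dd ** 2


# Hypothetical chip: 100k flip-flops, 2 fF clock input each,
# 300 pF of clock wiring, 100 pF of switched buffer capacitance.
c_net = clock_network_capacitance(100_000, 2e-15, 300e-12, 100e-12)
e_clk = clock_energy_per_cycle(c_net, v_dd=1.8)
print(c_net, e_clk)
```

In this made-up example the flip-flop clock inputs account for a third of the total, so the wiring and buffer terms dominate, which is why the buffer insertion and placement strategy matters.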