- HIGH-SPEED DIGITAL CIRCUITS DEVELOPMENT
Currently we are targeting high-speed digital circuits for operation in the 10 GHz range.
Our work is supported by a Semiconductor Research Corporation (SRC) grant.
1.1 Energy-Delay Optimization:
Until recently, design efforts focused on speed, while power was not considered important. As a consequence, all the algorithms and optimization methods (tools included) were concerned with circuit speed. We treat speed as a quantity that can be traded for energy of operation, and in our ARITH-16 paper and Tutorial we presented a method for estimation and development of arithmetic circuits in energy-delay space. We claim that any comparison that does not consider a design in energy-delay space is meaningless. Further, we have developed methods for optimizing digital circuits so that the targeted speed is achieved at minimal energy. Our work on optimal design in energy-delay space is continuing.
1.2 Clocked Storage Elements:
Our recent work has dealt with clocked storage elements: latches and flip-flops for high-performance and low-power systems. We believe that the problems associated with clocking will be of increasing importance as clock frequencies continue to scale. We have developed a standard benchmark for evaluation of timing elements and have introduced relevant parameters for speed and power evaluation. The paper containing those results was published in the April 1999 issue of the Journal of Solid-State Circuits. As a follow-up to this work we have developed several new configurations, most notably a modified SAFF, which was the fastest flip-flop at the time. However, several of our newly developed structures have superior speed (the modified SDFF, for example). For further details please see the ACSEL publication list. In order to minimize power we have developed several new Conditional and Double Edge Triggered Flip-Flops.
1.3 VLSI Arithmetic:
In the area of Fast VLSI Arithmetic, we have been investigating the relationship between algorithms and technology, i.e. how efficiently a particular algorithm maps onto the specifics of the technology applied. Knowledge of this relationship enables one to properly select and modify an algorithm so that it results in the most efficient implementation. It has been shown that most of the algorithms developed in the past are not efficient with respect to the technology (or technologies) currently used (please see our MUX-based adder design).
In the past we showed a VLSI algorithm resulting in a fast and power-efficient implementation scheme: the Variable Block Adder (VBA). For optimization purposes, we applied linear programming techniques, which resulted in a VLSI scheme for an adder that has the complexity of a Carry-Skip adder, with a delay that grows as the square root of the adder size (rather than linearly). This scheme is substantially simpler than Carry Lookahead (CLA), though it is only slightly slower. We have also shown that complex schemes that require substantial hardware may be just past the point of diminishing returns. Our VBA scheme has been widely referenced and was used in the Intel 386 processor (see: P. Gelsinger, "Design and Test of the 80386", IEEE Design and Test, June 1987).
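To illustrate why varying the block sizes helps, the sketch below evaluates the worst-case carry path of a carry-skip adder under a deliberately simple delay model. The model is an assumption made for this example: a carry ripples through the block where it is generated, skips every block in between at a fixed cost, and ripples through the block where it ends. The block sizes are hand-picked, and the sketch does not reproduce the linear programming optimization mentioned above.

```python
from typing import Sequence


def carry_skip_delay(blocks: Sequence[int],
                     t_ripple: float = 1.0,
                     t_skip: float = 1.0) -> float:
    """Worst-case carry delay for the given block sizes (LSB block first).

    Assumed model: a path starting in block i and ending in block j costs
    blocks[i] ripple steps, one skip per block strictly between i and j,
    and blocks[j] ripple steps. A path inside one block costs its size.
    """
    worst = max(b * t_ripple for b in blocks)
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            path = (blocks[i] * t_ripple
                    + (j - i - 1) * t_skip
                    + blocks[j] * t_ripple)
            worst = max(worst, path)
    return worst


# Two hypothetical 32-bit partitions, chosen by hand for illustration.
uniform_blocks = [4] * 8                          # equal-size blocks
variable_blocks = [1, 2, 3, 4, 5, 6, 5, 3, 2, 1]  # small at the ends

uniform_delay = carry_skip_delay(uniform_blocks)
variable_delay = carry_skip_delay(variable_blocks)
```

Under this model the uniform partition has a worst-case delay of 14 units and the variable partition 11 units, although both cover 32 bits. Small blocks at the ends shorten the ripple portions of the longest paths, which is the intuition behind letting the block size vary.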
Currently we are revising the circuits to include wire delay, wire energy and transistor sizing.
Our past work produced a scheme for implementing a parallel multiplier with improved speed, the TDM algorithm. We have shown that none of the previously known schemes for bit reduction in the multiplier tree results in an optimal solution. The problem has been traced to the way the cells are interconnected, and we have subsequently developed two superior schemes. This work makes several well-known and established schemes obsolete. Those two schemes are described in two papers in IEEE Transactions on VLSI and two papers in IEEE Transactions on Computers.
Another contribution to fast parallel multiplier design has been the optimization of the final adder, treated in the paper "Design and Analysis of Fast Carry-Propagate Adder Under Non-Equal Input Signal Arrival Profile", first presented at the 28th Asilomar Conference on Signals, Systems and Computers in 1995. This work has been further developed (see, for example, the ARITH-13 (1997) paper in the ACSEL publication list).
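The following toy example shows how much the interconnection order alone can matter when reducing the bits of one multiplier column. It is a simplified sketch and not the published TDM algorithm. It assumes a full adder whose a and b inputs reach the sum output after 2 delay units and whose carry-in reaches it after 1 unit, it ignores the carries sent to the next column, and it uses invented arrival times.

```python
import heapq
from collections import deque
from typing import List, Sequence


def full_adder_arrival(a: int, b: int, cin: int) -> int:
    """Arrival time of the sum output (assumed: a, b slow by 2; cin fast by 1)."""
    return max(max(a, b) + 2, cin + 1)


def reduce_in_order(arrivals: Sequence[int]) -> List[int]:
    """Wire bits to full adders in the order given, ignoring arrival times."""
    queue = deque(arrivals)
    while len(queue) > 2:
        a, b, cin = queue.popleft(), queue.popleft(), queue.popleft()
        queue.append(full_adder_arrival(a, b, cin))
    return sorted(queue)


def reduce_earliest_first(arrivals: Sequence[int]) -> List[int]:
    """Always combine the three earliest bits; the latest of them uses cin."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 2:
        a, b, cin = (heapq.heappop(heap) for _ in range(3))  # ascending order
        heapq.heappush(heap, full_adder_arrival(a, b, cin))
    return sorted(heap)


# Hypothetical column: two late carries (t=3, t=2) and four bits at t=0.
column = [3, 2, 0, 0, 0, 0]

naive_result = reduce_in_order(column)        # arrival times of the 2 bits left
aware_result = reduce_earliest_first(column)
```

With these assumed delays, the in-order wiring leaves bits arriving at times 2 and 5, while the arrival-aware wiring leaves bits arriving at times 3 and 4. The same cells and the same number of adders are used in both cases; only the interconnection differs.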
- DESIGN FOR LOW-POWER
The objective of this effort is the development of Low-Power Logic. In the past we worked on "Energy-Recovery / Adiabatic Logic" with the objective of developing a logic family (and the power system) that operates with minimal use of energy. We see the application of this logic in wireless and portable systems, which are increasingly important today. We developed two logic families, termed CAL and PAL (see the article "Clock-powered circuits set efficiency record" in Electronic Engineering Times, December 1st, 1997, the article in Electronic Design, August 4, 1997, and the article in Electronic Allert).
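The appeal of energy-recovery logic can be seen from the standard first-order estimates for charging a capacitive node through a resistance. The sketch below compares abrupt charging from a fixed supply with slow ramp charging. The component values are arbitrary assumptions, the ramp formula is only a good approximation when the ramp time is much longer than RC, and nothing here models the CAL or PAL circuits themselves.

```python
def conventional_dissipation(c_farads: float, v_volts: float) -> float:
    """Energy lost in the resistance when a node is charged abruptly to V."""
    return 0.5 * c_farads * v_volts ** 2


def adiabatic_dissipation(r_ohms: float, c_farads: float,
                          v_volts: float, ramp_s: float) -> float:
    """First-order loss when the supply ramps to V over ramp_s seconds.

    Uses E ~ (RC / T) * C * V^2, valid only for ramp_s much larger than RC.
    """
    rc = r_ohms * c_farads
    return (rc / ramp_s) * c_farads * v_volts ** 2


# Arbitrary illustrative values: 1 kOhm, 100 fF, 1 V, 10 ns ramp (RC = 0.1 ns).
R, C, V, T = 1e3, 100e-15, 1.0, 10e-9

e_conventional = conventional_dissipation(C, V)
e_adiabatic = adiabatic_dissipation(R, C, V, T)
savings = e_conventional / e_adiabatic
```

For these assumed values the ramp is 100 times longer than RC and the estimated loss is 50 times smaller than with abrupt charging. The loss keeps falling as the ramp is slowed further, which is the trade of speed for energy that this logic style exploits.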
- HIGH-PERFORMANCE SYSTEM ARCHITECTURE AND ORGANIZATION
On the architecture side we have developed the concept of decoupled-multithreaded architecture. A paper describing this work, "Multithreaded Decoupled Architecture", was published in the International Journal of High-Speed Computing. The work on multithreaded architectures investigates the use of register windows, as implemented in the SPARC architecture, to facilitate switching between different threads of execution.
In the past our group ran simulations and evaluations on an Intel iAPX860 super-computer. In collaboration with the Physics department we developed a platform for simulation of molecular dynamics (in particular), as well as other parallel simulations, in a distributed computing environment (known as P4). P4 is a parallel computing platform consisting of a network of workstations. A computing task is broken into independent computational tasks, which are sent to the other workstations on the network (usually the ones that are idle or under light load). We measured the performance of P4 (which we installed on our network of HP and DEC computers), and the results showed that P4 in such an environment outperformed our Intel iAPX860 super-computer (a report can be obtained by sending e-mail to ACSEL).
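The task decomposition described above can be sketched in a few lines. This is only an analogy and does not use the P4 interface: local threads from the Python standard library stand in for workstations, and the function names and the sum-of-squares workload are hypothetical placeholders for a real simulation kernel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split(items: Sequence[float], n_chunks: int) -> List[Sequence[float]]:
    """Break the work into at most n_chunks independent pieces."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def partial_result(chunk: Sequence[float]) -> float:
    """Placeholder for the independent computation done on one machine."""
    return sum(x * x for x in chunk)


def run_distributed(items: Sequence[float], n_workers: int = 4) -> float:
    """Send each chunk to a worker and combine the partial results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(partial_result, split(items, n_workers)))


total = run_distributed(list(range(10)), n_workers=3)
```

Because the chunks share no state, the combined result does not depend on which worker handles which chunk, which is what allows such tasks to be handed to whichever machines happen to be idle.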
3.2 Microarchitecture and System Organization
This work focuses on micro-architecture for achieving high performance (see: Super-Scalar implementations of RISC, ISSCC '97 tutorial).
We have been developing the concept of processors implemented in an embedded logic and memory process, and in particular its impact on architecture and organization. We contributed to the Siemens Tri-Core uProcessor (one of the first of its kind, utilizing the IBM-Toshiba-Siemens advanced memory process). In the past we analyzed an integrated cache memory consisting of an SRAM-DRAM combination. We showed this to be a very viable approach to designing a memory system. This work was presented at the International Conference on Computer Design in October 1994 (see publication list).
Our other interest is in super-scalar architecture and organization, and generally in issues related to very high performance system design. We are interested in the pipeline organization, interaction between instruction architecture and pipeline definition, branch mechanism and the interrelation between architecture and implementation.
Currently we are expanding our research toward Multi-Media and DSP architectures, with the intent of applying the knowledge gained on general-purpose computers to specialized architectures such as DSP. We studied the optimal instruction set for the Multi-Media Extensions (MMX). Our interest there is in efficient pipeline organization, definition of pipeline hierarchy, as well as definition and mapping of a carefully selected instruction set to the pipeline (several publications on this subject are available in the ACSEL publication list).
- DESIGN METHODOLOGY, DESIGN FOR TESTABILITY AND RELIABLE DESIGN
Our interest in the logic synthesis area has been in the analysis of the algorithms used, the development of new logic families, and their evaluation from the point of view of performance, testability and reliability for the production environment.
We have shown that logic synthesis performs less than ideally and that those tools should not be used "blindly". On several examples we have substantially outperformed logic synthesis tools. As a result we have developed a design methodology, "A Guided Algorithmic Approach", and have shown how one can use the tools in a more effective way.
Currently we are developing a methodology for high-speed design that can utilize standard cells.