Three Dimensional IC's and an Application to a High Speed Image Processor

Kenji Taniguchi

VLSI Research Center, Toshiba Corporation
Komukai, Toshiba-cho, Saiwai-ku, Kawasaki, 210 Japan

Abstract

Present state-of-the-art 3-D IC fabrication technologies are reviewed. The short interconnection wiring inherent to the 3-D structure, together with a parallel processing architecture, gives 3-D integrated circuits high performance. A high speed image sensor composed of six layers is proposed.

I. Introduction

Technological progress in the semiconductor industry is beginning to saturate, because the requirements placed on process systems for modern IC fabrication become more and more difficult to meet as the minimum feature size is reduced below one micron.

To break through this constraint, a promising and innovative approach has been proposed: the three-dimensional integrated circuit. Three-dimensional IC's have great potential not only for realizing high packing density but also for achieving extremely high system speed through a parallel processing architecture and very short interconnection wiring.

In this paper, we first review the present state-of-the-art 3-D IC fabrication processes, the benefits of 3-D IC's, the barriers to be overcome and design methodology. Then, in the last part, we propose a 3-D IC application: a high speed image processor.

II. 3-D IC Process Technology

Figure 1 shows a cross section of a three dimensional integrated circuit. In order to realize the 3-D IC structure, the following three key technologies are required.

1) SOI Technology

Thin, single-crystal silicon films on an insulating substrate (silicon-on-insulator, or SOI) have been fabricated by liquid phase crystal growth and by strip heater recrystallization (1). These techniques all start with the deposition of a polysilicon film on top of an insulator. For seeded epitaxial growth, part of the underlying silicon is exposed so that it is in contact with the deposited polysilicon. A heat source is then scanned across the surface of the polysilicon so that the area under the heat source melts. As the heat source moves on, the molten silicon freezes and recrystallizes.

2) 3-D Wiring Technology

The wiring technology is required to carry signals vertically through the insulators between the active SOI device layers. The 3-D wiring is realized by selective chemical vapor deposition of a refractory metal (2). Tungsten is conveniently deposited by the hydrogen reduction of tungsten hexafluoride gas at temperatures around 600-700°C. Chemical vapor deposition of tungsten is attractive because the metal deposits selectively into holes cut in the oxide film. As shown in Figure 2, the 3-D wiring provides short interconnections, promising high circuit speed and a small interconnection area.

Fig. 2 Cross section of 3-D IC chip
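For reference, the overall reaction for the hydrogen reduction of tungsten hexafluoride is given below. It is the standard stoichiometry of this process rather than a result of the present work, and it says nothing about selectivity or deposition rate.

```latex
\mathrm{WF_6}\,(g) + 3\,\mathrm{H_2}\,(g) \;\longrightarrow\; \mathrm{W}\,(s) + 6\,\mathrm{HF}\,(g)
```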

3) Planarization Technology

In order to fabricate 3-D devices, especially with the SOI process, each underlying layer must be flat so as not to degrade subsequent fabrication processes. Planarization is achieved by depositing a SiO₂ film by rf sputtering with substrate bias (3). The surface topography of the SiO₂ film deposited over the underlying SOI device layer gradually levels off and becomes smooth, and essentially perfect flatness can be achieved. The planarization mechanism relies on deposition and etching taking place simultaneously.

III. 3-D IC Systems: Benefits

3-D IC's can be expected to improve system performance in several respects: high speed, low power dissipation and multiple functions.

1) Speed Improvement

In a large system composed of many IC chips, system speed is determined by the propagation delay within each IC chip and by the chip-to-chip signal transit time. Integrating the circuits in a three-dimensional structure reduces the number of chips, so the transit time between chips becomes negligible. This results in a faster processing time for the total system.

Part of the improved performance is due to the short distances that signals must travel between circuit nodes in the 3-D IC chip. A substantial reduction in the average interconnect path length (4) gives very low parasitic capacitance, which minimizes propagation delay.
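A first-order way to see why shorter wiring helps is a lumped RC model, in which a gate's driver resistance charges the wire capacitance plus the load, with the 50 % delay taken as 0.69·R·C. This is a sketch under that assumption; the driver resistance, wire capacitance per mm and load capacitance are hypothetical values chosen only to show the trend, not figures from this paper.

```python
# Lumped RC estimate: a driver of resistance R charges the wire plus the load.
# All numeric values are hypothetical, chosen only to illustrate the trend.

def gate_delay(r_driver_ohm: float, c_wire_per_mm_f: float,
               length_mm: float, c_load_f: float) -> float:
    """50 % propagation delay in seconds, t = 0.69 * R * (C_wire + C_load)."""
    c_total = c_wire_per_mm_f * length_mm + c_load_f
    return 0.69 * r_driver_ohm * c_total

R_DRIVER = 5e3        # driver on-resistance in ohms (assumed)
C_PER_MM = 0.2e-12    # wire capacitance per mm in farads (assumed)
C_LOAD = 20e-15       # input capacitance of the receiving gate (assumed)

long_route = gate_delay(R_DRIVER, C_PER_MM, 10.0, C_LOAD)   # 10 mm lateral route
short_via = gate_delay(R_DRIVER, C_PER_MM, 0.01, C_LOAD)    # 10 um vertical link
print(f"10 mm route: {long_route * 1e9:.2f} ns, 10 um link: {short_via * 1e9:.3f} ns")
```

Once the wire is short, the delay is dominated by the load capacitance, which is why short vertical links let small, low-power drivers be used.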

System speed can also be improved by using parallel processing architectures. For an operation on an M-by-M matrix, the execution time of a fully parallel processor is independent of M, the order of the matrix, while a serial processor requires a time proportional to M². This M² advantage of the 3-D parallel processor is the largest source of system speed improvement.
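The M² advantage can be made concrete by counting time steps. The sketch below assumes one step per element operation on the serial machine and one processing element per matrix element in the parallel array; `ops_per_element` is a hypothetical stand-in for the fixed amount of work each element needs.

```python
# Counts "time steps" for an M-by-M array operation that needs a fixed
# number of operations per element (ops_per_element is a hypothetical knob).

def serial_steps(m: int, ops_per_element: int) -> int:
    """A serial processor visits the M*M elements one after another."""
    return m * m * ops_per_element

def parallel_steps(m: int, ops_per_element: int) -> int:
    """One processing element per matrix element: all M*M elements are handled
    in the same steps, so the result does not depend on m."""
    return ops_per_element

OPS = 9 + 8  # e.g. 9 multiplications and 8 additions per element
for m in (64, 256, 512):
    print(m, serial_steps(m, OPS) // parallel_steps(m, OPS))  # equals m * m
```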

2) Low System Power Dissipation

Throughput capacity, a parameter describing the signal processing capacity of very-high speed IC's, is the product of the number of gates on a chip and the clock frequency. Future systems such as two-dimensional image or signal processors will require a throughput capacity on the order of 10¹⁴ gate-Hz. In a system composed of many chips, power dissipation and weight tend to become serious problems. For example, a system requiring 5×10¹⁴ gate-Hz built with a 1 μm IC technology could be expected to weigh more than 100 kg and dissipate over 30 watts (3). Low power consumption and light weight are obviously important in the design of battery-operated portable equipment.
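Because throughput capacity is a simple product, it directly gives the number of chips a multi-chip system needs. The per-chip gate count and clock frequency below are hypothetical values for illustration; only the 5×10¹⁴ gate-Hz requirement comes from the text.

```python
import math

def throughput_capacity(gates: float, clock_hz: float) -> float:
    """Throughput capacity in gate-Hz: gates on a chip times clock frequency."""
    return gates * clock_hz

required = 5e14                            # gate-Hz, the system example in the text
per_chip = throughput_capacity(1e5, 1e8)   # hypothetical chip: 1e5 gates at 100 MHz
chips_needed = math.ceil(required / per_chip)
print(f"{per_chip:.1e} gate-Hz per chip -> {chips_needed} chips")
```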

The 3-D IC system can solve these problems. First, the very low parasitic impedance of short wiring permits small, low-power devices to drive the data lines. This results in an enormous power saving compared with systems composed of many chips. Second, densely packed 3-D CMOS/SOI technology allows very low power dissipation. Digital CMOS circuitry uses almost no power except when it changes state. Therefore, the power consumption of a 3-D CMOS IC is proportional to the product of the number of active gates and the operating frequency. This point is discussed further in the next section.
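The proportionality above follows from the standard expression for CMOS switching power, P = N·α·C·V²·f, where N is the number of gates, α the fraction switching per clock cycle (so N·α is the number of active gates), C the switched capacitance per gate, V the supply voltage and f the clock frequency. The operating point below is an assumption for illustration, and static leakage is ignored.

```python
def cmos_dynamic_power(n_gates: int, activity: float, c_gate_f: float,
                       vdd: float, f_hz: float) -> float:
    """Switching power in watts of n_gates CMOS gates; leakage is ignored."""
    return n_gates * activity * c_gate_f * vdd ** 2 * f_hz

# Hypothetical operating point: 1e6 gates, 10 % switching, 5 V, 20 MHz.
p = cmos_dynamic_power(1_000_000, 0.1, 50e-15, 5.0, 20e6)
# Shorter wiring lowers the switched capacitance per gate (50 fF -> 10 fF, assumed).
p_short_wires = cmos_dynamic_power(1_000_000, 0.1, 10e-15, 5.0, 20e6)
print(f"{p:.2f} W vs {p_short_wires:.2f} W")
```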

3) Multi-Function

In 3-D IC's, each SOI layer in the stack may contain a complete set of functional elements, such as various sensors, light-emitting diodes and semiconductor lasers made from different materials. Combining such functional layers for a particular application produces a compact, high performance system such as a high speed smart sensor.

IV. 3-D IC Systems: Barriers

Let us now turn to the barriers that must be overcome to realize 3-D IC systems.

1) Design Complexity

As the integration level grows, circuit design is becoming very difficult. For 3-D IC's in particular, the third (vertical) dimension adds enormous layout design complexity.

One simple approach to the design complexity of 3-D IC's is a cellular structure, in which a small logic circuit is designed once and replicated in an array (6). The resulting regular structures are easy to design and test, although they may use more chip area and slightly more power. The key concepts of the cellular approach are: 1) only one type of cell is implemented in the array, and 2) data flow must be simple and regular, so that the cells can be connected by regular interconnections.

Another challenge for 3-D IC systems is yield.

2) Low Yield

Because of the high level of integration in a 3-D IC, circuit defects may occur that greatly reduce chip yield, since every layer of the stack must work. For example, with a defect density of 0.5/cm² in each SOI layer, the final yield of a five-layer 3-D IC with an area of 20×20 mm² will be extremely low: less than 0.01%.
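The yield figure can be reproduced with the simple Poisson model Y = exp(−D·A) for each layer, where D is the defect density and A the chip area, together with the assumption that defects in different layers are independent, so that the layer yields multiply. The paper does not state which yield model it used; Poisson is assumed here.

```python
import math

def layer_yield(defects_per_cm2: float, area_cm2: float) -> float:
    """Poisson yield of one SOI layer: probability of zero defects."""
    return math.exp(-defects_per_cm2 * area_cm2)

def stack_yield(defects_per_cm2: float, area_cm2: float, layers: int) -> float:
    """Yield of a stack whose layers fail independently (no redundancy)."""
    return layer_yield(defects_per_cm2, area_cm2) ** layers

area = 2.0 * 2.0  # 20 x 20 mm^2 = 4 cm^2
print(f"one layer:   {layer_yield(0.5, area):.3f}")
print(f"five layers: {stack_yield(0.5, area, 5) * 100:.4f} %")
```

Each layer alone already yields only about 13.5 %, and five such layers give roughly 0.0045 %, consistent with the "less than 0.01%" above.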

The use of redundancy can greatly improve the final yield. The redundancy scheme employed in 3-D IC's is as follows. Spare elements should be placed in each block. Each block is composed of a group of identical functional elements, which must have a high initial yield. From a designer's point of view, each spare element in a 3-D IC should be a complex circuit, to reduce the amount of wiring. This avoids additional potential failures and eases the routing of the vertical wiring, which must be done after all the functional elements on each SOI device layer have been tested and before the next layer is processed.
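The benefit of spares can be estimated with a binomial model: a block that needs n working elements and carries k spares works if at least n of its n + k elements are good. This is a sketch under the assumptions that element failures are independent and that the spare-switching wiring itself never fails; the element yield and block size are hypothetical.

```python
from math import comb

def block_yield(p_element: float, needed: int, spares: int) -> float:
    """Probability that at least `needed` of `needed + spares` elements work."""
    total = needed + spares
    return sum(comb(total, good) * p_element ** good * (1 - p_element) ** (total - good)
               for good in range(needed, total + 1))

p = 0.99  # hypothetical yield of a single functional element
for spares in (0, 1, 2, 4):
    print(spares, f"{block_yield(p, 32, spares):.4f}")
```

With no spares the block yield is simply p³², and each added spare raises it, at the cost of extra area and routing.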

3) Power Dissipation

It was shown in the preceding section that a 3-D IC system dissipates less power than a comparable system composed of many chips. However, the power dissipation of a 3-D IC system is still very large compared with that of a two-dimensional IC. On the other hand, from the viewpoint of system performance, parallel processing is mandatory. One solution to this difficulty is bit-serial, word-parallel arithmetic, which sacrifices some system performance. In other words, the logic within each individual processor operates bit-serially, while the individual processors operate simultaneously in a word-parallel fashion. The massive parallelism at the processor level still greatly improves system speed, and the bit-serial architecture greatly reduces the number of active gates and hence the power consumption.
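A bit-serial adder illustrates the trade-off: one full adder plus one carry flip-flop processes one bit per clock cycle, so an n-bit addition takes n cycles instead of one but needs a single full adder instead of n. The sketch below models that behaviour in software, least significant bit first; it illustrates the general technique and is not the circuit used in this processor.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum_bit, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out

def bit_serial_add(x: int, y: int, width: int) -> tuple[int, int]:
    """Add two unsigned `width`-bit words one bit per clock cycle, LSB first.
    Returns (result, cycles); the final carry becomes bit `width` of the result."""
    carry = 0          # the single carry flip-flop
    result = 0
    for cycle in range(width):
        a = (x >> cycle) & 1
        b = (y >> cycle) & 1
        s, carry = full_adder(a, b, carry)
        result |= s << cycle
    result |= carry << width
    return result, width

print(bit_serial_add(100, 55, 8))   # (155, 8): 8 cycles with one full adder
```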

V. Image Processor

Gray-scale visual sensing technology is expected to be developed. This technology will open up new image-processing applications, such as automatic inspection systems for detecting cracks, tears and uneven color, and vision systems that enable moving robots to recognize objects, measure the relative distance to an object and avoid obstacles in their path. A digital image processor aimed at these applications should be fast enough to recognize moving objects in real time.

In the following, based on the 3-D IC concepts described in the preceding sections, we propose a high speed image processor using 3-D IC's.

1) Feature Extraction Method

Image processing is, in general, accomplished in four consecutive steps: observation, quantization, feature extraction and object recognition. Feature extraction consumes the most time, since it must handle the huge digital database that represents an image. The feature extraction operation (7) is typically done by linearly combining the values of the surrounding pixels, most often by multiplying each value by a weighting factor, as shown in Figure 3.

Fig. 3 A spatial convolution of a 3-by-3 pixel kernel.

An example of such an operation is the Gaussian function, which acts as a low-pass spatial filter. The Gaussian operator for a 3-by-3-pixel kernel is shown in Figure 3(a). The 9 adjacent pixels of a 3-by-3 region are taken from the image data, the data word representing each pixel is multiplied by one of the weighting coefficients, and the 9 products are summed to form a single output. This process, totaling 9 multiplications and 8 additions, must be repeated for every pixel in the final image. For a 512-by-512 pixel image, that represents over two million multiplications and over two million additions. Typical general purpose computers perform these calculations sequentially, which takes a few seconds. In the 3-D image processor, the calculations are performed in parallel and require less than 100 μsec, which meets the demands of a very high speed image processor such as a smart sensor for a cruise missile or a three-dimensional visual sensor.
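The serial form of this operation can be written as a sliding-window loop, which also shows where the operation count comes from. The kernel used here, (1 2 1; 2 4 2; 1 2 1)/16, is a common integer approximation of a Gaussian and is an assumption, since the coefficients of Figure 3(a) are not reproduced in the text. Pixels outside the image are treated as zero.

```python
import numpy as np

# Assumed kernel (the coefficients of Figure 3(a) are not given in the text):
# a common integer approximation of a Gaussian low-pass filter.
GAUSS_3X3 = np.array([[1, 2, 1],
                      [2, 4, 2],
                      [1, 2, 1]], dtype=float) / 16.0

def convolve3x3(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Serial 3x3 weighted sum around every pixel, zero outside the image.
    The kernel is applied without flipping, which is harmless when it is symmetric."""
    rows, cols = image.shape
    padded = np.pad(image.astype(float), 1)        # one-pixel zero border
    out = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            window = padded[r:r + 3, c:c + 3]      # the 9 neighboring pixels
            out[r, c] = np.sum(window * kernel)    # 9 products summed (8 additions)
    return out

def operation_count(rows: int, cols: int) -> tuple[int, int]:
    """Multiplications and additions needed when every pixel is computed serially."""
    pixels = rows * cols
    return 9 * pixels, 8 * pixels

impulse = np.zeros((5, 5))
impulse[2, 2] = 16.0                               # a single bright pixel
print(convolve3x3(impulse, GAUSS_3X3))             # the kernel pattern appears around (2, 2)
print(operation_count(512, 512))                   # (2359296, 2097152)
```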

2) Basic Structure

Figure 4 shows the basic structure of the image signal processor, which consists of an array of 512-by-512 pixels. The chip holds about 80 million transistors in an area of 20×20 mm² and is fabricated with a 0.7 μm CMOS process technology. The image processor consists of six different functional layers: optical sensor, A/D converter, memory, switch matrix, accumulator and memory. Each layer in the stack contains a complete 512-by-512 array of one particular type of processing element, as well as spare elements for redundancy. All elements in an array are identical and operate simultaneously in parallel.
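A quick calculation from these figures gives the area and transistor budget per pixel. It spreads the whole 20×20 mm² and all 80 million transistors evenly over the 512-by-512 pixels and the six layers, ignoring peripheral circuitry, the internal CPU and spare elements, so the results are rough upper bounds rather than design values.

```python
CHIP_EDGE_UM = 20_000          # 20 mm chip edge
PIXELS_PER_EDGE = 512
TRANSISTORS = 80_000_000
LAYERS = 6

pitch_um = CHIP_EDGE_UM / PIXELS_PER_EDGE            # cell pitch in every layer
per_pixel = TRANSISTORS / PIXELS_PER_EDGE ** 2       # all six layers together
per_pixel_per_layer = per_pixel / LAYERS             # crude even split

print(f"pitch ~ {pitch_um:.1f} um")
print(f"~{per_pixel:.0f} transistors per pixel, ~{per_pixel_per_layer:.0f} per layer")
```

A pitch of roughly 39 μm and a few hundred transistors per pixel stack is why the individual elements must be kept extremely simple.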

Data flows vertically through the interconnect wiring between the SOI functional layers. This architecture allows the circuitry of the individual elements to be extremely simple, achieving the high packing density necessary for this application. The top layer requires a large area for peripheral circuitry because of the spacing needed between bonding pads and for the buffers surrounding the pads. In this image processor, an internal CPU, placed in the third through fifth layers, controls the data transfer in the circuit.

Fig. 4 Basic structure of the high speed image processor.

Fig. 5 Layout in the top layer of the 3-D image processor.

3) Detailed Architecture

(i) Optical Sensor (1st layer)

The function of the top layer is to receive an optical image through a lens. The sensor used in the array is a metal-semiconductor (Au-Si) Schottky diode with high sensitivity at visible wavelengths. Carriers generated by incident light are transported across the reverse-biased Schottky diode and flow into an external load resistor to provide the output signal. The output signals are sent to the 2nd layer (A/D converter) through the transfer gates shown in Figure 6 when φ is on.

It should be noted that the optical sensors differ in sensitivity. To compensate for this nonuniformity, the sensitivity of each optical sensor is measured under uniform illumination, and the width of the resistor in each pixel is then adjusted in a subsequent photolithography process.
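The compensation can be reasoned about with a simple linear model: if pixel i produces a photocurrent S_i·E under illumination E, its output voltage is V_i = S_i·E·R_i, so equal outputs require R_i ∝ 1/S_i. For a rectangular film resistor of fixed length, R = R_sheet·L/W, so the width should scale in proportion to the measured sensitivity. The sheet resistance, length, nominal width and measured sensitivities below are hypothetical.

```python
# Hypothetical layout values; only the proportionality W ~ S matters.
R_SHEET = 1_000.0    # sheet resistance of the resistor film, ohms/square
LENGTH_UM = 20.0     # resistor length, fixed by the layout
W_NOMINAL_UM = 2.0   # width for a pixel of average sensitivity

def resistor_width(sensitivity: float, mean_sensitivity: float) -> float:
    """Width that makes this pixel's output equal to that of an average pixel.
    V = S * E * R and R = R_SHEET * LENGTH_UM / W, so W must scale with S."""
    return W_NOMINAL_UM * sensitivity / mean_sensitivity

def output_voltage(sensitivity: float, illumination: float, width_um: float) -> float:
    """Voltage across the load resistor for a photocurrent S * E."""
    resistance = R_SHEET * LENGTH_UM / width_um
    return sensitivity * illumination * resistance

measured = [0.90e-6, 1.00e-6, 1.12e-6]   # amperes per unit illumination (hypothetical)
mean_s = sum(measured) / len(measured)
widths = [resistor_width(s, mean_s) for s in measured]
outputs = [output_voltage(s, 1.0, w) for s, w in zip(measured, widths)]
print([round(w, 3) for w in widths])
print(outputs)   # equal up to floating-point rounding
```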

(ii) A/D Converter (2nd layer)

The function of the second layer is to convert the analog output signal from the top layer into digital data. Serial analog-to-digital converters use the indirect method of converting a voltage into a time period measured by a counter (Fig. 7). A reference voltage is applied to one input of a comparator. The reference voltage is a ramp produced by integrating a very small constant voltage with an external integrator. When the reference voltage exceeds the input \( V_{in} \), the counter is stopped, so the count held at that moment represents \( V_{in} \). From a practical point of view, 8-bit A/D converters are used in this processor, because the conversion time grows exponentially with the digital word length. To minimize the propagation delay of the reference voltage from the integrator, the integrator must be placed near each A/D converter. Thus, an integrator is placed every 32 pixels, as shown in Figure 7.
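The counting principle can be simulated in a few lines. The sketch assumes an ideal comparator and a linear ramp rising by one LSB per clock, with a 1 V full scale chosen for illustration; the 20 MHz clock is the operating frequency quoted in Section VI. Because the counter may have to run through all 2ⁿ codes, the worst-case conversion time is 2ⁿ clock periods, which for n = 8 at 20 MHz is 12.8 μs, in line with the roughly 13 μsec quoted for A/D conversion.

```python
def single_slope_adc(v_in: float, v_full_scale: float, bits: int) -> int:
    """Ideal serial (single-slope) conversion: the counter advances once per
    clock while the ramp, rising one LSB per clock, has not passed v_in."""
    lsb = v_full_scale / 2 ** bits
    max_code = 2 ** bits - 1
    count = 0
    while count < max_code and (count + 1) * lsb <= v_in:
        count += 1
    return count

def worst_case_conversion_time(bits: int, clock_hz: float) -> float:
    """A full-scale input takes 2**bits clock periods: each extra bit doubles it."""
    return 2 ** bits / clock_hz

print(single_slope_adc(0.5, 1.0, 8))                            # 128
print(f"{worst_case_conversion_time(8, 20e6) * 1e6:.1f} us")    # 12.8 us
print(f"{worst_case_conversion_time(10, 20e6) * 1e6:.1f} us")   # 51.2 us
```

Going from 8 to 10 bits would quadruple the conversion time, which is the exponential cost that motivates the 8-bit choice.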


Fig. 7(a) A/D converter basic cell.

(iii) Memory (3rd layer)

Each cell in this layer has an 8-bit serial memory register composed of quasi-static flip-flops, as shown in Fig. 8(a). Gate capacitances are used to store the logic levels temporarily. The converted digital data of the optical signal are transferred serially from the 2nd layer. Each data bit is shifted through the memory register from left to right with the alternating clock phases \( \phi_1 \) and \( \phi_2 \). The output of the 8th stage is connected to the input of the 1st stage through a transfer gate, allowing nondestructive readout.
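The recirculating register can be modelled as a ring of eight stages: on each two-phase clock cycle the bits move one stage to the right, and with the feedback transfer gate enabled the bit leaving the 8th stage re-enters the 1st. After eight shifts the word has been read out serially and the register holds its original contents again, which is what makes the readout nondestructive. The class and method names are hypothetical, and treating one φ1/φ2 pair as a single `shift` call is a simplification.

```python
from typing import List, Optional

class RecirculatingRegister:
    """Serial shift register whose last stage can feed back into the first."""

    def __init__(self, width: int = 8) -> None:
        self.stages: List[int] = [0] * width   # stages[0] is the 1st (leftmost) stage

    def shift(self, bit_in: Optional[int] = None) -> int:
        """One phi1/phi2 clock pair: every bit moves one stage to the right.
        With bit_in=None the feedback gate is on and the 8th stage's output
        re-enters the 1st stage; otherwise bit_in is shifted in."""
        bit_out = self.stages[-1]
        new_bit = bit_out if bit_in is None else bit_in
        self.stages = [new_bit] + self.stages[:-1]
        return bit_out

    def load(self, bits: List[int]) -> None:
        """Shift a word in serially, e.g. from the A/D converter layer."""
        for b in bits:
            self.shift(b)

    def read(self) -> List[int]:
        """Nondestructive serial readout: a full turn of the ring with feedback on."""
        return [self.shift() for _ in range(len(self.stages))]

reg = RecirculatingRegister()
word = [1, 0, 1, 1, 0, 0, 1, 0]
reg.load(word)
print(reg.read())   # first readout
print(reg.read())   # second readout: the contents were preserved
```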


Fig. 7(b) Arrangement of the A/D converter basic cells.

(iv) Switch Matrix (4th layer)

The switch matrix layer performs lateral and vertical data transfer. As shown in Figure 9, each cell communicates with its nearest neighbors on the same plane and vertically with the other layers. The data transfer in the switch matrix is controlled by the internal CPU.
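One way an array with nearest-neighbor links and per-cell accumulators could evaluate the 3-by-3 kernel of Figure 3 is shift-and-accumulate: every cell multiplies the pixel value currently delivered to it by the same coefficient and adds the product to its accumulator, then the data move one step through the switch matrix. Nine such steps cover the kernel whatever the array size, which is where the independence from M comes from. The paper does not give the actual control sequence, so the routine below is an assumption for illustration; borders are zero-filled and diagonal moves are modelled as single steps for brevity.

```python
import numpy as np

def shift(a: np.ndarray, dr: int, dc: int) -> np.ndarray:
    """Move every value dr rows down and dc columns right in one parallel step.
    Cells that receive nothing (at the array edge) are filled with 0."""
    rows, cols = a.shape
    out = np.zeros_like(a)
    out[max(0, dr):rows - max(0, -dr), max(0, dc):cols - max(0, -dc)] = \
        a[max(0, -dr):rows - max(0, dr), max(0, -dc):cols - max(0, dc)]
    return out

def array_convolve3x3(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Shift-and-accumulate over the whole array: 9 steps for any image size."""
    data = image.astype(float)
    acc = np.zeros_like(data)                  # one accumulator per cell
    for i in range(3):
        for j in range(3):
            di, dj = i - 1, j - 1
            neighbor = shift(data, -di, -dj)   # cell (r, c) now holds pixel (r+di, c+dj)
            acc += kernel[i, j] * neighbor     # same coefficient broadcast to every cell
    return acc

img = np.random.default_rng(0).integers(0, 256, size=(8, 8))   # 8-bit gray levels
gauss = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0      # assumed kernel
print(array_convolve3x3(img, gauss).round(1))
```

The result matches a direct serial 3-by-3 weighted sum with zero borders, but the number of array steps stays at nine for a 512-by-512 image.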


Fig. 8 Basic cell of 8-bit memory register.

VI. Performance Comparison

For 3-D parallel image processing, the execution time is independent of M, the order of the matrix, while a serial machine would require a time proportional to M², an advantage of M² for 3-D IC technology. For M = 512, as in this image processor, a serial machine would have to perform its computations about 262,000 (512²) times faster than the 3-D image processor to attain the same execution time. At an operating frequency of 20 MHz, the total processing time is about 40 μsec, of which 13 μsec is for A/D conversion and 20 μsec for image processing. The time required for data output, however, is about 400 μsec, twenty times longer than the image processing time. This will be the largest barrier to realizing ultra-high speed image processing.

By comparison, a typical super-minicomputer requires a few seconds (7) for a 3-by-3 convolution of a 512-by-512 pixel gray-scale image, which gives the 3-D high speed image processor an advantage of about 10,000.

VII. Summary

Present state-of-the-art 3-D IC fabrication technologies have been reviewed. A detailed investigation of 3-D IC's showed that their high performance is achieved through very short interconnect wiring and a parallel processing architecture. A high speed image processor has been proposed whose system speed is about 10,000 times that of a typical super-minicomputer.

Acknowledgement

The authors wish to thank Mr. M. Kashiwagi and Dr. H. Tago for their encouragement, Dr. H. Hidai, Dr. M. Sasaki, Mr. H. Tago and Mr. T. Yoshii for their helpful discussion.

This work was performed under the management of the R&D Association for Future Electron Devices as a part of R&D Project of Basic Technology for the Future Industries sponsored by Agency of Industrial Science and Technology, MITI.

References


