3D Architecture Implementation: A Survey
M. H. Jabbar, D. Houzet
GIPSA Lab, Grenoble INP, France
Research in 3D integration has attracted researchers from both industry and academia because of its benefits over 2D architecture, such as better performance, lower power consumption, small form factor and support for heterogeneous technology integration. An in-depth understanding of 2D and 3D architecture is essential before undertaking a real 3D design. In this paper, we discuss research on 3D integration, particularly its benefits compared with CMOS scaling toward nanometer-scale process technologies. We also describe several previously developed 3D architecture implementations to justify the need for our experimental 3D implementation, which is currently being developed as part of a long collaboration between ENSTA and GIPSA-Lab on multimedia MPSoC design.
I. INTRODUCTION
According to the International Technology Roadmap for Semiconductors (ITRS), the number of processing elements on a chip is expected to exceed 100 processors. The memory size is also projected to increase dramatically along with the number of processing elements. A new concept in electronic design, 3D integration, was introduced a few years ago. This technology enables building circuits in three-dimensional (3D) structures by stacking wafers or dies in several layers and using through-silicon vias (TSVs) for inter-tier connections, as opposed to the traditional 3D stacking method using wire bonding. It offers several advantages: it increases device density, which allows complex designs to be implemented, and it significantly improves performance. The aim of this paper is to provide a general introduction to 3D integration technology and to discuss the choice between CMOS scaling and 3D stacking. We also present several previously reported implementations of 3D architecture and discuss their purposes.
II. TSV TECHNOLOGY
A TSV is a vertical via that passes through different layers of active silicon. The materials used for TSVs are Tungsten (W), Copper (Cu) and Poly-Silicon (Poly-Si). Poly-Si is stable and has less effect on device characteristics than the other materials. However, Cu or W is more suitable for TSVs because of its lower resistance. Cu is the most commonly used because it has better thermal conductivity than W and Poly-Si. W TSVs have a longer delay than Cu TSVs for any diameter.
TSVs allow a high interconnection density between stacked chips. For example, a 3D chip containing a processor and memory provides 120,000 interconnections in a 12.5 mm² area. Another reported work achieves 103 interconnections using W TSVs with a 10 μm pitch in an area of 1 mm². Another important element is the TSV lining, which insulates the TSV from the Silicon substrate. The most commonly used material is Silicon Oxide, which can be deposited using Chemical Vapor Deposition (CVD) or Atomic Layer Deposition (ALD).
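The density implied by the first figure above can be checked with simple arithmetic. The sketch below is only an illustration: the function names are ours, and the equivalent pitch assumes the connections are spread on a uniform square grid, which the cited work does not state.

```python
import math

def density_per_mm2(count: int, area_mm2: float) -> float:
    """Average number of vertical interconnections per square millimetre."""
    return count / area_mm2

def equivalent_grid_pitch_um(density: float) -> float:
    """Pitch (in micrometres) of a uniform square grid with this density.

    Assumption: connections are spread evenly over the area.
    """
    return 1000.0 / math.sqrt(density)

d = density_per_mm2(120_000, 12.5)
print(f"density: {d:.0f} per mm^2")
print(f"equivalent pitch: {equivalent_grid_pitch_um(d):.2f} um")
```

Under that uniform-grid assumption, 120,000 connections in 12.5 mm² correspond to 9,600 connections per mm², or an average spacing of roughly 10 μm.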
III. CMOS SCALING VS 3D INTEGRATION
CMOS transistor scaling faces several critical issues:
- Growing fabrication cost: non-recurring engineering (NRE) and lithography costs increase as feature sizes shrink.
- Significant effect of process variation: as transistors shrink, process and parameter variation worsens. The 45 nm technology node required various new techniques to mitigate the effect of process variation compared with the previous technology. For example, the scaling challenges beyond the 32 nm node include:
  - Increased off-state current: degraded drain-induced barrier lowering (DIBL) and subthreshold slope (SS), caused by poorer short-channel control, significantly limit shortening the effective gate length below approximately 15 nm.
  - Decreasing the oxide thickness (tox) provides better channel control, but with the penalty of increased gate leakage current, while increased channel doping eventually decreases mobility, increases random dopant fluctuations (RDF) and degrades the minimum operating voltage.
- Interconnect wire delay becomes more significant than gate delay and consequently increases global delay. Performance therefore improves more slowly, and power consumption increases.
In 3D integration, stacking shortens long interconnect wires, reducing their length to about the square root of the original length. This improves speed by reducing the RC delay of the interconnect and the number of buffers along it. For example, on the International Symposium on Physical Design (ISPD98) circuit benchmarks, total wire length is reduced by more than 28% on average when stacking two to five wafers, and the longest wire is reduced by 31% or more. As the interconnect length is reduced, its capacitance and the number of repeaters along it are reduced, and power consumption eventually decreases as well. 3D integration also supports heterogeneous integration of digital, analog, RF and MEMS technologies: each can be fabricated in its own process and then stacked with the others. Finally, 3D integration provides a small form factor, which is well suited to mobile devices.
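A rough geometric sketch can show why stacking shortens the longest wires. It is not the model used in the cited benchmark study. It assumes a square die whose area is split evenly across the tiers and ignores the vertical hop through the TSVs; the 100 mm² area is a hypothetical value.

```python
import math

def tier_edge_mm(total_area_mm2: float, tiers: int) -> float:
    """Edge of one square tier when the total area is split evenly."""
    return math.sqrt(total_area_mm2 / tiers)

def longest_manhattan_wire_mm(total_area_mm2: float, tiers: int) -> float:
    """Corner-to-corner Manhattan distance within one tier.

    Assumption: vertical TSV length is negligible and is ignored.
    """
    return 2.0 * tier_edge_mm(total_area_mm2, tiers)

AREA_MM2 = 100.0  # hypothetical 2D die area
base = longest_manhattan_wire_mm(AREA_MM2, 1)
for n in (1, 2, 4):
    wire = longest_manhattan_wire_mm(AREA_MM2, n)
    print(f"{n} tier(s): longest wire {wire:.2f} mm "
          f"({100.0 * (1.0 - wire / base):.1f}% shorter)")
```

In this simplified picture the longest in-plane wire shrinks with the square root of the number of tiers, so four tiers halve it.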
IV. 3D INTEGRATION ISSUES AND CHALLENGES
An important effect of stacking in a 3D structure is increased peak temperature. The temperature in the chip can exceed 100 °C. Temperature variation between dies can be around 10 °C for two stacked dies. Hotspots in a 3D chip can exceed 100 °C, while the temperature difference between stacked layers can be 1-20 °C. Two consequences of this high temperature are particularly important: temperature variation and hotspots, both of which affect chip reliability metrics such as mean time to failure (MTTF) and time to breakdown (TTBD).
Several thermal management techniques have been proposed to solve the thermal problem in 3D integration: thermal herding, which places the most frequently switching blocks near the heat sink; thermal vias, which transfer heat out of the chip; and thermal-aware design, which focuses on physical design stages such as floorplanning and placement. Thermal management using dynamic frequency scaling (DFS) proposes that dies near the heat sink be assigned a higher frequency (and eventually a higher temperature), while workloads with a strong thermal influence are assigned to the die with the better cooling efficiency.
Thermal stress is another consequence of the thermal problem when integrating with TSVs. It arises from the different coefficients of thermal expansion (CTE) of Silicon, Cu, Silicon Dioxide and W. The CTE mismatch with Silicon is larger for Cu than for W, which means that Cu TSVs have a stronger stress impact on Silicon. However, W has lower thermal conductivity than Cu. Thermal stress causes timing variation of around ±10% for an individual cell. Thermally induced stress in 3D integration causes cracks at the interface between the TSV and the Silicon substrate and between Cu interconnects and the low-k insulator. This effect strongly influences device reliability. Cu TSVs produce high thermal stress, up to 750 MPa.
Additionally, testing 3D architecture poses many challenges, such as test architecture, test access mechanisms, test scheduling, test patterns, and testing under thermal and power constraints, which is especially important for testing at run time. The 3D integration process introduces new types of defects, such as defects in the TSVs or the bonding structure, which require distinctive testing techniques. Testing 3D architecture is a great challenge because the functional units of a processor can be partitioned across more than one layer at the micro-architectural level. Because each layer then does not contain a complete functional system, a new testing strategy is required. Furthermore, pre-bond and post-bond testing is vital to ensure that only known good dies (KGD) are integrated in the 3D architecture and that the TSV formation and bonding structure are free of defects.
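The DFS policy described above can be sketched as a simple ordering rule. This is a minimal illustration rather than the algorithm of the cited work: the die names, frequencies and workload powers are hypothetical, and dies are assumed to be listed from nearest to farthest from the heat sink.

```python
from typing import Dict, List

def assign_frequencies(dies_nearest_first: List[str],
                       freqs_mhz: List[float]) -> Dict[str, float]:
    """Give the highest available frequency to the die nearest the heat sink."""
    ordered = sorted(freqs_mhz, reverse=True)
    return dict(zip(dies_nearest_first, ordered))

def assign_workloads(dies_nearest_first: List[str],
                     workload_power_w: Dict[str, float]) -> Dict[str, str]:
    """Place the workload with the strongest thermal influence (modelled here
    by its power, an assumption) on the best-cooled die."""
    hottest_first = sorted(workload_power_w, key=workload_power_w.get,
                           reverse=True)
    return dict(zip(dies_nearest_first, hottest_first))

dies = ["die_near_sink", "die_far_from_sink"]  # hypothetical two-die stack
print(assign_frequencies(dies, [800.0, 1000.0]))
print(assign_workloads(dies, {"video": 2.5, "audio": 0.4}))
```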
V. 3D ARCHITECTURE IMPLEMENTATION
We discuss several 3D chips that have been taped out for different purposes over the last few years. Other 3D chips that have been fabricated without TSVs are not discussed here.
One design implemented 64 cores in two tiers using Tezzaron technology and GlobalFoundries 130 nm standard cells. The Tezzaron technology uses a via-first method with face-to-face bonding and wafer-level stacking. The authors created custom in-order VLIW processors with a five-stage pipeline, removing large and complex data structures to obtain power-efficient inter-core communication. The project demonstrated the large memory bandwidth of a 3D stacked architecture, up to 63 Gb/s. Inter-core communication uses four buffers in each core, one toward each neighbouring core. A global barrier is used to synchronize the cores. The design runs at 277 MHz and has been tested with several parallel benchmarks, proving correct functionality. Each processor core has 1.5 KB of instruction memory and 4 KB of data memory. The tungsten TSVs have a 1.2 μm diameter, 5 μm pitch and 6 μm depth. The microbumps have a 3.4 μm diameter and 5 μm pitch. TSVs are used for the chip I/O interface, while tier-to-tier connections use microbumps. Each tier has a 5 mm x 5 mm silicon area. For the off-chip interface, a custom architecture modified from JTAG IEEE 1149.1 was created, consisting of a test control state machine and four pairs of tdi and tdo signals, one pair for each of the 4 blocks of 16 cores.
Another work successfully demonstrated a 3D mesh NoC in a 3 x 3 x 3 configuration using a via-last method in the MIT Lincoln Lab 180 nm FDSOI process at 1.5 V. The 3D NoC occupies 2 mm x 2 mm per tier. The MIT Lincoln Lab process provides 3 metal layers per tier, with a metal layer between the two top tiers and a metal layer on top of the entire stack. Its TSVs measure 2.5 μm x 2.5 μm with a 3.9 μm pitch. The two bottom tiers are bonded face to face and the third tier is connected face to back. The router uses an adaptive XYZ routing algorithm. Each router port has two unidirectional 16-bit links. A functional unit built from a linear feedback shift register (LFSR) is connected to each router. The design was routed at 145 MHz with a power consumption of 120.5 mW. The goal of the test chip is to validate the high-level system simulator for 3D NoC that the authors are working on. The node is designed to be as simple as possible so that a large network can be implemented. The router has no memory buffer, so each flit takes one cycle to cross each router.
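To illustrate XYZ routing in a 3 x 3 x 3 mesh, the sketch below computes a deterministic dimension-order path: first along X, then Y, then Z. This is the plain, non-adaptive variant and is not taken from the chip described above; the function name and coordinate convention are our own assumptions.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]

def xyz_route(src: Coord, dst: Coord,
              size: Coord = (3, 3, 3)) -> List[Coord]:
    """Return the routers visited from src to dst, inclusive.

    Deterministic dimension-order routing: resolve X, then Y, then Z.
    """
    for node in (src, dst):
        if not all(0 <= v < s for v, s in zip(node, size)):
            raise ValueError(f"node {node} is outside the {size} mesh")
    path = [src]
    cur = list(src)
    for axis in range(3):
        step = 1 if dst[axis] > cur[axis] else -1
        while cur[axis] != dst[axis]:
            cur[axis] += step
            path.append((cur[0], cur[1], cur[2]))
    return path

route = xyz_route((0, 0, 0), (2, 1, 2))
print(route)
print("hops:", len(route) - 1)
```

The hop count equals the Manhattan distance between the two nodes, which is the quantity that determines latency in a bufferless design where a flit advances one router per cycle.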
Another 3D implementation is a 1024-point, memory-on-logic 3D FFT processor for synthetic aperture radar (SAR) using MITLL 180 nm FDSOI technology. The FFT is a radix-2 Cooley-Tukey FFT. The chip demonstrated that the 3D architecture gives a 53% decrease in average wire length, a 24% increase in maximum operating frequency and a 25.3% reduction in total silicon area. The 3D die area is 23.40 mm2, 4.8 mm x 4.8 mm. The design runs at 79.4 MHz (a 12.6 ns clock period) and consumes 409.2 mW at that speed. The authors used block-level partitioning, placing processing elements and memories across the three tiers such that the memories are close to the processing elements.
Another work implemented two logic tiers of 2.5 mm x 5 mm with a three-layer, 8-channel 3D DRAM stacked on top, using Tezzaron 3D technology with the GlobalFoundries 130 nm process at 1.5 V. The purpose is to demonstrate the feasibility of 3D IC architecture for SoC design. Partitioning is done manually at block level: the USB controller and the H.264 encoder block with its local memory are placed in the top logic tier and the other blocks in the bottom logic tier, with the AHB system bus connecting both logic tiers. The design runs at 60 MHz and the DRAM can run at 133 MHz.
Another work demonstrated the feasibility of a 3D NoC in two tiers, implemented using die-to-wafer bonding in the IMEC 130 nm process with one poly and two metal layers. The design has a 1 mm² die area with 100 TSVs and 12 I/O pads. The Copper TSVs have a 5 μm diameter, 25 μm depth and 10 μm pitch, and are inserted after FEOL and before BEOL formation. Each tier has a traffic generator, a slave memory, a 3x3 switch and a JTAG controller, together with fault-tolerant test structures. The traffic generator is programmed through the JTAG controller and can send flits to and receive flits from the NoC. The slave memory holds 64 bits, arranged as 8 words of 8 bits. The vertical links of the router are unidirectional, and the fault tolerance targets static faults such as stuck-at and stuck-open faults. The design runs synchronously at 25 MHz with a supply voltage of 0.4-1.5 V. Each vertical link is implemented using 2 TSVs as the fault-tolerance mechanism.
In another work, a 32-bit 3D adder (Kogge-Stone) and a 32 x 32 3D multiplier (Wallace tree) were implemented using MITLL 180 nm 3D FDSOI technology to show the improvement of arithmetic circuits in 3D architecture. The die area is 1.3 mm x 1.3 mm, and the design runs at 200 MHz based on post-place-and-route timing estimation. The TSVs measure 3 μm x 3 μm with a depth of ~7 μm. In simulation, the 3D adder showed up to ~34% speed improvement and ~46% power reduction, while the 3D multiplier showed ~14% speed improvement and ~7% power reduction. These figures come from simulation because the fabricated chip is used only to prototype the idea and the 3D design flow.
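For readers unfamiliar with the Kogge-Stone structure, the sketch below models its parallel-prefix carry computation at the bit level. It is a behavioural illustration only and says nothing about how the cited chip partitions the adder across tiers; the function name is ours.

```python
def kogge_stone_add(a: int, b: int, width: int = 32) -> int:
    """Add two unsigned integers modulo 2**width with a Kogge-Stone prefix tree."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    propagate = a ^ b          # bit i propagates an incoming carry
    generate = a & b           # bit i generates a carry by itself
    half_sum = propagate       # kept for the final sum stage
    dist = 1
    # log2(width) prefix stages: each stage doubles the span of bits combined.
    while dist < width:
        generate = (generate | (propagate & (generate << dist))) & mask
        propagate = (propagate & (propagate << dist)) & mask
        dist <<= 1
    carries = (generate << 1) & mask   # carry into bit i is the carry out of bit i-1
    return half_sum ^ carries

print(kogge_stone_add(123456789, 987654321))
print(hex(kogge_stone_add(0xFFFFFFFF, 1)))
```

For a 32-bit operand the loop runs five prefix stages, and it is the long wires between distant bit positions in the later stages that a 3D layout aims to shorten.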
In another work, a 3D SRAM was designed in the MITLL 180 nm FDSOI process, showing a 32% improvement in access time, measured using a delay-locked loop (DLL), owing to the shorter word-line wires in the 3D architecture. The TSVs measure 2.5 μm x 2.5 μm. The 3D SRAM has a 16 x 16 cell array in each tier and uses word-line split partitioning. The design was tested over a range of 70-130 MHz to calculate the access time. The measured access time was 40-60 ps larger than the simulated result.
In another work, a low-density parity-check (LDPC) decoder was implemented using the 3-tier MITLL 180 nm process in a 6.3 mm x 6.4 mm die area. The design runs at 128 MHz, achieving a throughput of 2 Gb/s with 430 mW power consumption. The 3D implementation showed significant improvement in wire length, clock skew, area and buffer size over its corresponding 2D implementation. Finally, a 3D memory-on-memory architecture, implemented in a 2.9 mm x 2.0 mm chip using Tezzaron two-tier technology with the GlobalFoundries 130 nm process, demonstrated fast checkpointing and restore applications in 3D architecture. Each SRAM tier has a capacity of 1 Mbit built from 64 banks, each bank holding 256 words of 64 bits. The chip can perform checkpointing/restart at 4k/cycles at a speed of 1 GHz.
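The stated SRAM organization can be checked against the 1 Mbit capacity with a short calculation. The constant names are ours; the values are those given in the text.

```python
BANKS_PER_TIER = 64
WORDS_PER_BANK = 256
BITS_PER_WORD = 64

def tier_capacity_bits(banks: int = BANKS_PER_TIER,
                       words: int = WORDS_PER_BANK,
                       bits: int = BITS_PER_WORD) -> int:
    """Total storage of one SRAM tier in bits."""
    return banks * words * bits

capacity = tier_capacity_bits()
print(capacity, "bits =", capacity // 2**20, "Mbit")
```

64 banks of 256 words of 64 bits give 2^20 bits, which matches the 1 Mbit per tier quoted above.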
Table 1 summarizes the previous 3D architecture implementations. To investigate 3D architecture further, we are currently designing a 16-processor system in 2 tiers using Tezzaron 3D technology. Each tier has 8 processors connected by a 4x2 mesh NoC. We use a readily available open-source processor, and we design the 3D router and the network interface ourselves. The processor is connected to the network interface using simple FIFO-based communication for both data and synchronization. The aim is to measure 3D NoC performance on a real chip by running several multimedia applications. We also want to study parallel implementations on the 3D NoC architecture.
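The topology of the planned chip can be sketched by enumerating the nodes and links of a 4 x 2 x 2 mesh. This is only an illustration: it assumes that every router has one vertical link to the router directly above or below it, which the description above does not specify.

```python
from itertools import product
from typing import List, Set, Tuple

Node = Tuple[int, int, int]

def mesh_topology(nx: int = 4, ny: int = 2, nz: int = 2):
    """Return (nodes, links) of an nx x ny x nz mesh; nz is the tier count.

    Assumption: each router links to its +x, +y and +z neighbour if present.
    """
    nodes: List[Node] = list(product(range(nx), range(ny), range(nz)))
    links: Set[Tuple[Node, Node]] = set()
    for x, y, z in nodes:
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            nb = (x + dx, y + dy, z + dz)
            if nb[0] < nx and nb[1] < ny and nb[2] < nz:
                links.add(((x, y, z), nb))
    return nodes, links

nodes, links = mesh_topology()
vertical = [l for l in links if l[0][2] != l[1][2]]
print(len(nodes), "routers,", len(links), "links,",
      len(vertical), "of them vertical")
```

Under that assumption the network has 16 routers and 28 bidirectional link positions, 8 of which cross between the two tiers.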
VI. CONCLUSION
3D integration technology is under active research by many organizations, and more study is needed, especially on the design trade-offs between its advantages and drawbacks. This paper gave a general summary of 3D integration, covering how 3D integration overcomes scaling issues and which issues and challenges it raises. We also described several 3D chips implemented in different processes for different purposes. The aim is to give a general yet detailed analysis that covers all aspects of 3D IC design.
Table 1: Summary of 3D architecture implementations
| Work | Architecture / purpose | Technology / number of tiers |
| | 3D multicore (64 cores) / demonstrate large memory bandwidth | 130 nm / 2 tiers |
| | 3D mesh NoC with traffic generator / demonstrate a working 3D NoC | 180 nm / 3 tiers |
| | 3D FFT processor / demonstrate 3D benefits in speed improvement and area reduction | 180 nm / 3 tiers |
| | 3D SoC for H.264 / demonstrate 3D SoC architecture | 130 nm / 5 tiers (2 logic tiers, 3 DRAM tiers) |
| | 3D mesh NoC (single switch) with traffic generator / demonstrate feasibility of 3D NoC | 130 nm / 2 tiers |
| | 3D adder and 3D multiplier / demonstrate arithmetic circuit improvement in 3D | 180 nm / 3 tiers |
| | 3D SRAM / demonstrate memory access time improvement in 3D | 180 nm / 3 tiers |
| | 3D LDPC decoder / demonstrate 3D architecture benefits (wire length, clock skew, area) | 180 nm / 3 tiers |
| | 3D SRAM / demonstrate fast checkpointing and restore application of hard disk drive | 130 nm / 2 tiers |
 ITRS, System Drivers, 2009 Update, www.itrs.net.
 D. H. Triyoso, T. B. Dao, T. Kropewnicki, F. Martinez, R. Noble and M. Hamilton, Progress and challenges of tungsten-filled through-silicon via, IC Design and Technology (ICICDT), 2010 IEEE International Conference on, 2010, pp. 118-121.
 T. Dao, D. H. Triyoso, R. Mora, T. Kropewnicki, B. Griesbach, D. Booker, M. Petras and V. Adams, Thermo-mechanical stress characterization of tungsten-fill through-silicon-via, VLSI Technology Systems and Applications (VLSI-TSA), 2010 International Symposium on, 2010, pp. 96-99.
 M. J. Wolf, T. Dretschkow, B. Wunderle, N. Jurgensen, G. Engelmann, O. Ehrmann, A. Uhlig, B. Michel and H. Reichl, High aspect ratio TSV copper filling with different seed layers, Electronic Components and Technology Conference, 2008. ECTC 2008. 58th, 2008, pp. 563-570.
 G. Katti, A. Mercha, J. Van Olmen, C. Huyghebaert, A. Jourdain, M. Stucchi, M. Rakowski, I. Debusschere, P. Soussan, W. Dehaene, K. De Meyer, Y. Travaly, E. Beyne, S. Biesemans and B. Swinnen, 3D stacked ICs using Cu TSVs and Die to Wafer Hybrid Collective bonding, Electron Devices Meeting (IEDM), 2009 IEEE International, 2009, pp. 1-4.
 M. Koyanagi, T. Fukushima and T. Tanaka, High-Density Through Silicon Vias for 3-D LSIs, Proceedings of the IEEE, 97 (2009), pp. 49-59.
 K. Nomura, K. Abe, S. Fujita, Y. Kurosawa and A. Kageshima, Performance analysis of 3D-IC for multi-core processors in sub-65nm CMOS technologies, Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on, 2010, pp. 2876-2879.
 R. S. Patti, Three-Dimensional Integrated Circuits and the Future of System-on-Chip Designs, Proceedings of the IEEE, 94 (2006), pp. 1214-1224.
 M. Rimskog and T. Bauer, High density Through Silicon Via (TSV), Design, Test, Integration and Packaging of MEMS/MOEMS, 2008. MEMS/MOEMS 2008. Symposium on, 2008, pp. 105-108.
 C. Kenyon, A. Kornfeld, K. Kuhn, M. Liu, A. Maheshwari, W. Shih, S. Sivakumar, G. Taylor, P. VanDerVoorn and K. Zawadzki, Managing Process Variation in Intel's 45nm CMOS Technology, Intel Technology Journal, 12 (2008).
 K. J. Kuhn, CMOS scaling beyond 32nm: Challenges and opportunities, Design Automation Conference, 2009. DAC '09. 46th ACM/IEEE, 2009, pp. 310-313.
 G. M. Link and N. Vijaykrishnan, Thermal trends in emerging technologies, Quality Electronic Design, 2006. ISQED '06. 7th International Symposium on, 2006, pp. 8 pp.-632.
 S. Das, A. Chandrakasan and R. Reif, Three-dimensional integrated circuits: performance, design methodology, and CAD tools, VLSI, 2003. Proceedings. IEEE Computer Society Annual Symposium on, 2003, pp. 13-18.
 B. Bryan, A. Murali, B. Ned, D. John, J. Lei, H. L. Gabriel, M. Don, M. Pat, W. N. Donald, P. Daniel, R. Paul, R. Jeff, S. Sadasivan, S. John and W. Clair, Die Stacking (3D) Microarchitecture, Microarchitecture, 2006. MICRO-39. 39th Annual IEEE/ACM International Symposium on, 2006, pp. 469-479.
 R. Weerasekera, Z. Li-Rong, D. Pamunuwa and H. Tenhunen, Extending systems-on-chip to the third dimension: performance, cost and technological tradeoffs, Computer-Aided Design, 2007. ICCAD 2007. IEEE/ACM International Conference on, 2007, pp. 212-219.
 M. Awasthi and R. Balasubramonian, Exploring the Design Space for 3D Clustered Architectures, 3rd IBM Watson Conference on Interaction between Architecture, Circuits, and Compilers (P=ac2), 2006.
 P. Young Jin, Z. Min, L. Byeong-seok, A. L. Jeong, K. Seung Gu and K. Cheol Hong, Thermal Analysis for 3D Multi-core Processors with Dynamic Frequency Scaling, Computer and Information Science (ICIS), 2010 IEEE/ACIS 9th International Conference on, 2010, pp. 69-74.
 C. David, A. Jose, H. Jose, P. Massimo, A. Andrea and M. Enrico, Thermal-aware floorplanning exploration for 3D multi-core architectures, Proceedings of the 20th symposium on Great lakes symposium on VLSI, ACM, Providence, Rhode Island, USA, 2010.
 K. Puttaswamy and G. H. Loh, Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors, High Performance Computer Architecture, 2007. HPCA 2007. IEEE 13th International Symposium on, 2007, pp. 193-204.
 E. Wong and L. Sung Kyu, 3D Floorplanning with Thermal Vias, Design, Automation and Test in Europe, 2006. DATE '06. Proceedings, 2006, pp. 1-6.
 Y. Hao, S. Yiyu, H. Lei and K. Tanay, Thermal Via Allocation for 3-D ICs Considering Temporally and Spatially Variant Thermal Power, Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 16 (2008), pp. 1609-1619.
 S. Govind Singh and T. Chuan Seng, Thermal mitigation using thermal through silicon via (TTSV) in 3-D ICs, Microsystems, Packaging, Assembly and Circuits Technology Conference, 2009. IMPACT 2009. 4th International, 2009, pp. 182-185.
 W. L. Hung, G. M. Link, X. Yuan, N. Vijaykrishnan and M. J. Irwin, Interconnect and thermal-aware floorplanning for 3D microprocessors, Quality Electronic Design, 2006. ISQED '06. 7th International Symposium on, 2006, pp. 6 pp.-104.
 J. Cong, W. Jie and Z. Yan, A thermal-driven floorplanning algorithm for 3D ICs, Computer Aided Design, 2004. ICCAD-2004. IEEE/ACM International Conference on, 2004, pp. 306-313.
 B. Goplen and S. Sapatnekar, Efficient thermal placement of standard cells in 3D ICs using a force directed approach, Computer Aided Design, 2003. ICCAD-2003. International Conference on, 2003, pp. 86-89.
 Y. Jae-Seok, K. Athikulwongse, L. Young-Joon, L. Sung Kyu and D. Z. Pan, TSV stress aware timing analysis with applications to 3D-IC layout optimization, Design Automation Conference (DAC), 2010 47th ACM/IEEE, 2010, pp. 803-806.
 M. C. Hsieh, H. Yung-Yu and C. Chao-Liang, Thermal Stress Analysis of Cu/Low-k Interconnects in 3D-IC Structures, Microsystems, Packaging, Assembly Conference Taiwan, 2006. IMPACT 2006. International, 2006, pp. 1-4.
 H. H. S. Lee and K. Chakrabarty, Test Challenges for 3D Integrated Circuits, Design & Test of Computers, IEEE, 26 (2009), pp. 26-35.
 G. Qun, X. Zhiwei, K. Jenwei and C. Mau-Chung Frank, Two 10Gb/s/pin Low-Power Interconnect Methods for 3D ICs, Solid-State Circuits Conference, 2007. ISSCC 2007. Digest of Technical Papers. IEEE International, 2007, pp. 448-614.
 M. B. Healy, K. Athikulwongse, R. Goel, M. M. Hossain, D. H. Kim, L. Young-Joon, D. L. Lewis, L. Tzu-Wei, L. Chang, J. Moongon, B. Ouellette, M. Pathak, H. Sane, S. Guanhao, W. Dong Hyuk, Z. Xin, G. H. Loh, H. S. Lee and L. Sung Kyu, Design and analysis of 3D-MAPS: A many-core 3D processor with stacked memory, Custom Integrated Circuits Conference (CICC), 2010 IEEE, 2010, pp. 1-4.
 C. Mineo, R. Jenkal, S. Melamed and W. R. Davis, Inter-die signaling in three dimensional integrated circuits, Custom Integrated Circuits Conference, 2008. CICC 2008. IEEE, 2008, pp. 655-658.
 T. Thorolfsson, K. Gonsalves and P. D. Franzon, Design automation for a 3DIC FFT processor for synthetic aperture radar: A case study, Design Automation Conference, 2009. DAC '09. 46th ACM/IEEE, 2009, pp. 51-56.
 Z. Tao, W. Kui, F. Yi, C. Yan, L. Qun, S. Bing, X. Jing, S. Xiaodi, D. Lian, X. Yuan, C. Xu and L. Youn-Long, A 3D SoC design for H.264 application with on-chip DRAM stacking, 3D Systems Integration Conference (3DIC), 2010 IEEE International, 2010, pp. 1-6.
 I. Loi, P. Marchal, A. Pullini and L. Benini, 3D NoCs - Unifying inter and intra chip communication, Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on, 2010, pp. 3337-3340.
 J. Ouyang, G. Sun, Y. Chen, L. Duan, T. Zhang, Y. Xie and M. J. Irwin, Arithmetic unit design using 180nm TSV-based 3D stacking technology, 3D System Integration, 2009. 3DIC 2009. IEEE International Conference on, 2009, pp. 1-4.
 X. Chen, T. Zhu and W. R. Davis, Three-dimensional SRAM design with on-chip access time measurement, Electronics Letters, 47 (2011), pp. 485-486.
 L. Zhou, C. Wakayama, N. Jangkrajarng, H. Bo and C. J. R. Shi, A high-throughput low-power fully parallel 1024-bit 1/2-rate low density parity check code decoder in 3D integrated circuits, Design Automation, 2006. Asia and South Pacific Conference on, 2006, pp. 2 pp.
 X. Jing, D. Xiangyu and X. Yuan, 3D memory stacking for fast checkpointing/restore applications, 3D Systems Integration Conference (3DIC), 2010 IEEE International, 2010, pp. 1-6.
Keywords: 3D IC, 3D NoC, MPSoC, survey