Timing Closure in the FinFET Era
Charlie Janac - President and CEO of Arteris
Achieving system-on-chip (SoC) timing closure is a major obstacle in the FinFET era. Even though designers can now use faster transistors that consume and leak less power than before, FinFET technology does not address the on-chip communications infrastructure or metal line resistance/capacitance issues that negatively impact timing closure.
To build a FinFET-based system that actually takes advantage of these transistor benefits, design teams must pay much more attention to the interconnect IP that connects the IP blocks within the system. The current state of the art is to manually add pipelines (also called “register slices”) along on-chip links that do not meet timing. However, new technology offered in advanced interconnect implementation tools now lets system architects accelerate this process by as much as 30X by automating pipeline insertion.
Timing closure means getting all of the relevant signals on a chip to arrive at the right time so that the chip operates correctly. To close timing for the SoC interconnect, and for the entire SoC, repeaters or pipelines must be inserted along paths that cannot be traversed in a single clock cycle. In FinFET-class SoCs, the transistors get faster and leak less power, but wires continue to get smaller, increasing the resistance per micron of wire length. As transistors become less of a timing issue, wires therefore become the main timing closure constraint, and the on-chip interconnect is the IP that contains the most wires, and the longest ones.
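As a back-of-the-envelope illustration of why long wires dominate (standard distributed-RC wire reasoning, not figures from this paper), the delay of a long unrepeated wire grows roughly quadratically with its length:

$$ t_{wire} \approx \tfrac{1}{2}\, r\, c\, L^{2} $$

where r and c are the resistance and capacitance per unit length and L is the wire length. As resistance per micron rises in smaller geometries, long unbroken wires quickly exceed a clock period, which is why repeaters or pipeline stages are needed to break them into shorter segments.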
Figure 1: Pipeline Insertion Model
CAPTION: Pipeline insertion helps SoCs built in 28nm or below to achieve timing closure. As complexity has grown, so has pipeline insertion, and this has impacted schedules greatly.
The result is that it is no longer possible to cross even a small SoC in one clock cycle. If we assume the following parameters, three pipelines must be inserted in the connection shown above (a minimal worked calculation follows the parameter list).
- NoC clock of 600MHz = 1.67 ns cycle time, of which 1.42 ns is usable
- Transport delay of 0.644 ns per millimeter (best case in 28nm/TSMC 28HPM)
- Pipeline-to-pipeline maximum distance of 2.2 millimeters
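As a sanity check on those numbers, here is a minimal sketch of the arithmetic, assuming a hypothetical 8 mm connection (the actual length of the link in Figure 1 is not stated in the text):

```python
import math

# Minimal sketch of the pipeline-count arithmetic, using the parameters above.
# The 8 mm path length is a hypothetical example only.

def pipelines_needed(path_mm, usable_cycle_ns=1.42, delay_ns_per_mm=0.644):
    """Pipeline stages required so no segment exceeds one usable clock cycle."""
    max_segment_mm = usable_cycle_ns / delay_ns_per_mm   # ~2.2 mm per segment
    segments = math.ceil(path_mm / max_segment_mm)       # cycles needed to cross the path
    return max(segments - 1, 0)                          # stages inserted between endpoints

print(pipelines_needed(8.0))  # -> 3 stages for an assumed 8 mm connection
```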
In a complex SoC, hundreds of thousands of signals now have to travel long distances across the die, necessitating thousands of repeaters or pipelines of multiple types that must be inserted along these paths in order to close timing. As we go down in process geometry, increase target frequency or decrease voltage, the timing closure problem only gets worse.
Timing Closure Choices
Faced with this growing timing closure complexity, SoC designers have three design choices:
- Under-design: Ignore the problem and have the layout team deal with it.
- Over-design: Close timing by inserting pipelines manually.
- Optimize: Use a new automated interconnect timing closure capability.
Ignoring the timing closure problem in the front end may work at 40nm because layout teams can use their methodologies to close timing successfully. At 28nm and below, however, timing closure solely in the layout domain becomes problematic. Essentially, the layout team is handed an architecture and RTL design created without consideration of the physical constraints involved in closing timing. This leads to multiple place and route cycles, unforeseen schedule delays and even forced reworking of the architecture and the RTL implementation. One industry example involved a gaming SoC where the inability to close timing led to a missed market window, project cancellation, and a $200M loss.
Another problem is that without an architecture-level timing closure scheme, layout synthesis or place and route tools will insert low threshold voltage (low-VT) cells in order to close timing, leading to increased power consumption and increased SoC die cost.
Manual Timing Closure = Overdesign
Inserting pipelines manually may still be possible in FinFET-class SoCs, but customer design team experience shows that doing so can take 30 to 45 days for a 60mm2 SoC. Because the interconnect changes frequently during this period, the timing closure scheme has to be over-engineered to avoid being invalidated by underlying interconnect changes. This causes designers to use more pipelines than is optimal, which results in additional interconnect area, power consumption and latency.
If the interconnect changes during a design project are significant, the manual timing closure process has to be redone anyway, potentially causing SoC schedule delay. Eventually, manual pipeline insertion will become too difficult for SoCs with complex architectures or high performance requirements. The interconnect timing closure problem only gets worse with decreasing geometries, higher interconnect frequencies and lower voltage thresholds.
Automated timing closure
There is a new approach, which is to use the network-on-chip (NoC) interconnect to provide accurate timing closure estimates prior to layout. This allows architecture and RTL teams to validate their architectures from a physical point of view prior to layout, decreasing the burden on layout schedules, resources and cost.
A feature-rich, high performance NoC implementation isolates the functional and I/O IPs from the interconnect by using the IP socket as a boundary between the IPs and the interconnect. A majority of the timing closure paths reside in the interconnect, which also contains a majority of the SoC wires. From a logic point of view, interconnect IP contains not only all of the logic that interfaces to all major IPs in the chip, but also the logic that manages IP protocol conversion, data transport, quality of service, power management implementation, hardware security, debug observability and other functions.
From a physical point of view, the interconnect IP must fit into the “white space” routing channels between the various blocks of a chip. The interconnect is the only IP that sees all of the SoC data traffic and connects to all of the important IPs in the SoC. It is also the only IP in the SoC that changes multiple times within an SoC project schedule. Additionally, it changes from project to project to address IP changes as market requirements evolve. Because NoC-type interconnects are easier to place and to change over time, they are the base technology needed to tackle interconnect timing issues.
Floorplan: Estimating distance for timing
In addition to a NoC-type interconnect, designers need a floorplan in order to estimate the physical distance that each signal travels within the SoC. By reading in a floorplan during the later stages of the design, or by creating a physical floorplan estimate at the architectural phase, engineers can generate an accurate physical NoC instance that can be used to estimate the distance between IP socket end-points and, consequently, the pipeline insertion needed to meet performance and close timing.
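To illustrate the idea (not any particular vendor tool), the sketch below estimates socket-to-socket distances from a hypothetical floorplan using Manhattan routing and converts them into a per-link pipeline budget; all socket names and coordinates are invented for the example:

```python
import math

# Illustrative sketch only: estimate socket-to-socket distances from a
# floorplan and derive a per-link pipeline budget. Socket names, coordinates
# and the Manhattan-routing assumption are hypothetical.

USABLE_CYCLE_NS = 1.42      # from the 600 MHz example above
DELAY_NS_PER_MM = 0.644     # transport delay assumed earlier
MAX_SEGMENT_MM = USABLE_CYCLE_NS / DELAY_NS_PER_MM

# Hypothetical floorplan: IP socket end-points as (x, y) positions in mm.
sockets = {
    "cpu_cluster": (1.0, 1.0),
    "ddr_ctrl":    (7.5, 1.5),
    "gpu":         (2.0, 6.0),
}

links = [("cpu_cluster", "ddr_ctrl"), ("gpu", "ddr_ctrl")]

for src, dst in links:
    (x1, y1), (x2, y2) = sockets[src], sockets[dst]
    dist_mm = abs(x1 - x2) + abs(y1 - y2)              # Manhattan routing estimate
    stages = max(math.ceil(dist_mm / MAX_SEGMENT_MM) - 1, 0)
    print(f"{src} -> {dst}: {dist_mm:.1f} mm, insert {stages} pipeline stage(s)")
```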
A benefit of the NoC-type interconnect is that the supporting tools have thorough knowledge of the interconnect RTL. Combined with knowledge of the interconnect placement, this enables them to run automatic pipeline insertion and RTL analysis on the resulting NoC instance. Automation can provide accurate timing closure estimates in as little as one to 1.5 days, even when verification by logic synthesis tools is included. This dramatic 30X improvement in timing closure productivity enables the creation of a custom timing closure scheme for each version of the SoC architecture under development, eliminating the need to over-engineer the timing closure scheme. Automated timing closure also results in smaller die area, lower power consumption and lower latency, which improves performance and lowers the cost of the SoC.
Figure 2: Automated Timing Closure Success
CAPTION: Multiple place and route cycles during chip design create schedule delays. Planning in advance can avert these delays and produce highly competitive designs at a rapid pace.
If architects do not have the tools and methodologies to estimate the impact of their SoC topology choices, then they cannot take timing closure constraints into account. The earlier a timing closure problem is caught, the less expensive it is to fix. In complex FinFET designs it is quite possible to choose an SoC architecture that is difficult to route in back-end physical design. Consequently, it is potentially disastrous to ignore the timing closure problem during RTL design, because the place and route team is essentially being given an architecture and RTL design that has not been verified against timing closure constraints. This can lead to multiple place and route cycles, which are increasingly time-consuming, and can even necessitate repeating the architecture and RTL design in order to get the SoC to place and route successfully and achieve timing sign-off. By combining knowledge of the IP sockets, the NoC interconnect RTL and an evolving floorplan, architects can close timing in an automated fashion that keeps growing complexity at bay.
Future technology: The interconnect timing debugger
Another useful interconnect capability is a timing debugger. If the timing does not converge automatically, it would be desirable to have a timing analysis and debugging capability in order to provide direction to end-users on how to modify the timing scheme and the architecture for successful convergence. Such an analysis again requires knowledge of the NoC interconnect RTL. The combination of NoC interconnect logic, a floorplan, automatic pipeline insertion and timing analysis capability can be viewed as a method to achieve design space optimization because it allows rational tradeoffs between area, power, latency and frequency. More technology needs to be developed in this area to help SoC designers better manage the complexity of FinFET-type SoCs.
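As a rough illustration of what such a debugging report could look like (a toy sketch under the same transport-delay assumptions as earlier, not a description of an existing tool), the snippet below flags interconnect segments whose estimated delay exceeds the usable cycle time and suggests splitting them with a pipeline stage; segment names and lengths are hypothetical:

```python
# Toy timing-debugger sketch: flag segments whose estimated transport delay
# exceeds the usable cycle time. All segment names and lengths are invented.

USABLE_CYCLE_NS = 1.42
DELAY_NS_PER_MM = 0.644

segments = [                    # (link segment, routed length in mm)
    ("noc_sw0 -> noc_sw1",  1.8),
    ("noc_sw1 -> ddr_ctrl", 3.1),
    ("cpu_if  -> noc_sw0",  0.9),
]

for name, length_mm in segments:
    delay_ns = length_mm * DELAY_NS_PER_MM
    slack_ns = USABLE_CYCLE_NS - delay_ns
    verdict = "split with a pipeline stage" if slack_ns < 0 else "OK"
    print(f"{name}: delay {delay_ns:.2f} ns, slack {slack_ns:+.2f} ns -> {verdict}")
```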
Figure 3: Automation Advantages
CAPTION: Automated pipeline insertion through NoC interconnect implementation tools and debugging provides advantages in power, area, latency and convergence acceleration.
Closing interconnect timing is critical to closing chip timing
In summary, SoC interconnect IP can facilitate timing closure at several different levels. It can assist at the architecture level, before a floorplan exists, by estimating the size and shape of the IPs in the design to determine whether the chosen SoC architecture can close timing. Once there is a floorplan, it can be read into NoC exploration and generation tools to refine the timing estimate, which provides a good idea of whether timing will close at the RTL level. The interconnect timing closure estimate can then be correlated with a layout synthesis tool to provide a verified timing closure starting point for the SoC layout process. Automated timing closure, based on NoC interconnect technology, provides major benefits in SoC cost, performance and delivery.
About the Author:
K. Charles Janac is chairman, president and chief executive officer of Arteris. Over 20 years of his career, he has worked in multiple industries including electronic design automation, semiconductor capital equipment, nano-technology, industrial polymers and venture capital.