High power consumption and excessive heat generation are major barriers that design engineers must overcome during system-on-chip (SoC) implementation. Designers face problems such as IR (voltage) drop, electromagnetic interference (EMI), increased internal noise and shortened battery life. As a result, denser and wider power grids are required, taking away valuable placement and routing resources. All of these conditions lead to larger die sizes, inaccurate delay calculations, unnecessary layout iterations or, even worse, a respin of the design.
Because heat distribution packaging is more expensive and consumes more real estate on the printed-circuit board, many companies developing SoCs suffer from increased design costs and lengthened time-to-market.
The SoC design techniques currently available to designers include low-power libraries, multiple voltage domains, gated-clock synthesis and heat-sink packaging. However, these methods have done little to address the high power consumption problems in SoC designs. In many cases, IR drop and EM problems are still severe, and the chip still fails to function as desired, making the goal of a smooth SoC design seem unreachable to most, if not all, designers.
During the past year, Oki Semiconductor has searched for a comprehensive solution to these power-related problems. Partnering with Golden Gate Technology Inc. (GGT; San Jose, Calif.), Oki developed a design methodology for its design flow that produces low-power SoC designs. This fully integrated design methodology eliminates the need for heat sink devices, reduces the total cost of design development by as much as 50 percent and reduces time-to-market.
Traditionally, designers relied upon the refinement of register-transfer-level coding architecture and synthesis methodologies to conserve power. Power is generally estimated before placement, then calculated after routing. This technique yields an unreliable correlation between pre- and post-layout results.
Let's consider some important facts. It is widely accepted that clock nets consume roughly 50 percent of the power in an SoC. This is further exacerbated by the automated clock network design techniques. The most effective way to reduce the overall chip power lies in trimming clock power consumption. To illustrate that point, let's consider the estimation of clock net power as follows:
$$\text{Clock power} = \tfrac{1}{2}\, C V^{2} F + P F$$
C = clock net capacitance (wires and input impedances)
V = voltage
F = frequency
P = clock buffer power per MHz
Given the frequency and voltage as constants, designers want to minimize the number of clock tree buffers and clock net capacitance.
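As a rough illustration of the estimate above, the following Python sketch evaluates the clock power expression for two cases. The function name, the unit choices (pF, V, MHz, mW) and every numeric value are hypothetical and chosen only to show how the terms scale; they are not Oki measurements. P is treated here as the aggregate power of all clock buffers per MHz, which is an assumption.

```python
def clock_power_mw(c_pf, v_volts, f_mhz, p_mw_per_mhz):
    """Clock power = 1/2 * C * V^2 * F + P * F, returned in milliwatts.

    c_pf          clock net capacitance in picofarads (hypothetical unit choice)
    v_volts       supply voltage in volts
    f_mhz         clock frequency in megahertz
    p_mw_per_mhz  aggregate clock buffer power per MHz, in mW/MHz (assumed)
    """
    # pF * V^2 * MHz = 1e-12 * 1e6 W = microwatts; the 1e-3 factor converts to mW
    switching_mw = 0.5 * c_pf * v_volts ** 2 * f_mhz * 1e-3
    buffer_mw = p_mw_per_mhz * f_mhz
    return switching_mw + buffer_mw


# Illustrative values only: same voltage and frequency, half the
# clock net capacitance and half the buffer power in the second case.
baseline = clock_power_mw(c_pf=100.0, v_volts=1.8, f_mhz=100.0, p_mw_per_mhz=0.05)
trimmed = clock_power_mw(c_pf=50.0, v_volts=1.8, f_mhz=100.0, p_mw_per_mhz=0.025)
print(f"baseline: {baseline:.1f} mW, trimmed: {trimmed:.1f} mW")
```

Because both terms are linear in C and P respectively, halving the clock net capacitance and the buffer power while holding V and F fixed halves the estimated clock power.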
In a conventional layout flow, placement and routing are optimized for timing, with little or no consideration for power, routability or signal integrity. During placement, clock nets are considered as ideal nets, and they are ignored during placement optimization. Clock tree synthesis (CTS) often introduces a large number of buffers and long lengths of wiring to balance the clock skew caused by irregular placement of sequential logic.
Oki has tested and adopted GoPower, a tool provided by GGT to effectively autoplace designs in a power-efficient manner without sacrificing timing and routability, using an F-Array structure.
In this structure, clock nets are not treated as signal nets. Flip-flop (FF) distribution is traced at the floor plan stage, and the placer enforces uniformity and confinement of FF distribution along with performance-optimized general cell placement. The lowest-level FF is placed into proprietary symmetric F-Array structures. With its unique algorithm, sequential cells are placed in a structured array, allowing effective balancing of clock trees by construction. As a result, fewer buffers are required, clock wire lengths are shortened and power consumption is reduced. From Oki's study, F-Array placement yields an average of two to 10 times fewer clock driver insertions, 1.5 to three times shorter clock wire lengths and 15 to 30 percent less overall power consumption as compared with the traditional flow using traditional timing-driven placement.
Figure: Power-driven layout reorganizes placement of flip-flop structures from the original placement (left).
The figure compares the placements from a traditional timing-driven layout and from the F-Array algorithm. Clock tree insertion and routing are performed on both databases with the same CTS and routing tool. The results show that the number of clock tree buffers in the design is reduced from 351 in the conventional placement to 72 in the F-Array structure (an 80 percent reduction). Clock skew dropped from 0.24 nanoseconds to 0.20 ns, and overall power from 29 mW to 22 mW (a 24 percent reduction). These results indicate that this placement algorithm effectively reduces power in SoC designs. It also allows early detection of IR/EM problems at each layout stage.
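The percentages quoted above can be checked directly from the raw figures. The short Python sketch below does only that arithmetic; the helper name is hypothetical, and the skew percentage is derived here from the quoted values rather than stated in the comparison itself.

```python
def percent_reduction(before, after):
    """Relative drop from `before` to `after`, as a percentage."""
    return (before - after) / before * 100.0


# Figures quoted for the conventional placement vs. the F-Array placement.
results = {
    "clock tree buffers": (351, 72),
    "clock skew (ns)": (0.24, 0.20),
    "overall power (mW)": (29, 22),
}

for name, (before, after) in results.items():
    print(f"{name}: {percent_reduction(before, after):.1f}% lower")
```

The buffer count and overall power work out to about 79.5 and 24.1 percent, consistent with the rounded 80 and 24 percent figures, while the skew improvement is about 16.7 percent.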
Oki Semiconductor and Golden Gate Technology show a comparison of placements from a timing-driven layout (left) with their method, which cuts the number of clock tree buffers.
Kelvin Chun and Anna Ling are with Oki Semiconductor (San Jose, Calif.).