Designing Low Power Standard Cell Library With Improved Drive Granularity
By Xiaonan Zhang, Ph.D., Qualcomm
As technology advances, power dissipation has become the number one challenge facing today's chip designers. Total power consumption consists of two parts: dynamic power and leakage power. Typically, a large ASIC design is composed of a variety of circuit blocks, including I/Os, standard cells, memories, and IP blocks. Standard cells typically consume the major portion of the total chip power. In particular, flip-flop cells and clock cells consume the major portion of the total dynamic power. Therefore, for low power and low leakage designs, the standard cell library has a significant impact on a chip's power dissipation.
Standard cells, the basic building blocks for combinational and sequential logic design, each implement a simple, well-known logic function. Since they have been in existence for over 20 years, the design of the actual standard cells is often considered a somewhat mundane activity rather than a challenging one. Often an existing standard cell library is simply scaled down from the current generation to the next generation. However, in an engineering environment predicated on the notion of continuous improvement, the question must be asked: is there room for innovation in the design and use of the next generation standard cell library? The answer is yes.
In this paper, we describe a methodology for designing a library that produces low power and low leakage designs. This approach to designing a standard cell library does not require any CAD tool or ASIC design flow changes, nor does it require any process changes. The topology of each standard cell and the gate length of each transistor are unchanged. The innovation relates to how cell drive strength should be determined.
2. 1X Dominant Drive Problem
Historically, regardless of the speed and power targets of an ASIC implementation, the 1X drive has always been the dominant drive. Often 1X drive cells make up over 70 percent of the total standard cell count. Two important questions follow: why do logic synthesis tools use a preponderance of 1X drive cells, and what are the implications of this? As will be explained, this 1X Dominant Drive Problem is actually an opportunity. When logic synthesis tools are provided with a standard cell library that has a better drive profile, the 1X Dominant Drive Problem is eliminated, and a highly optimized synthesis result can be achieved. There has been surprisingly little work published to address this opportunity to optimize for speed, power, area, or manufacturability in a standard cell library.
When a dominant drive is found in a design, it alerts us that the tool is settling for 1X drive cells when it may well be able to use a smaller drive cell, if one were available. Once smaller drive cells are made available, the usage of the 1X drive will fall, and ideally 1X will no longer be dominant. Through the use of smaller drive cells, both dynamic power and leakage power will decrease.
Traditional cell drive strength design for a standard cell library is primarily transistor size based. When a 1X transistor is 1.2um wide, a 2X transistor is sized to be 2.4um wide, a 4X transistor is sized to be 4.8um wide, and so on. The nmos and pmos transistors in a 1X drive cell are typically single finger layouts. The widths of this nmos and pmos pair are ratioed to be as large as possible subject to the fixed height of the standard cell library. Typically, one drive strength smaller than 1X is provided. The widths of its nmos and pmos transistors are ratioed with one of them set close to the minimum width allowed by the process technology.
For a fixed load, a plot of the delay time versus the drive strength for each drive in a typical standard cell library results in a curve similar to the one shown in Figure 1. Note that the curve is nonlinear. At the high drive end of the curve, when a 4X cell is replaced by an 8X cell for faster switching, the speed gain is disproportionately small relative to the large increase in power. On the other hand, at the low drive end of the curve, when a 1X cell is replaced by an LX cell (the single drive below 1X) for power reduction, the increase in propagation delay is disproportionately high relative to the small savings in power. Due to the large delay difference between LX and 1X, few LX cells are used, and 1X becomes the dominant drive.
Figure 1: Transistor Size Based Drive Strength with Fixed Load
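The shape of this curve can be reproduced with a first-order RC model. The sketch below is illustrative only: the model itself (output resistance falling as 1/drive, self-loading capacitance growing with drive) and every parameter value in it (r_unit_kohm, c_par_unit_ff, the 10 fF load) are assumptions chosen for the example, not data from this paper.

```python
def gate_delay_ps(drive, c_load_ff, r_unit_kohm=10.0, c_par_unit_ff=1.0):
    """First-order RC delay (ps) of a gate with relative drive `drive`.

    Assumed model: output resistance scales as 1/drive, and the gate's
    own parasitic output capacitance scales linearly with drive.
    """
    r_out = r_unit_kohm / drive                   # kohm
    c_total = c_load_ff + c_par_unit_ff * drive   # fF
    return 0.69 * r_out * c_total                 # kohm * fF = ps


C_LOAD_FF = 10.0  # hypothetical fixed load

for drive in (0.5, 1, 2, 4, 8):
    print(f"{drive}X: {gate_delay_ps(drive, C_LOAD_FF):.1f} ps")
```

Under these assumed values, halving the drive from 1X costs far more delay than doubling it from 4X to 8X recovers, which is the nonlinearity the figure describes.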
To further illustrate the dominant drive problem, assume, hypothetically, that only one drive is available in the library for which we are synthesizing the logic. The distribution of drive strength usage will be as shown in Fig. 2, a single (navy) bar. When we add a smaller 0.5X drive, the distribution of drive strengths will change from a single bar to a pair of bars. The new distribution will be some percentage split (36/64 or 58/42, etc.) between the 0.5X and 1X drives. Ideally, a 50/50 distribution gives rise to the greatest total dynamic and leakage power savings.
Figure 2: Cell Usage Distribution in High Speed Design
Figure 3: Cell Usage Distribution in Lower Speed Design
For a slow speed design or a non-timing critical block, more 0.5X cells than 1.0X cells will be used, as shown in Fig. 3 (62.5% 0.5X and 37.5% 1X). However, with only one drive or with a dominant drive, this power reduction cannot be realized. We should then consider adding an MX drive cell between LX and 1X. The sizing of the nmos and pmos transistors for MX is critical. For example, when MX is sized too close to 1X, MX usage may reach 90%, and another dominant drive is created.
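To put rough numbers on the usage splits discussed above, the following sketch estimates total power relative to an all-1X design. It assumes that per-cell power scales linearly with drive strength, which is a simplification introduced here for illustration; the function name relative_power is hypothetical.

```python
def relative_power(usage):
    """Power relative to an all-1X design.

    `usage` maps relative drive strength -> fraction of cells using it.
    Assumes per-cell power scales linearly with drive strength.
    """
    total = sum(usage.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("usage fractions must sum to 1")
    return sum(drive * frac for drive, frac in usage.items())


print(f"{relative_power({1.0: 1.0}):.4f}")                # single drive
print(f"{relative_power({0.5: 0.36, 1.0: 0.64}):.4f}")    # 36/64 split
print(f"{relative_power({0.5: 0.625, 1.0: 0.375}):.4f}")  # split of Fig. 3
```

Under this linear assumption, the 36/64 split lands at about 0.82 of the all-1X power and the 62.5/37.5 split at about 0.69.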
3. How Many Drives Shall We Have?
We can certainly design a standard cell library with drives such as 0.5X, 0.6X, 0.7X, 0.8X, 0.9X and 1.0X, and the dominant drive problem will disappear. However, the library size will increase dramatically, which in turn increases the library design effort. It may also lead to undesirable increases in logic synthesis run times. Therefore, we should be economical in increasing the total number of cells in a standard cell library. The challenge is to arrive at both a) the number of drive strengths, and b) the profile of drive strengths, such that the library meets design needs without a dramatic increase in its total cell count.
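The growth in library size is easy to estimate. The sketch below assumes, for simplicity, that every logic function is offered at every drive strength; the function count of 150 and the two drive lists are hypothetical values chosen only to illustrate the trend.

```python
def library_size(num_functions, drives):
    """Total cell count if every logic function is offered at every drive."""
    return num_functions * len(drives)


NUM_FUNCTIONS = 150  # hypothetical number of logic functions

coarse = [0.5, 1, 2, 4, 8]
fine = [0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 4, 8]

print(library_size(NUM_FUNCTIONS, coarse))  # 750
print(library_size(NUM_FUNCTIONS, fine))    # 1350
```

In this example, four extra drives below 1X add 600 cells, each of which must be laid out, characterized, and considered by the synthesis tool.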
4. Slower Cells Are Useful
As the CMOS supply voltage scales downward, the threshold voltage Vt scales with it. Since sub-threshold leakage depends exponentially on Vt, leakage power increases dramatically from generation to generation. Starting from the 0.13um technology node, in addition to a default Standard threshold voltage (SVt) implant, a High threshold voltage (HVt) implant is offered by foundries to reduce leakage. Typically, an HVt implant reduces transistor sub-threshold leakage by 5X. HVt cells are approximately 35% slower than SVt cells. When HVt and SVt cells are used together in a design, more than 70% of the cells used are HVt cells. The rest are SVt cells. This indicates the following important facts:
Slower cells are still of great value in high performance design.
Critical path logic makes up a relatively small percentage of an entire design. As long as we have SVt cells, we can meet timing constraints.
The majority of logic paths are non-timing critical. The gates in many non-timing critical paths can be substituted by much slower cells such as HVt cells.
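The leakage benefit of such a mixed-Vt design can be estimated from the figures quoted above (5X leakage reduction, 70% HVt usage). The sketch assumes that all cells are of equal size and that leakage is dominated by sub-threshold current; both are simplifications made here, and the function name mixed_vt_leakage is hypothetical.

```python
def mixed_vt_leakage(hvt_fraction, hvt_reduction=5.0):
    """Leakage relative to an all-SVt design.

    Assumes equal-size cells; an HVt cell leaks 1/hvt_reduction as much
    as the corresponding SVt cell.
    """
    if not 0.0 <= hvt_fraction <= 1.0:
        raise ValueError("hvt_fraction must be between 0 and 1")
    return hvt_fraction / hvt_reduction + (1.0 - hvt_fraction)


print(f"{mixed_vt_leakage(0.70):.2f}")  # 70% HVt cells, 30% SVt cells
```

With 70% HVt cells, the estimate comes to about 0.44 of the all-SVt leakage, even though the timing-critical 30% of cells remain SVt.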
Knowing that the majority of cells can be 35% slower, and being aware of the 1X dominant drive problem, it is clear that more drives smaller than 1X should be added to the standard cell library. As lower drive cells are used, input capacitance decreases, which enables switching speed to increase. The demand for high drive cells also decreases.
5. Ideal Drive Strength Distribution
An ideal library should provide an even drive usage distribution. It should not produce dominant drive usage. For example, the most used drive should account for less than 30% of the cells. An ideal drive strength design should have the following features:
It does not cause a big increase in the total number of cells in the library.
It has a wide range of variations of drive strengths to support both critical paths and non-critical paths, both high speed and low power designs, and both large wire load and small wire load designs.
It produces more even drive usage distribution and no single dominant drive.
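The criteria above can be checked against a synthesized netlist. The following sketch flags any drive whose usage share exceeds a threshold; the 30% default follows the example figure given above, while the function names and the sample netlist are hypothetical.

```python
from collections import Counter


def drive_usage(cell_drives):
    """Map each drive name to its share of the total cell count."""
    counts = Counter(cell_drives)
    total = sum(counts.values())
    return {drive: n / total for drive, n in counts.items()}


def dominant_drives(cell_drives, threshold=0.30):
    """Return the drives whose usage share exceeds `threshold`."""
    usage = drive_usage(cell_drives)
    return {d: share for d, share in usage.items() if share > threshold}


# Hypothetical netlist: one drive label per cell instance.
netlist = ["1X"] * 72 + ["LX"] * 8 + ["2X"] * 12 + ["4X"] * 8
print(dominant_drives(netlist))  # {'1X': 0.72}
```

An empty result means that no single drive dominates, which is the target for the drive strength design.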
Next we discuss how we can design drive strengths to achieve the above objectives.
6. Low Power Drive Strength Design
Based on the above discussion, when designing a standard cell library, we need to provide better choices in the drive range below 1X. The number of drives below 1X varies with the cell logic. For inverters, there can be more drives below 1X. For a three-input NAND cell, there can be fewer drives below 1X. The number of drives below 1X also depends on the cell height. In today's Qualcomm designs, the most used drives are no longer 1X. Lower dynamic power and lower leakage are therefore achieved.
Finer drive granularity below 1X has the following merits:
It does not cause a leap in the total number of cells in the library.
It minimizes the dominant drive problem.
It provides lower dynamic power and lower leakage designs.
It provides more power savings for low speed designs.
For more than a decade, the 1X drive has prevailed in chip designs. Finer drive granularity in a standard cell library improves cell drive usage and eliminates the dominant drive problem. As a result, it reduces both dynamic power and leakage power significantly. This library design technique works for all process technologies. Its benefits will become more important in future low power designs.
From the transistor size perspective, finer drive granularity requires finer transistor size granularity. Typically, the size step is in the range of 1/5 of a full finger size. In a typical ASIC design, the average transistor size is near half of a full finger. Gate array solutions currently available offer fewer transistor size options, so they are at a big disadvantage in terms of power and leakage.
In this paper, the problem with traditional library drive strength design is discussed, and a new approach to drive strength design is introduced. Designs using this approach showed improvement in both dynamic power and leakage power.
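As a small illustration of this granularity, the sketch below lists candidate transistor widths in steps of 1/5 of a full finger. The 1.2um full finger width is borrowed from the earlier 1X sizing example, and the 0.3um minimum width is a made-up process limit; both values and the function name width_options_um are assumptions for this example only.

```python
def width_options_um(full_finger_um, steps_per_finger=5, min_width_um=0.0):
    """Candidate widths from one step up to a full finger, in um.

    Widths below `min_width_um` (a process limit) are dropped.
    """
    step = full_finger_um / steps_per_finger
    widths = [round(step * k, 3) for k in range(1, steps_per_finger + 1)]
    return [w for w in widths if w >= min_width_um]


print(width_options_um(1.2))                    # all five steps
print(width_options_um(1.2, min_width_um=0.3))  # smallest step removed
```

The minimum width limit is one reason the number of usable drives below 1X differs from cell to cell.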
About the Author
Dr. Xiaonan Zhang, director of engineering at Qualcomm in San Diego, CA, graduated from the University of Minnesota and has over 22 years of IC circuit design experience. Prior to joining Qualcomm, he worked at VeriSilicon Inc., Nvidia, and Metaflow Technologies Inc. He is known as the inventor of the interleaving inverting repeater circuit, which minimizes the capacitive coupling between adjacent wires. His current interests are in the areas of low power design and the design of circuits that are resilient to process variability.