Mitul Soni, Gourav Kapoor, Nikhil Wadhwa, Nalin Gupta (Freescale Semiconductor India Pvt. Ltd.)
Design rule violations (DRVs) are one of the major challenges facing the VLSI industry. With ever-shrinking technology nodes and ever-increasing gate counts, now exceeding 40 million on a single die, design complexity is enormous. Often the so-called “high priority goals”, be it timing, power, or area utilization, take precedence, and DRV fixing is relegated to the very end of the backend design cycle. By then, limited time and congestion often turn DRVs into a bottleneck for tapeout. It is therefore prudent to fix DRVs earlier in the design cycle.
First, let us introduce the concept of a design rule violation. It is the job of the library characterization teams to characterize the standard cells, the basic building blocks of the design. The range over which these are characterized depends upon two factors:
The foundry guarantees the behavior of the cells over this particular range only.
The upper and lower limits of the characterization range are set wide enough that a cell is expected to operate beyond them only very rarely. Since characterization has its own cost, it may not be economically feasible to characterize the cell for extreme loads and transitions.
Each cell in the cell library comes with lookup tables that capture its characterized behavior. The table given below lists the delay of a buffer for different input slews and output capacitances. This is a discrete representation: the tool calculates the value for any other input slew and output load by interpolating between these points. The delay calculated this way is accurate for small deviations from these points.
However, once the transition and/or load goes out of this range, the tool has to extrapolate the delay from these points, which is less accurate than interpolating since there is now only one bounding value. The inaccuracy grows as the operating point moves further away from the extreme values in the lookup table, to the point where actual silicon results and the delays predicted by tools no longer match. Moreover, the extrapolated delays predicted by different tools can differ by a large amount, since different tools use different algorithms for delay calculation. Hence, extrapolation should be avoided at all costs. Violating the limits can seriously hurt yield and hence gross margins.
For the sake of clarity, consider the following graph of a hypothetical buffer, in which delay is plotted against load capacitance for different input transitions. As can be seen, at loads far greater than the characterized limits, the delay calculated by extrapolation can deviate considerably from the actual delay.
Given below is the library snapshot representing the delay of a buffer:
<Input Slew> ("0.0016, 0.00748936, 0.0281623, 0.0676403, 0.129145, 0.215443, 0.329");
<Output Cap> ("0.0001, 0.0102356, 0.0458139, 0.113756, 0.219606, 0.368125, 0.563557");
<Cell Delay> ("0.0168644, 0.0219903, 0.0366508, 0.0639328, 0.106351, 0.165843, 0.244143",
"0.0210712, 0.0261739, 0.0408774, 0.0681715, 0.110589, 0.170077, 0.248373",
"0.0351596, 0.0403175, 0.055068, 0.082394, 0.124825, 0.184314, 0.262614",
"0.0568831, 0.0622993, 0.0771272, 0.104477, 0.146841, 0.206329, 0.284645",
"0.0845704, 0.0906502, 0.105625, 0.132858, 0.175249, 0.234645, 0.312869",
"0.117194, 0.124369, 0.13985, 0.166898, 0.209139, 0.268506, 0.346709",
"0.154126, 0.162731, 0.179315, 0.20612, 0.248221, 0.307412, 0.385523");
In any timing path, if the input slew is less than 0.0016ns or more than 0.329ns, or the output load is less than 0.0001fF or more than 0.563fF, the delay of this buffer is extrapolated, and the calculated delay “X” may not hold true in silicon.
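To make the interpolation/extrapolation distinction concrete, here is a minimal Python sketch that looks up the buffer delay from the table above. It assumes that each row of the delay table corresponds to one input slew and each column to one output capacitance, and it uses plain bilinear interpolation. Actual delay calculators may use different algorithms, which is exactly why their extrapolated values disagree. The function names are illustrative only.

```python
from bisect import bisect_right

# Axes and delay values copied from the library snapshot above.
# Assumption: rows follow input slew, columns follow output capacitance.
SLEW = [0.0016, 0.00748936, 0.0281623, 0.0676403, 0.129145, 0.215443, 0.329]
CAP = [0.0001, 0.0102356, 0.0458139, 0.113756, 0.219606, 0.368125, 0.563557]
DELAY = [
    [0.0168644, 0.0219903, 0.0366508, 0.0639328, 0.106351, 0.165843, 0.244143],
    [0.0210712, 0.0261739, 0.0408774, 0.0681715, 0.110589, 0.170077, 0.248373],
    [0.0351596, 0.0403175, 0.055068, 0.082394, 0.124825, 0.184314, 0.262614],
    [0.0568831, 0.0622993, 0.0771272, 0.104477, 0.146841, 0.206329, 0.284645],
    [0.0845704, 0.0906502, 0.105625, 0.132858, 0.175249, 0.234645, 0.312869],
    [0.117194, 0.124369, 0.13985, 0.166898, 0.209139, 0.268506, 0.346709],
    [0.154126, 0.162731, 0.179315, 0.20612, 0.248221, 0.307412, 0.385523],
]


def _segment(axis, x):
    """Index i of the segment [axis[i], axis[i+1]] used for x.

    Points outside the axis are clamped to the first or last segment,
    which turns the linear formula below into an extrapolation.
    """
    i = bisect_right(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)


def lookup_delay(slew, cap):
    """Return (delay, extrapolated) for the given input slew and output load."""
    i = _segment(SLEW, slew)
    j = _segment(CAP, cap)
    ts = (slew - SLEW[i]) / (SLEW[i + 1] - SLEW[i])
    tc = (cap - CAP[j]) / (CAP[j + 1] - CAP[j])
    # Interpolate along the capacitance axis on the two bounding slew rows,
    # then along the slew axis between those two results.
    d_lo = DELAY[i][j] + tc * (DELAY[i][j + 1] - DELAY[i][j])
    d_hi = DELAY[i + 1][j] + tc * (DELAY[i + 1][j + 1] - DELAY[i + 1][j])
    delay = d_lo + ts * (d_hi - d_lo)
    in_range = SLEW[0] <= slew <= SLEW[-1] and CAP[0] <= cap <= CAP[-1]
    return delay, not in_range
```

A call such as `lookup_delay(0.0676403, 1.0)` returns a delay together with `extrapolated=True`, flagging that the load lies beyond the last characterized capacitance and that the number should not be trusted.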
Let us now discuss the approach one should follow while fixing DRVs. One of the most important rules is “NEVER LEAVE DRV TO BE FIXED AT THE LAST STAGE OF DESIGN CYCLE”. Designers usually fix the DRVs that directly impact timing proactively, but the ones that do not impact timing (or that sit on paths with sufficient positive setup slack) are often left until the last phase of the design cycle, since timing closure is the top priority. By then, however, congestion can make DRV fixing quite difficult, and the fixes can disturb timing as well. DRVs on hold-critical paths are the ones most often overlooked: such paths usually do not show up as hold violations, but they emerge as hold violations once the DRV is fixed, as discussed below.
There is another reason not to overlook DRVs. Suppose we have a timing path as shown in the figure below. The hold slack equation is given as:
Slack = Tck->q + Tdata - Thold
Now, if the net has a DRV, the transition/load on the net will be very high, which increases Tck->q and Tdata but decreases Thold. The reported delay will also not be accurate; it can be off by a large margin. Thus, there is every possibility of a DRV masking a hold violation. If DRVs of this kind are left until the end, you have to fix not only the DRV but also the hold violation that results from fixing it.
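The masking effect can be illustrated with the hold slack equation above. The following sketch uses purely hypothetical numbers (in ns) chosen only to show the direction of the effect; they are not taken from any library or design.

```python
def hold_slack(t_ck_q, t_data, t_hold):
    """Hold slack as defined in the text: Tck->q + Tdata - Thold."""
    return t_ck_q + t_data - t_hold


# Hypothetical values with the DRV present: the slow transition inflates
# Tck->q and Tdata and lowers the reported Thold.
slack_with_drv = hold_slack(t_ck_q=0.30, t_data=0.25, t_hold=0.05)

# Hypothetical values for the same path once the DRV is fixed: the delays
# shrink and Thold goes back up.
slack_after_fix = hold_slack(t_ck_q=0.10, t_data=0.02, t_hold=0.15)

# The path looked safe while the DRV was present, and fails once it is fixed.
drv_masked_hold_violation = slack_with_drv > 0 and slack_after_fix < 0
```

With these assumed values the path shows positive hold slack while the DRV is present and negative slack after the fix, which is the situation the text warns about.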
DRV fixing - Approach to be followed
The best approach to fixing DRVs is through the implementation tool. One should try to get as many as possible fixed at implementation time so that minimal manual work remains. However, since not all modes are visible to the implementation tool, a few DRVs are still left to be fixed manually. Apart from modes, DRVs are also observed across corners: since setup optimization is carried out only for the WCS corner, the implementation tool does not take care of DRVs in the BCS corner. A general approach to DRV fixing close to tape-out, aimed at minimizing the disturbance caused by cell movement, sizing, and/or addition, can include the following steps:
Make a list of all the DRVs and find the unique nets with transition/capacitance violations. Also prepare a list of the unique drivers and loads for all these nets.
Run a regression across all your timing modes and corners and find the setup and hold slack for the unique driver nets.
We are now left with a list of unique nets and their driver/load pins. Try to fix the violations in the following order of preference:
First, swap the cells to lower-Vt flavors, i.e. HVT to SVT or SVT to LVT, provided the cells have sufficient hold slack across modes and corners. The ON current through a transistor increases with the gate overdrive (V - Vt), so lowering the Vt increases the current and gives a better output transition. Swapping the Vt of driver cells should NOT impact timing or congestion much, so this is the easiest fix.
Utilize other flavors with the same cell footprint/area. Some libraries have a number of cells with the same area/footprint specifically for the purpose of cell swapping for timing optimization close to tape-out. For instance, a library might have buffers of drive strength 2 and 4 implemented in the same area, or different channel-length flavors of the same cell with the same footprint. We can use these to move toward DRV closure without any congestion overhead from cell movement/sizing. Here too, however, we have to be sure that the cell swap does not introduce any hold violation.
Increase the drive strength. The previous steps reduce the number of DRVs without any impact on congestion close to tape-out. However, we are almost always left with many DRVs that cannot be removed by cell swaps without disturbing the design. The first thing to analyze for each of these is whether the DRV can be removed by cell sizing. Since current is directly proportional to the W/L ratio, increasing the width improves the output transition and the load-driving capability. The drive strength should be increased in accordance with the violation: choose a drive strength for the driving cell that compensates for the DRV, and if the magnitude of the DRV is small, increase the drive strength by a small factor only.
Load Splitting (Also called BUS splitting). There are two possible scenarios for this:
If the driver has only a single fanout, the solution is fairly simple. We just have to add a repeater buffer at a strategic location on the net so that the net load is shared between the driver and the newly inserted buffer. The sizes of the existing driver and the new buffer have to be decided keeping in mind the load each of them will see after the new connections are made.
On the other hand, if the driver of the DRV net has more than one fanout, the process is more complex and requires timing and congestion feedback. We should add a buffer strategically after the driver so that the loads are distributed equally between these two cells.
The most important thing to do after each step is to verify the timing slack and revert the changes that degrade timing. Fixing a DRV very often results in a hold violation. If that happens, revert the fix for that DRV and instead add strategically placed buffers to distribute the load, as described under load splitting. This fixes the DRV, and the delay of the added buffer also helps fix the hold violation. Repeating this approach over multiple iterations lets us fix the DRVs.
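The bookkeeping and the preference order described above can be sketched as follows. This is only an illustration: the `Driver` fields, the `hold_margin` threshold, and the returned labels are hypothetical and do not correspond to any tool's data model, and every proposed fix still has to be checked against timing and reverted if the slack degrades.

```python
from dataclasses import dataclass

VT_ORDER = ["HVT", "SVT", "LVT"]  # from highest to lowest threshold voltage


@dataclass
class Driver:
    name: str
    vt: str                       # one of VT_ORDER
    has_same_footprint_variant: bool  # stronger cell with identical footprint exists
    can_upsize: bool              # a larger drive strength exists and can be placed
    worst_hold_slack: float       # worst hold slack across all modes and corners


def unique_drivers(violations):
    """Collapse (net, driver) violation records to one driver per unique net."""
    drivers = {}
    for net, driver in violations:
        drivers.setdefault(net, driver)
    return drivers


def pick_fix(driver, hold_margin):
    """Return the first applicable fix in the preference order from the text."""
    hold_ok = driver.worst_hold_slack > hold_margin
    # 1. Vt swap, only with sufficient hold slack.
    if hold_ok and driver.vt != "LVT":
        lower_vt = VT_ORDER[VT_ORDER.index(driver.vt) + 1]
        return f"vt_swap:{lower_vt}"
    # 2. Same-footprint flavor, also only if hold is not put at risk.
    if hold_ok and driver.has_same_footprint_variant:
        return "same_footprint_swap"
    # 3. Increase the drive strength.
    if driver.can_upsize:
        return "upsize"
    # 4. Fall back to load splitting with an extra buffer.
    return "split_load"
```

For example, an HVT driver with comfortable hold slack would be proposed for a swap to SVT, while a driver with little hold slack and no room to upsize falls through to load splitting, where the added buffer delay also helps hold.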
Special careabouts/heuristic approach to fix DRVs
If some DRVs are still not fixed, one can try the following steps. The first thing to find is the main reason the DRV persists. There may be many reasons, the prominent ones being congestion or the timing impact of fixing the DRV.
If congestion is the reason, we should try to move the less timing-critical logic to other regions. Redistribute these cells (with timing iterations) so that the congestion in the affected area is relieved and the DRVs in that area can be fixed.
If timing is affected by the Vt swaps, drive strength changes, or buffer insertion, we should trace the timing path back manually and find a node where the slew can be improved, so that the improvement propagates to the culprit cell and improves its input transition. Remember to check the timing slack after every change at every stage.
One of the most important things to check beforehand is the scaling of the transition and capacitance limits across the timing corners. In some designs the limits are not scaled properly in a particular PVT corner, which produces a huge number of violations in that corner. For example, due to improper scaling of the transition and capacitance limits in the best case, we saw 10000 DRVs in the best corner compared with 100 in the worst corner.
Slack-based approach: This step can be used in the last stage for the leftover DRVs. For example, suppose we have leftover DRVs in the best case, where hold timing is also more critical than in the worst case. If we can ensure that the hold slack of the timing path is more than the delay of the load cell (the cell whose input pin loads the DRV net), then even if the delay of that cell tends to zero, the hold timing of that path is still met. If this condition holds, we can consider waiving the DRVs that could not be fixed.
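The waiver condition can be written down as a simple check. The sketch below assumes each leftover DRV is available as a record of net name, worst hold slack of the paths through that net, and the delay of the load cell; this record layout and the function names are hypothetical.

```python
def can_waive(hold_slack, load_cell_delay):
    """True if hold is still met even when the load cell delay drops to zero.

    Removing the whole cell delay from the path lowers the hold slack by at
    most load_cell_delay, so the slack must exceed that delay.
    """
    return hold_slack > load_cell_delay


def waivable_nets(leftover_drvs):
    """Filter (net, worst_hold_slack, load_cell_delay) records to waivable nets."""
    return [net for net, slack, delay in leftover_drvs if can_waive(slack, delay)]
```

Using the worst hold slack over all paths through the net keeps the check conservative. Nets that fail the check still need an actual fix.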
Conclusion: As seen above, there are different ways to fix DRVs, but each step has its own challenges, since the fixes depend on timing, power, and area utilization. Hence one should try to fix DRVs in the earlier stages of the design cycle, when proper feedback on timing and congestion issues is still available.
General flow to follow while fixing DRVs