By Ateet Mishra, Uchit Singhal, Jatinder Kumar – Freescale Semiconductors India Pvt Ltd
In the parallel race for higher design frequencies and smaller silicon area, we have moved to lower technology nodes such as C40 and C22. This has increased the probability of silicon variation. To counter this variation, designs are closed in OCV (on-chip variation) mode, in which the STA tool checks design timing for the worst possible scenario. This sometimes results in extra timing violations having to be fixed.
To explain the problem, let’s first start with the basics of worst and best slew propagation. In static timing analysis, worst slew propagation is considered for any multi-input gate. Let’s consider the following figure:
If the path is considered for late analysis, the maximum value of slew has to be considered, i.e.,
Slew @ X =~ MAX of ( Slew @ A1 , Slew @ A2 )
If the path is considered for early analysis, the minimum value of slew has to be considered, i.e.,
Slew @ X =~ MIN of ( Slew @ A1 , Slew @ A2 )
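The two rules above can be written as a minimal Python sketch. It follows the article's approximation and picks the worst or best input slew directly; a real STA tool derives the output slew from library timing arcs. The function name `propagated_slew` and the sample values are illustrative assumptions.

```python
def propagated_slew(input_slews_ps, analysis):
    """Pick the slew carried past a multi-input gate (simplified model).

    input_slews_ps: slews at the gate inputs (A1, A2, ...), in ps.
    analysis: "late" selects the worst (maximum) slew,
              "early" selects the best (minimum) slew.
    """
    if analysis == "late":
        return max(input_slews_ps)
    if analysis == "early":
        return min(input_slews_ps)
    raise ValueError("analysis must be 'late' or 'early'")


# Illustrative values only: one sharp input and one slow input.
slew_a1_ps, slew_a2_ps = 300.0, 1200.0
print(propagated_slew([slew_a1_ps, slew_a2_ps], "late"))   # 1200.0
print(propagated_slew([slew_a1_ps, slew_a2_ps], "early"))  # 300.0
```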
Slew is an input to delay calculation, so the propagated slew affects the delay of the cells that follow in the chain.
Our primary focus has always been clock cells, and we maintain a very good transition on clock paths. For instance, for a project in C40, the allowed transition limits were:
Clock path: 300 ps; data path: 1200 ps
Because of worst slew propagation, this transition difference at the inputs of multi-input clock cells results in many extra violations that have to be fixed. To explain this, let’s consider the following figure:
Consider the OR gate and the mux in the clock path.
For setup timing: the launch path is delayed to its maximum because the worst slew is propagated, while the capture path is shortened to its minimum because the best slew is propagated. This creates a skew that works against setup and can cause a timing violation equal to that skew.
For hold timing: the launch path is shortened to its minimum because the best slew is propagated, while the capture path is delayed to its maximum because the worst slew is propagated. This creates a skew that works against hold and can cause a timing violation equal to that skew.
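The setup and hold cases above can be illustrated with a toy calculation. The sketch below assumes a linear delay model for the cell that follows the multi-input gate; the intrinsic delay and slew sensitivity are made-up numbers, not library data. Only the 300 ps and 1200 ps limits come from the text. It shows how the gap between the late and early views of that one cell shrinks to zero when the data pin is held to the clock transition limit.

```python
def cell_delay_ps(input_slew_ps, intrinsic_ps=50.0, slew_sensitivity=0.125):
    """Toy linear delay model: delay grows with input slew.

    The coefficients are hypothetical and chosen only for illustration.
    """
    return intrinsic_ps + slew_sensitivity * input_slew_ps


def slew_induced_skew_ps(clock_pin_slew_ps, data_pin_slew_ps):
    """Delay gap between the late and early views of the next cell."""
    late_slew = max(clock_pin_slew_ps, data_pin_slew_ps)   # worst slew
    early_slew = min(clock_pin_slew_ps, data_pin_slew_ps)  # best slew
    return cell_delay_ps(late_slew) - cell_delay_ps(early_slew)


# Data pin left at the data path limit vs. held to the clock path limit.
relaxed = slew_induced_skew_ps(300.0, 1200.0)
tight = slew_induced_skew_ps(300.0, 300.0)
print(f"extra skew with data pin at 1200 ps: {relaxed} ps")
print(f"extra skew with data pin at 300 ps: {tight} ps")
```

In this model the same gap is charged against setup (late launch, early capture) and against hold (early launch, late capture), which is why both checks degrade together.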
Hence, we end up with degraded setup and degraded hold. Hold, being highly sensitive to skew, is practically certain to be affected. The problem is further aggravated if the fan-out of the OR gate is divided in such a way that skew is introduced between groups of flops. For instance, consider the following scenario:
Here the fan-out of the OR gate is divided in such a way that skew is introduced between groups of flops.
Our solution focuses on the ignored data path logic, which is the remaining part of multi-input clock cells. Controlling the transition at such cells can give a large benefit in the total number of timing violations. Our idea specifically targets the multi-input cells present in the clock path. A maximum allowed transition attribute, the same as that of the clock path, is applied on the data pins of clock cells. This ensures good transition at the data pins before the clock tree stage and helps reduce the post-clock-tree OCV impact well in advance.
The stated idea can be easily implemented in NPI execution. The idea is to control the transition of the data pins of multi-input clock cells. Later on, while routing the design, the nets on these data pins are routed with the same rule as the clock nets. For instance, if clock nets are routed with Double Width Double Spacing, then a Double Width Double Spacing attribute is applied on these nets as well. This avoids the noise impact on these pins.
Our idea is implemented at the synthesis stage itself. Once the raw netlist and clock definitions are available, an automated script generates the updated constraints file and the routing attribute file. These files are used from the placement stage onward. This is demonstrated in the figure below:
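As a rough illustration of what such a script might do, the Python sketch below works on a hypothetical in-memory netlist rather than a real netlist parser. The `Cell` class, the cell and net names, and the routing rule file format are all assumptions made for this example. `set_max_transition` is a standard SDC command, but whether a given tool accepts pin objects and which time unit applies (nanoseconds are assumed here) should be checked against that tool's documentation.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    inputs: dict  # input pin name -> name of the net driving it


def data_pins_of_clock_cells(cells, clock_nets):
    """Return (pin path, net) pairs for non-clock inputs of clock cells."""
    found = []
    for cell in cells:
        clock_pins = [p for p, n in cell.inputs.items() if n in clock_nets]
        data_pins = [(p, n) for p, n in cell.inputs.items() if n not in clock_nets]
        # A multi-input clock cell has a clock input and a non-clock input.
        if clock_pins and data_pins:
            found.extend((f"{cell.name}/{p}", n) for p, n in data_pins)
    return found


def constraint_lines(pins, clock_max_transition):
    """Apply the clock transition limit to each data pin found."""
    return [f"set_max_transition {clock_max_transition} [get_pins {{{pin}}}]"
            for pin, _net in pins]


def routing_rule_lines(pins, rule_name):
    """Hypothetical 'net rule' list; real routers use their own syntax."""
    return [f"{net} {rule_name}" for _pin, net in pins]


# Hypothetical design fragment: an OR gate and a mux in the clock path.
cells = [
    Cell("clk_or", {"A1": "clk_root", "A2": "test_en"}),
    Cell("clk_mux", {"D0": "clk_or_out", "D1": "clk_alt", "S": "mode_sel"}),
    Cell("data_and", {"A1": "d0", "A2": "d1"}),  # pure data cell, skipped
]
clock_nets = {"clk_root", "clk_or_out", "clk_alt"}

pins = data_pins_of_clock_cells(cells, clock_nets)
print("\n".join(constraint_lines(pins, 0.3)))   # 0.3 ns = 300 ps, assumed unit
print("\n".join(routing_rule_lines(pins, "double_width_double_spacing")))
```

Only cells that have both a clock-driven input and a non-clock input are selected, so ordinary data path cells keep the relaxed data transition limit.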
Our idea removes a large number of timing violations much earlier in the design cycle. It not only helps reduce setup violations but also resolves many hold violations. Since the solution avoids many violations that would otherwise have appeared because of OCV, it also reduces the number of buffers used in the design, saving area and power as well.