The fact that designs are big and complex is by now a truism. That this fact makes ICs hard to design and verify has become a platitude. The whole EDA industry, it seems, is in business to tackle this challenge. New tools purporting to automate system-level design and close the loop between design and implementation are legion. But to grow, EDA needs to expand beyond its traditional boundaries.
Getting too little attention is the reality that the challenges of design and verification don't stop when silicon comes back from the fab. In fact, the problem of understanding the design, its behavior, and the causes of that behavior is vastly compounded by the limited visibility afforded by real silicon, which rarely works right.
According to Gartner's Feb. 2004 report "Conservative Times, Conservative Designs in EDA: User Wants and Needs," it takes an average of 2.5 re-spins to finish a complex design. Moreover, while the time from project start to first silicon continues to fall, "higher expected gate counts will translate to longer prototype-to-volume-production times, according to our survey respondents."
All flaws in a running chip manifest themselves as logical "ones" that should be zeros, or vice-versa. No matter whether the wrong value is due to a design error, timing problem, or manufacturing defect, the detective work required to track down the cause can get very tricky. It's hard enough to perform this sleuthing in the simulation domain and much, much harder post-silicon.
Tracking down the cause of a wrong value goes something like this: find the driver of the signal, look at the values of its inputs, and figure out which input path is most likely causing the problem. Continue tracing back this way until either a) the cause becomes clear; b) you reach a dead end and have to go back and trace another path; or c) you lose track of what's going on and have to start over.
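The trace-back loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the netlist format, the signal names, and the `trace_back` function are hypothetical, the logic is assumed to be combinational and acyclic, and "most likely causing the problem" is simplified to "disagrees with a reference value from simulation."

```python
from typing import Dict, List, Tuple

# Hypothetical netlist: signal -> (gate type, input signals).
# Signals with no entry are treated as primary inputs or register outputs.
Netlist = Dict[str, Tuple[str, List[str]]]


def trace_back(netlist: Netlist, observed: Dict[str, int],
               expected: Dict[str, int], signal: str) -> List[str]:
    """Follow mismatching inputs backward from `signal`.

    Returns the deepest signals whose value is wrong even though all of
    their own inputs match the reference; these are the suspects.
    """
    if observed[signal] == expected[signal]:
        return []  # nothing wrong on this path
    _, inputs = netlist.get(signal, ("input", []))
    bad_inputs = [s for s in inputs if observed[s] != expected[s]]
    if not bad_inputs:
        # Every input is correct, so the error originates at this driver.
        return [signal]
    suspects: List[str] = []
    for s in bad_inputs:
        for candidate in trace_back(netlist, observed, expected, s):
            if candidate not in suspects:
                suspects.append(candidate)
    return suspects


# Illustrative data: y = (a OR b) AND (NOT c)
netlist = {
    "y": ("and", ["n1", "n2"]),
    "n1": ("or", ["a", "b"]),
    "n2": ("not", ["c"]),
}
expected = {"a": 0, "b": 1, "c": 0, "n1": 1, "n2": 1, "y": 1}  # from simulation
observed = {"a": 0, "b": 1, "c": 0, "n1": 0, "n2": 1, "y": 0}  # from the chip

suspects = trace_back(netlist, observed, expected, "y")
print(suspects)
```

The sketch presumes that every signal's value is available, which is exactly what simulation provides and what first silicon does not.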
In simulation, where the results for every signal are readily observed, and repeating the scenario (with variations) requires little more than a few keystrokes, this is pretty straightforward, albeit time-consuming. Modern HDL-based debuggers make it easier by linking waveforms, source code, and schematics so that tracing back each level takes just a few clicks. But fast-forward to the first-silicon debug phase, and things get a lot hairier. Tracing back from an observed result to its buried cause can be truly daunting when the only way to observe signal values is to scan out the registers at every cycle, or use a million-dollar e-beam or laser probing machine.
How do you get the data from the tester or prototype system into the waveform display? How do you get the values of the signals or registers that are not in the scan chain? How do you play what-if games with the potential causes? How do you correlate what you're seeing with the RTL design?
Worse yet, this work takes place in a pressure-cooker atmosphere brought about by the entire organization being anxious to ship the chip and start making money from it. I envision the CEO of a fabless startup standing outside a glass-windowed systems lab, literally checking his watch as he observes the design team sweating out the first-silicon debug!
Up to now, precious little has been done to attack this costly manifestation of the complexity problem. But a new theme is emerging, as evidenced by the recent advent of a conference addressing the issues involved. The 1st IEEE International Workshop on Silicon Debug and Diagnosis, held last May 26-27 in Corsica, France, has as its mission and objective "to consider all issues related to debug and diagnosis of circuits and systems — from prototype bring-up to volume production," according to its official web site.
The presentations at this workshop parallel our experience in the industry. Today users rely on ad-hoc methods to pull data from their test setup into their RTL and gate-level debug environments. And they spend weeks laboriously figuring out the causes of the bugs that keep them from shipping in volume.
Proposed "design for debug" approaches will increase the controllability and observability of silicon test by introducing some extra logic into the design. And low-cost testers are being built to increase the observability of silicon states. Some help will come from automatic test pattern generation (ATPG) tools, which build fault dictionaries when they generate tests. These tools are being enhanced to provide simpler ways to get from the observed results to their causes, typically based on simple fault models. But the range of problems ATPG tools solve will remain limited.
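To make the fault-dictionary idea concrete, here is a minimal Python sketch. It is not the format or algorithm of any particular ATPG tool: the fault names, the `(pattern, output)` failure signature, and the similarity score used for ranking are all assumptions chosen for illustration.

```python
from typing import Dict, List, Set, Tuple

# A failure is (test pattern index, output pin that mismatched).
Failure = Tuple[int, str]


def rank_faults(dictionary: Dict[str, Set[Failure]],
                observed: Set[Failure]) -> List[Tuple[str, float]]:
    """Rank modeled faults by how well their predicted failures match
    the failures seen on the tester (Jaccard similarity, 1.0 = exact)."""
    ranked = []
    for fault, predicted in dictionary.items():
        union = predicted | observed
        score = len(predicted & observed) / len(union) if union else 1.0
        ranked.append((fault, score))
    # Best match first; ties broken by fault name for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical dictionary built during test generation:
# each stuck-at fault maps to the failures it would produce.
fault_dictionary = {
    "n1/SA0": {(0, "y"), (2, "y")},
    "n2/SA1": {(1, "y")},
    "a/SA1": {(2, "y"), (3, "z")},
}
tester_failures = {(0, "y"), (2, "y")}

ranking = rank_faults(fault_dictionary, tester_failures)
print(ranking[0])
```

A lookup like this only finds causes that the fault model anticipated, which is one reason the range of problems such tools solve stays limited.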
What's really needed are specialized silicon debug and diagnosis tools focused on speeding the search for the causes of bugs in prototype chips and for diagnosing the complex faults that limit yield. In most cases, the chip's designers need to work closely with test engineers to locate and isolate the causes of a mismatch between simulation and silicon.
As designs move into nanometer process technology, yields will fall, and faults may originate in logic, layout, timing, modeling, or manufacturing. A good silicon debug engineer needs well-rounded knowledge of logic, physical design, testing, and process technology, along with tools that process and correlate data from all of these abstractions.
Building silicon debug tools on top of existing RTL/gate debug solutions will carry proven techniques, technology and experience from RTL and gate-level debug into the post-silicon domain. In addition, these tools will require new technology that brings data acquired from silicon running on testers or in validation boards into the familiar debug environment and fills in the missing values that can't be acquired from the chip.
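One way to picture "filling in the missing values" is to propagate the register values captured from the chip through the known logic, as in the Python sketch below. The netlist format, signal names, and `fill_in` function are hypothetical, the logic is assumed combinational and acyclic, and real tools would also have to handle sequential behavior and timing. Values that cannot be determined stay unknown (`None`).

```python
from typing import Dict, List, Optional, Tuple

Value = Optional[int]  # 0, 1, or None when the value is unknown
Netlist = Dict[str, Tuple[str, List[str]]]  # signal -> (gate type, inputs)


def eval_gate(kind: str, vals: List[Value]) -> Value:
    """Evaluate one gate with three-valued logic (0, 1, unknown)."""
    if kind == "not":
        return None if vals[0] is None else 1 - vals[0]
    if kind == "and":
        if 0 in vals:
            return 0  # a controlling 0 decides the output
        return None if None in vals else 1
    if kind == "or":
        if 1 in vals:
            return 1  # a controlling 1 decides the output
        return None if None in vals else 0
    if kind == "xor":
        return None if None in vals else sum(vals) % 2
    raise ValueError(f"unsupported gate type: {kind}")


def fill_in(netlist: Netlist, captured: Dict[str, int]) -> Dict[str, Value]:
    """Compute every signal that follows from the captured values."""
    values: Dict[str, Value] = dict(captured)

    def value_of(sig: str) -> Value:
        if sig not in values:
            if sig in netlist:
                kind, inputs = netlist[sig]
                values[sig] = eval_gate(kind, [value_of(s) for s in inputs])
            else:
                values[sig] = None  # not captured and has no driver here
        return values[sig]

    for sig in netlist:
        value_of(sig)
    return values


# Illustrative data: register r2 is not in the scan chain.
netlist = {
    "n1": ("and", ["r1", "r2"]),
    "n2": ("or", ["n1", "r3"]),
    "n3": ("xor", ["r3", "r4"]),
}
captured = {"r1": 0, "r3": 0, "r4": 1}

filled = fill_in(netlist, captured)
print(filled)
```

Here `n1` is recoverable even though `r2` was never observed, because the captured 0 on `r1` controls the AND gate.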
Powerful formal-based causal analysis must augment current debug capabilities, and must be backed up by new ways to intelligently guide the tracing of causes and quickly map details between the design and its implementation. The resulting solutions will tame the growing problem of profit-eating post-silicon debug delays.
Scott Sandler is president and CEO of Novas Software, Inc.