SoC test reels out of control
By Ron Wilson, EE Times
May 13, 2002 (12:16 p.m. EST)
System-level IC testing, already complicated by increasing chip capacity and the growing use of third-party intellectual property (IP), is now being threatened from another quarter. A proliferation of new failure modes being observed in very deep-submicron processes may force design managers and OEM executives to become directly involved in system test architecture at a level never before experienced by most companies.
Most of the problems with system-on-chip (SoC) testing are both easily anticipated and well-understood by leading-edge design teams. Increasing numbers of scan flip-flops lengthen the time required to shift data in and out of scan chains. Multiple I/O points, parallel loading of chains, data compression and higher I/O rates are all being applied to the issue.
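The arithmetic behind that pressure is simple. A rough back-of-the-envelope model (the flop counts, pattern counts and clock rate below are hypothetical, not from any design cited here) shows why parallel chains are so attractive:

```python
# Rough scan-test-time model (illustrative numbers only).
# Shifting one pattern through a scan chain takes chain_length clock cycles;
# splitting the flops across parallel chains divides that time accordingly.

def scan_shift_time(num_flops, num_chains, num_patterns, shift_mhz):
    """Seconds spent shifting scan data, ignoring capture cycles."""
    chain_length = -(-num_flops // num_chains)  # ceiling division
    cycles = chain_length * (num_patterns + 1)  # +1 to flush the last response
    return cycles / (shift_mhz * 1e6)

# Hypothetical chip: one million scan flops, 10,000 patterns, 25-MHz shift clock
single = scan_shift_time(1_000_000, 1, 10_000, 25)
parallel = scan_shift_time(1_000_000, 32, 10_000, 25)
print(f"1 chain:   {single:.0f} s")   # -> 1 chain:   400 s
print(f"32 chains: {parallel:.1f} s")  # -> 32 chains: 12.5 s
```

Tester time is billed by the second, so a 32x reduction in shift time translates directly into test cost, which is why the techniques above are being stacked together rather than chosen among.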
At a more-complex level, the growing attempts to reuse IP are creating less-tractable problems. When IP comes from a variety of sources, it often comes with a variety of test methodologies. The IP vendor's approach frequently has been modest at best. Simple scan chains and functional test vector sets are typical of today's complex IP offerings, test specialists say. Serious thought is only rarely given to design-for-test (DFT), or even to structural rather than functional test vectors.
Many users of complex IP are hardly more sophisticated. One test expert characterized today's typical SoC test methodology thus: "Just connect the scan chains together and run one huge, flat automatic test pattern generation." But such an approach is usually wasteful and quickly becomes intractable, even when the IP blocks are well-understood.
So the test strategy for the SoC becomes a mixture of one huge automatically generated scan chain for the soft IP, built-in-self-test (BIST) interfaces for the memory instances and whatever the IP vendor provided for the hard blocks. Cost, coverage and accuracy issues grow geometrically with the complexity of the design.
Against that background of a problem out of control, the situation is about to get markedly worse, according to sources from across the industry. The problems faced by current SoC design teams are still based on the concept that most faults are stuck-at faults. But that simplifying assumption is about to be erased.
The problem is in the physics and electronics of new processes. For example, aluminum metallization has defects typically related to particle contamination that usually present themselves as a node stuck at a particular voltage level: the stuck-at fault. Copper interconnect formed through damascene processing presents a whole new repertoire of problems, involving the pattern-dependent thinning of the copper.
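The stuck-at model is worth making concrete, since everything that follows is a departure from it. A minimal sketch (a toy two-gate netlist, not a production fault simulator): a fault is detected when some input vector makes the faulty circuit's output differ from the good circuit's.

```python
# Minimal stuck-at fault illustration on a toy netlist: y = (a AND b) OR c.
from itertools import product

def good_circuit(a, b, c):
    """Fault-free netlist."""
    return (a & b) | c

def faulty_circuit(a, b, c):
    """Same netlist with the internal AND output stuck at 0."""
    and_node = 0  # stuck-at-0 fault on the AND gate's output
    return and_node | c

# Exhaustively find the vectors that expose the fault.
detecting = [v for v in product([0, 1], repeat=3)
             if good_circuit(*v) != faulty_circuit(*v)]
print(detecting)  # -> [(1, 1, 0)]
```

The key property is that a single static vector suffices: apply (a, b, c) = (1, 1, 0) and the outputs disagree. Automatic test pattern generation for stuck-at faults is built on exactly this search, which is what makes the model so tractable and why its erosion matters.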
Subwavelength lithography is creating its own artifacts, including microbridges between interconnect lines and unreliable formation of vias. And the aggressive geometries are creating capacitive- and inductive-coupling issues that are virtually impossible to model.
Those problems appear to the test engineer not as stuck-at nodes but as whole new categories of symptoms: delay faults, transition faults and nodes that simply behave in unfathomable ways. Many of those faults depend on data patterns in neighboring circuitry.
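The difference is not academic: a transition fault cannot even be observed with a single vector. A sketch of a hypothetical slow-to-rise node (the timing model below is invented for illustration) shows why detection requires a two-pattern, at-speed test:

```python
# Transition (slow-to-rise) fault sketch. Unlike a stuck-at fault, detection
# needs a *pair* of vectors: one to set the node low, one to launch the
# rising transition, with the response captured at speed.

def node_value(inp, slow_to_rise, prev_value):
    """Value captured at the functional clock rate: a slow-to-rise node
    misses the capture deadline on a 0 -> 1 transition, so the stale 0
    is latched instead of the new 1."""
    if slow_to_rise and prev_value == 0 and inp == 1:
        return 0  # transition too slow; old value captured
    return inp

# Two-pattern test: initialize low, then launch the rising transition.
prev = node_value(0, slow_to_rise=True, prev_value=0)      # set node to 0
captured = node_value(1, slow_to_rise=True, prev_value=prev)
good = node_value(1, slow_to_rise=False, prev_value=0)
print(captured, good)  # -> 0 1  (faulty chip latches 0, good chip latches 1)
```

Note that applying the second vector alone tells the tester nothing, since the outcome depends on the node's history. Pattern-dependent faults in neighboring circuitry compound this: the history that matters may not even be on the node under test.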
Testing in conventional ways for all these failure modes will be prohibitively complex. Test engineers will have to find some way of reducing the space that the test methodology must explore.
One approach strongly advocated by some designers is to test the SoC as a system with an intended function, rather than as a mere assembly of IP cores. Some experts talk of a reversal of direction, away from structural test and back to functional test but at the system level, not the core level.
When an SoC is used in a system, generally only a small subset of the possible combinations of states of all the cores in the chip is used. That leaves many states that don't need to be tested, because they will never be entered by the system, thus providing a potentially huge savings in test effort.
Unfortunately, visibility of the unused states comes at the system level, not at the chip level. It requires analysis of inputs and outputs as transactions in the end application, not in terms of possible or allowable inputs to the IP cores on the chip. And often, that type of application-based information is not available to the test development team.
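The arithmetic of that pruning can be sketched with a made-up example (the cores, states and system rule below are hypothetical): the cross-product of core states grows multiplicatively, but an application-level constraint can cut away most of it.

```python
# Sketch of why system-level knowledge shrinks the test space.
# Three hypothetical cores, each with a handful of states.
from itertools import product

cpu_states   = ["run", "halt", "sleep"]
dma_states   = ["idle", "burst"]
codec_states = ["off", "decode", "encode"]

# Chip-level view: every combination is a candidate for test.
all_states = list(product(cpu_states, dma_states, codec_states))

def reachable(state):
    """Hypothetical application rule: DMA bursts and codec activity
    occur only while the CPU is running."""
    cpu, dma, codec = state
    return not (cpu != "run" and (dma == "burst" or codec != "off"))

# System-level view: only the states the application can actually enter.
used = [s for s in all_states if reachable(s)]
print(len(all_states), len(used))  # -> 18 8
```

Even this toy constraint removes more than half the combinations, and the effect compounds as cores are added. But the rule encoded in `reachable` is exactly the application knowledge the paragraph above describes, and it lives with the system architect, not the test team.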
Even if the function of the system is understood, it may still prove impossible to explore all the possible faults that a 100-nm circuit can exhibit. Designers are reporting that they can only come up with test strategies once they have detailed information from process engineers about the nature and probability of faults for a particular design in a given process. Factor in the faults related to subwavelength techniques, and test strategy may become dependent on a particular mask set.
But even the best information from system architects and process engineers is approximate. The only reliable data about the kinds of failures an SoC actually exhibits comes from analysis of field failures. That requires finding failures in the field, saving the failed modules, getting them back to recover the SoC and submitting the chip to failure analysis.
At a vertically integrated OEM, where system design, chip design, field service and failure analysis all report to one executive, this can be possible, although it is rarely practiced. In today's disaggregated environment, in which these organizations are likely to be distributed among different companies, the problems are sobering.
Example in automotive
One significant exception is the close partnerships that automobile manufacturers form with their suppliers. "One of our greatest assets in test is that we have automotive customers," said Christine King, president and chief executive officer of AMI Semiconductor. "With most customers, a chip fails in the field and somebody throws the whole module away. You never get to see it. With the auto industry, you get every failure back. And the customer is very supportive of your interest in finding the failure.
"Because of these customers, we have come a long way toward understanding the failure modes that actually occur in our chips," King said. "We have refined our failure analysis to the point that we are now using focused-ion-beam systems, voltage contrast analysis and other laboratory techniques to find out what is going on electrically inside failed chips. And that information goes right back into test strategy."
King's experience is seconded by other ASIC executives. And it is predictive of the only evident solution to the test crisis. Test strategy and, hence, DFT insertion and even chip architecture will be influenced by a gigantic iterative loop that begins with capturing failed systems in the field, performing failure analysis on the systems-on-chip and modifying the chip and the test strategy as necessary to minimize and detect the failure mode.
Particularly for fabless chip companies, that will require an unprecedented level of cooperation across a number of corporate boundaries, and between organizations that rarely speak to each other. But it may be the only way forward.