On-time Finish Rests With Multiple Clocks

By Richard Illman and Greg Aldrich, Integrated System Design
April 2, 2002 (12:24 p.m. EST)
URL: http://www.eetimes.com/story/OEG20020329S0056

Today's leading-edge system-on-chip (SoC) designs typically have multiple clock domains and, in many cases, multiple internally generated clocks. In test mode, those clocks may be combined into one, or they may all be brought out to separate primary input pins for test. One of the biggest test problems multiple clocks pose when implementing design-for-test (DFT) and generating test patterns is unpredictable clock skew, which can cause test problems in manufacturing. Clock skew can cause difficulties shifting data through the scan chains and can lead to unreliable test patterns.

These issues need to be taken into consideration during the design stage, since, ultimately, the decisions on how test clocks are designed can have a significant impact on automatic test pattern generation (ATPG). This article discusses the trade-offs among various clock test methodology scenarios and offers food for thought on how to proceed in each scenario.

First and foremost, clock trees are designed with functional operation in mind. Clock control logic may be designed to pulse various clocks in a specific sequence or to ensure that certain clocks are not pulsed at the same time. During functional operation, internal clock generators may also manage internal clock skew in order to ensure proper functional operation. Alternatively, resynchronization logic may be used to control data transfer between clock domains that operate truly asynchronously. During a scan-based manufacturing test, however, the clocks might be operated in a different way, since in test mode the clocks are delivered by the automatic test equipment (ATE) and internal clock generators and controls are bypassed.

It is important to keep in mind that:

When data is shifted through the scan chains, all clocks are pulsed at the same frequency and at the same time. This means that if multiple clocks are used within the same scan chain, there is a danger of clock skew between the clock domains during shift.

During capture, one or more clocks might be pulsed at the same time. The ATPG tool may generate vectors that exercise logic in a manner that does not occur in functional mode.

When a design has multiple clocks in test mode, clock skew can occur between domains. The problem can be separated into two issues: Clock skew during shift and clock skew during capture. To minimize skew during shift, all scan chains should be ordered such that all flops clocked by one clock domain are grouped together. That minimizes the locations where clock skew can create a problem. To avoid skew completely where the clock domains cross, a lockup latch can be inserted (Fig. 1).

The lockup latch delays the data for half a clock cycle, from the rising to the falling edge, and thus provides a high tolerance to skew between domains.
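The ordering and lockup-latch steps can be sketched in a few lines of Python. This is only an illustration, not a description of any particular DFT tool: the chain is modeled as a list of (cell name, clock domain) pairs, and the function names, the cell names and the "LOCKUP" marker are all assumptions made for the example.

```python
def count_domain_crossings(chain):
    """Count adjacent scan cells that are clocked by different domains."""
    return sum(1 for a, b in zip(chain, chain[1:]) if a[1] != b[1])


def order_by_domain(chain):
    """Group cells by clock domain, keeping first-seen domain order."""
    domains = list(dict.fromkeys(clk for _, clk in chain))
    return [cell for d in domains for cell in chain if cell[1] == d]


def insert_lockup_latches(chain):
    """Place a hypothetical lockup-latch marker at each domain crossing."""
    result = []
    for i, cell in enumerate(chain):
        if i > 0 and chain[i - 1][1] != cell[1]:
            # Tagging the latch with the upstream domain is an assumption
            # of this sketch, not something the article specifies.
            result.append(("LOCKUP", chain[i - 1][1]))
        result.append(cell)
    return result


# Hypothetical four-cell chain that alternates between two domains.
chain = [("f0", "Clk1"), ("f1", "Clk2"), ("f2", "Clk1"), ("f3", "Clk2")]
ordered = order_by_domain(chain)
with_latches = insert_lockup_latches(ordered)
```

In this example the unordered chain has three domain crossings. Grouping the cells by domain leaves a single crossing, and that is the only place where a lockup latch is inserted.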

The above methodology only solves the skew problem for scan-shifting operation. During manufacturing test, after the scan chains are loaded, the device is taken out of shift mode and a "capture clock" is issued to capture the response of the test pattern. During capture, there can be multiple paths between the clock domains. Since there can be a path from both domain 1 to domain 2 and vice versa, simply skewing the clock inputs might not help. The most conservative approach, which is the one traditionally chosen by most ATPG tools, is to pulse only one clock per pattern during the capture cycle. Alternate approaches for handling multiple clocks during test and test pattern generation will be discussed later.

Single or multiple test clocks?
Since clocks must be accessible from primary input pins during test, there are really only two alternatives for testing designs with multiple functional clocks:

Use a single, synchronous clock in test mode. This enables a very compact test vector set but introduces extra gating into the clock trees, making timing closure more difficult and exacerbating the problems of peak power consumption during test.

Maintain multiple asynchronous clocks in test mode. This is easier to implement but could result in larger test sets.

By using a single clock in test mode, the internal clock signals are bypassed to one external clock signal. From the tester and test generation point of view, the design has one clock. In general, there are both advantages and disadvantages to using this alternative.

Among the advantages of using a single clock in test mode are that only one pin needs to be used as a dedicated test-mode clock pin, that the method ensures the most compact test pattern set and that it provides the shortest ATPG run-times. Since the design has only one clock in test mode, the test generation effort and, therefore, run-times are minimized.

There are some disadvantages as well. The methodology requires very careful clock design and analysis. For mux-DFF designs, the clock tree needs to be synthesized separately for test mode. Even though Clk1 and Clk2 are correctly skewed in functional mode, they will now be clocked at the same time, and clock skew can cause problems during both shift and capture.

Also, scan chain reordering is more complicated since ordering must take into account the clock domains for proper shift operation. What's more, little flexibility is left in resolving potential problems. If clock skew occurs, it is difficult to resolve without modifying the design or losing test coverage.

The second test approach with multiple test clocks is to use one dedicated pin per clock domain. In functional mode, multiple clocks are generated internally. In test mode, each internal clock has a different clock pin. From the ATPG tool's point of view, the design has multiple clocks. All clocks are still clocked at the same time during shift, but the ATPG tool is now free to handle the clock domains in different ways during the capture cycle.

There are clear advantages to using multiple clocks. First, it is the safest approach, since there is no separate "test mode only" clock tree. As long as there are no clock skew problems in functional mode, there should not be any in test mode.

Also, different ATPG methods can be used in generating the tests, and there is flexibility in pulsing each clock separately or at the same time.

Using multiple clocks allows peak power consumption to be reduced because not all clocks are active at the same time during the capture cycles. If the scan chains have lockup latches to prevent skew between domains, it is also possible to "stagger" the clocks during shift mode and, thereby, reduce peak power consumption during the shift operation.
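The effect of staggering can be illustrated with a deliberately crude model, in which the number of flops clocked in the same time slot stands in for peak power. The flop counts, the domain names and the function name below are hypothetical, and real peak power depends on actual switching activity rather than on flop counts alone.

```python
def peak_switching(flops_per_domain, schedule):
    """Return the largest number of flops clocked in any one time slot.

    schedule is a list of time slots; each slot lists the clock
    domains that are pulsed together in that slot.
    """
    return max(sum(flops_per_domain[d] for d in slot) for slot in schedule)


# Hypothetical flop counts per clock domain.
flops = {"TClk1": 5000, "TClk2": 2000, "TClk3": 1000}

# All clocks pulsed in the same slot versus one clock per slot.
simultaneous = peak_switching(flops, [["TClk1", "TClk2", "TClk3"]])
staggered = peak_switching(flops, [["TClk1"], ["TClk2"], ["TClk3"]])
```

Under this model the peak drops from the sum of all domains (8000 flops) to the size of the largest domain (5000 flops) when the clocks are staggered.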

Of course, there are disadvantages to this scheme, too. If all clocks were generated internally, then some additional routing and more pins would need to be dedicated as clock pins during test mode. Nonetheless, it is possible to share test clock pins with functional pins. Such sharing can reduce test coverage, because the pins need to be operated as clocks and not as regular input pins during test, but the coverage reduction is usually insignificant.

Multiple clocks also increase pattern count. Since each pattern will probably not use all of the clocks, the pattern count will usually be larger. Advanced techniques for ATPG that can limit the increase in patterns are discussed later. However, solutions that minimize the increase in pattern count will increase the ATPG run-times.

Standard scan design
The advantages and disadvantages described above assume a standard mux-DFF style scan design.

Using an LSSD-based scan approach rather than mux-DFF solves many issues related to scan chain shifting. An LSSD scan cell requires three clocks: a system clock and two nonoverlapping clocks that are used during shift (Fig. 2). This LSSD cell is used to replace a nonscan latch and is suitable for latch-based designs.


Similar cells exist for DFF-based designs. Such LSSD cells have an edge-triggered system clock but use the same nonoverlapping clocks during shift as regular LSSD. Therefore, it is not necessary to have a latch-based design before scan in order to use an LSSD approach. This type of scan design eliminates the skew issues during shift, since two nonoverlapping clocks are used. However, it does not eliminate potential skew problems during capture.
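A small behavioral model shows why two nonoverlapping shift clocks tolerate skew. It is a sketch under stated assumptions, not a model of any specific cell library: each cell is reduced to a master latch (L1) and a slave latch (L2), the two shift phases are called A and B here, and the arrival_order argument is a hypothetical way of representing the order in which a skewed clock reaches the cells. The edge-triggered function models the worst case, in which the skew always exceeds the data path delay.

```python
def lssd_shift(l1, l2, scan_in, arrival_order=None):
    """Shift an LSSD-style chain by one position using two phases."""
    n = len(l1)
    order = list(range(n)) if arrival_order is None else list(arrival_order)
    l1, l2 = list(l1), list(l2)
    # Phase A: each master loads from the upstream slave; slaves hold.
    for i in order:
        l1[i] = scan_in if i == 0 else l2[i - 1]
    # Phase B: each slave loads from its own master; masters hold.
    for i in order:
        l2[i] = l1[i]
    return l1, l2


def dff_shift_with_skew(q, scan_in, arrival_order):
    """Edge-triggered chain on one clock; cells update in arrival order."""
    q = list(q)
    for i in arrival_order:
        # A cell clocked late sees the already-updated upstream value.
        q[i] = scan_in if i == 0 else q[i - 1]
    return q


start = [1, 0, 1]
lssd_early_first = lssd_shift([0, 0, 0], start, 0, [0, 1, 2])
lssd_late_first = lssd_shift([0, 0, 0], start, 0, [2, 1, 0])
dff_good = dff_shift_with_skew(start, 0, [2, 1, 0])
dff_raced = dff_shift_with_skew(start, 0, [0, 1, 2])
```

In the two-phase model the result is the same for any arrival order, because the slaves hold their values while the masters load. In the single-clock model, the arrival order 0, 1, 2 lets the scan-in value race through the whole chain.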

Whether the design uses one clock or multiple clocks during test mode, designers must ensure that clock skew doesn't cause problems during shift. As discussed, this can be accomplished by using an LSSD scan methodology or through correct ordering of the scan chains and usage of lockup latches.

If multiple clock domains are combined into one test clock, then careful attention also must be paid to the clock tree design to ensure that there will be no clock skew problems during the capture cycle. Designs with multiple test clocks can make use of several ATPG techniques to ensure that the final test patterns will work properly in manufacturing regardless of any skew issues between clock domains.

The traditional way to ensure that clock skew does not cause problems in the capture cycle is to pulse only one clock per pattern. During shift, all clocks are pulsed at the same time, but during capture only one clock is pulsed per pattern. In this example, each scan chain has two scan cells, and there is a total of four scan clocks.

This is the easiest way to avoid clock skew during capture. Little ATPG effort is needed, which results in short run-times. The trade-off is a high pattern count: For a design with 10 clocks, one can experience up to 10 times more patterns than if the design had one clock.

Another approach is first to route all clocks to separate inputs in test mode and then analyze which clock domains are independent (i.e., which ones have no functional paths between them).

If the ATPG tool is capable of performing the analysis, then it can treat independent clock domains as one. That will yield the same effect as using one pin for these clocks. In the example shown in Fig. 3, analysis shows that there are no functional paths between clock domains 1 and 3. Therefore, for Pattern 1, clocks TClk1 and TClk3 can be pulsed at the same time. Since there is interaction between other domains, clock TClk2 is pulsed alone for Pattern 2.

A low pattern count is generated, and if there is a problem between certain domains, one can use a different approach without changing the hardware. But this approach requires analysis to determine which clocks can be pulsed at the same time without causing problems. Some ATPG tools automate this analysis.
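The grouping step can be sketched as a simple greedy pass over the clock domains. This is an illustration only and is not claimed to be how any ATPG tool performs the analysis. The function name is hypothetical, and the list of interacting pairs is an assumption chosen to match the three-domain example, in which domains 1 and 3 share no functional paths.

```python
def group_independent_domains(domains, interacting_pairs):
    """Greedily group clock domains that have no functional paths between them."""
    conflicts = {d: set() for d in domains}
    for a, b in interacting_pairs:
        conflicts[a].add(b)
        conflicts[b].add(a)

    groups = []
    for d in domains:
        for group in groups:
            # A domain may join a group only if it interacts with no member.
            if conflicts[d].isdisjoint(group):
                group.append(d)
                break
        else:
            groups.append([d])
    return groups


# Assumed interactions: TClk2 has paths to and from both other domains.
domains = ["TClk1", "TClk2", "TClk3"]
pairs = [("TClk1", "TClk2"), ("TClk2", "TClk3")]
groups = group_independent_domains(domains, pairs)
```

Each resulting group can be pulsed together in one capture cycle. Here TClk1 and TClk3 end up in one group and TClk2 is left on its own. A greedy pass does not guarantee the smallest possible number of groups, so it should be read as a starting point.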

Pulsing clocks sequentially results in the most compact pattern set without the risk of clock skew causing problems in the capture cycle. The capture takes place over multiple cycles. Only one clock is pulsed per cycle. Since the same time plate is used for all cycles, no additional time plates or other settings are required. This method is used by the Mentor Graphics FastScan tool's multiclock compression.

Expensive advantages
The advantages include a reduced pattern count (compared with pulsing one clock per pattern); very good compression results, with a sequential depth of only 2 or 3, even for designs with many clocks; and no risk of clock skew during capture. These advantages do come at the expense of an increased ATPG run-time.
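Sequential pulsing can be pictured with a toy cycle-by-cycle model. It is a sketch of the idea only and makes no claim about how FastScan implements multiclock compression. The two-flop circuit, the signal names and the function names are hypothetical.

```python
def pulse(state, flops, clk):
    """Pulse one clock: flops in that domain capture their D input,
    evaluated on the state as it was before the pulse."""
    updates = {
        name: d_input(state)
        for name, (domain, d_input) in flops.items()
        if domain == clk
    }
    return {**state, **updates}


def sequential_capture(state, flops, clock_order):
    """Apply one capture cycle per clock, in the given order."""
    for clk in clock_order:
        state = pulse(state, flops, clk)
    return state


# Hypothetical circuit: flop "a" (TClk1) samples primary input "pi",
# and flop "b" (TClk2) samples the output of "a" across the domain boundary.
flops = {
    "a": ("TClk1", lambda s: s["pi"]),
    "b": ("TClk2", lambda s: s["a"]),
}
loaded = {"pi": 1, "a": 0, "b": 0}

after_12 = sequential_capture(loaded, flops, ["TClk1", "TClk2"])
after_21 = sequential_capture(loaded, flops, ["TClk2", "TClk1"])
```

Because only one clock is active in each cycle, the value that "b" captures depends on the chosen clock order and not on skew between the domains, so the expected response is known when the pattern is generated. Capturing with two clocks takes two cycles in this model, which is why the sequential depth grows with the number of clocks pulsed per pattern.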

Multiclock compression resolves the test set size dilemma that using multiple clocks in test mode usually creates while avoiding the issues of clock skew. One example of an SoC-type design shows how.

The design, provided by Tality, has a large number of clock domains (14), and the number of flops associated with each clock varies dramatically. There is one clock with more than 5k flops, another with 2k and a couple with approximately 1k. The remaining clocks have between 20 and 200 flops each.

The accompanying graph (Fig. 4) plots the number of vectors produced against the CPU time for several ATPG runs. The "synchronous" curve represents a purely synchronous implementation, in which all clocks are switched to a single source. The different points on the curve represent increased effort levels during dynamic compression, which increases run-time but can reduce the number of vectors.

The "multiclock" curve represents an asynchronous implementation of the design in which clock sources remain separate to avoid any possible timing hazards. The first (left-most) point represents conventional ATPG with only a single capture cycle for each test vector. The successive points are for runs that allow an increasing number of capture clocks for each test pattern. This increases the CPU time but decreases the final vector count. The final pattern count is very close to what is possible with a single clock.

For mux-DFF based designs with multiple clock domains, we recommend giving each internal clock domain its own clock pin in test mode. That will provide the most options for test pattern generation and eliminate the need to design a special clock tree just for test. To achieve the most compact pattern set, clock domain analysis should be performed to determine which domains do not interact, so that ATPG can pulse these clocks simultaneously when patterns are generated.

Finally, the remaining clocks should be pulsed sequentially using a multiclock compression technique. These methods, combined, will result in the safest approach to a minimized pattern set.

---

Richard Illman, chief consulting engineer at Tality Corp. (Livingston, Scotland), is a graduate of England's University of Hull. Greg Aldrich is product marketing manager at Mentor Graphics Corp. (Beaverton, Ore.). He holds a BEE degree from the University of Illinois.

http://www.isdmag.com