Leos Kafka and Martin Danek, UTIA AV CR; Ondrej Novak, Czech Technical University in Prague; Prague, Czech Republic
This paper presents a technique that preserves the structure of a circuit according to its target technology during fault emulation in an FPGA. The technique is not restricted to any particular target technology or FPGA emulation platform. It is compatible with fault injection techniques based on both circuit instrumentation and partial runtime reconfiguration. An extension of the technique that emulates the timing parameters of the circuit by introducing a virtual time is also proposed. The area and timing overhead of preserving the circuit structure, and the parameters of basic delay elements, are evaluated experimentally.
Fault simulation is an important part of the digital circuit design flow. The growing complexity of digital circuits lengthens the simulation time. This time can be shortened to an acceptable level by using emulation instead of simulation. Emulation is a method that implements the tested circuit in programmable hardware. Field programmable gate arrays (FPGAs) are the most common emulation platform because of their flexibility and unlimited reprogrammability.
As long as the circuit is treated as a black box during emulation, meaning that only the values on the circuit inputs and outputs are used for further processing, the internal implementation of the circuit is not important. The situation becomes more complex when access to internal nodes is required. Fault emulation is one such case. Here, faults are injected into given nodes inside the emulated circuit so that the behavior of the faulty circuit can be evaluated. The structure of the circuit during emulation should therefore reflect the structure of the circuit after the final implementation. Since the target technology (e.g. ASIC) generally differs from the emulation technology (FPGA), the netlist created for the target technology differs from the netlist created for emulation. These differences may make it impossible to inject and test some faults, simply because the corresponding nodes may not exist in the emulation netlist. This paper introduces a novel emulation method that ensures that the structure of the circuit used during emulation corresponds exactly to the gate-level netlist for the final implementation.
One of the major parts of fault emulation is fault injection, the process by which artificial faults are injected into certain nodes of the tested circuit. There are two main classes of fault injection techniques: a) methods based on circuit instrumentation and b) methods based on partial runtime reconfiguration. ,  present an environment for emulation of transient faults based on circuit instrumentation. Transient faults in the sequential memory elements (flip-flops) and in any memory elements within microprocessor-based systems are considered.  presents a fault injection approach based on cooperation between a simulator and an emulator. Parts of a circuit are simulated while the rest of the circuit is emulated, which should increase the level of controllability and observability. Different fault models (stuck-at faults, bridging faults, etc.) are supported.  presents another fault emulation environment based on circuit instrumentation. Stuck-at faults are supported in this case. ,  present a methodology for injecting SEU faults into flip-flops and stuck-at faults at LUT inputs by runtime reconfiguration. ,  present another technique for injecting stuck-at faults by partial runtime reconfiguration.
The properties of the fault injection methods mentioned above can be summarized as follows. Methods based on circuit instrumentation inject faults much faster than methods based on partial reconfiguration; in fact, the time required for reconfiguration makes the latter hardly usable in some cases. On the other hand, methods based on runtime reconfiguration do not require any additional FPGA resources or any modification of the tested circuit. The absence of modifications prior to implementation ensures that these methods cannot influence the circuit structure, and thus cannot influence the results of fault emulation. The emulation method introduced in this paper supports both types of fault injection methods.
In some cases, not only the function of the circuit but also the timing of the circuit outputs and internal signals must be evaluated at a reasonable level of accuracy. An optional enhancement that preserves circuit timing is also proposed in this paper.
PRESERVING CIRCUIT STRUCTURE DURING FAULT EMULATION
This section introduces the novel emulation method that preserves the structure of the tested circuit. As mentioned above, the structure of the circuit used during fault emulation has to correspond to the structure of the circuit after the final implementation.
The design flow proposed in this paper uses two synthesis steps in order to preserve the circuit structure at the gate level. The first synthesis step defines the structure according to the target technology. The second synthesis step converts the netlist generated by the first synthesis step to the netlist suitable for emulation in FPGA. The principle of the flow is shown in Figure 1.
Let us assume that the tested circuit is described in an HDL. The source HDL file should be the same for both the final implementation and emulation. This also means that modifications intended only for emulation purposes are not allowed at this stage. The source file is synthesized with the same tools and the same settings as will be used for the final implementation. Any synthesis optimizations are available at this stage.
The netlist for the target technology generated by the synthesis in the first step is used not only for final implementation in target technology, but also as a source file for the emulation flow. The netlist is translated back to the HDL format. The new HDL file (Structure HDL file) describes the structure of the circuit at the gate level.
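The back-translation step can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory netlist representation (the `Instance` class), hypothetical cell and port names (`NAND2`, `INV`, `A`, `B`, `Y`), and that the primitive models are compiled into the VHDL library `work`; none of these come from the paper, and a real flow would start from the netlist format produced by the synthesis tool.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str    # instance label taken from the target-technology netlist
    cell: str    # library primitive (hypothetical names such as "NAND2")
    ports: dict  # port name -> net name


def to_structural_vhdl(entity, inputs, outputs, instances):
    """Emit gate-level VHDL with exactly one instantiation per primitive."""
    io = set(inputs) | set(outputs)
    # Every net that is not a top-level port becomes an internal signal.
    internal = sorted({n for i in instances for n in i.ports.values()} - io)
    port_lines = [f"    {p} : in  std_logic" for p in inputs]
    port_lines += [f"    {p} : out std_logic" for p in outputs]
    lines = [
        "library ieee;",
        "use ieee.std_logic_1164.all;",
        "",
        f"entity {entity} is",
        "  port (",
        ";\n".join(port_lines),
        "  );",
        f"end entity {entity};",
        "",
        f"architecture structure of {entity} is",
    ]
    lines += [f"  signal {n} : std_logic;" for n in internal]
    lines.append("begin")
    for inst in instances:
        pmap = ", ".join(f"{p} => {n}" for p, n in inst.ports.items())
        # Assumption: primitives are available as entities in library "work".
        lines.append(
            f"  {inst.name} : entity work.{inst.cell} port map ({pmap});"
        )
    lines.append("end architecture structure;")
    return "\n".join(lines)


netlist = [
    Instance("U1", "NAND2", {"A": "a", "B": "b", "Y": "n1"}),
    Instance("U2", "INV", {"A": "n1", "Y": "y"}),
]
vhdl = to_structural_vhdl("dut", ["a", "b"], ["y"], netlist)
print(vhdl)
```

Because every primitive and every net of the target netlist appears by name in the generated file, later steps can refer to the same nodes as the final implementation.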
Figure 1. Design flow for emulation that preserves the circuit structure
Since the structure has already been fixed during the first synthesis step, the Structure HDL file can be modified, provided that no primitive instances or nets are removed from the circuit. Insertion of fault injection structures is an example of such a modification. The Emulation HDL file is created as a result of these modifications.
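A minimal Python sketch of one such modification follows. It inserts a stuck-at injection element on a chosen net while keeping all original instances and nets. The tuple-based netlist, the `SABOTEUR` cell, its port names, and the `fi_en_*`/`fi_val` control nets are illustrative assumptions, not the instrumentation used in the paper.

```python
def saboteur(d, en, val):
    """Functional model of the injected element: force `val` when enabled."""
    return val if en else d


def insert_saboteur(instances, output_ports, net):
    """Return a new instance list with a stuck-at saboteur driving `net`.

    instances    : list of (name, cell, {port: net}) tuples
    output_ports : set of port names that are outputs (assumed known per cell)
    """
    raw = f"{net}__raw"  # new net between the original driver and the saboteur
    result = []
    for name, cell, ports in instances:
        # Only the driver of `net` is rewired; all loads stay on `net`.
        new_ports = {
            p: (raw if n == net and p in output_ports else n)
            for p, n in ports.items()
        }
        result.append((name, cell, new_ports))
    result.append(
        (f"FI_{net}", "SABOTEUR",
         {"D": raw, "EN": f"fi_en_{net}", "VAL": "fi_val", "Y": net})
    )
    return result


def nets_of(instances):
    return {n for _, _, ports in instances for n in ports.values()}


original = [
    ("U1", "NAND2", {"A": "a", "B": "b", "Y": "n1"}),
    ("U2", "INV", {"A": "n1", "Y": "y"}),
]
instrumented = insert_saboteur(original, {"Y"}, "n1")

# The modification only adds instances and nets; nothing is removed.
kept_instances = {n for n, _, _ in original} <= {n for n, _, _ in instrumented}
kept_nets = nets_of(original) <= nets_of(instrumented)
print(kept_instances, kept_nets)
```

The two subset checks at the end express the constraint stated above: the instrumented netlist is a superset of the structure fixed by the first synthesis step.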
After all the required modifications, the resulting HDL file (Emulation HDL file) is synthesized for implementation in the emulation platform. The netlists of all the primitives used in the circuit are required in this step. The hierarchy of the circuit has to be preserved during this synthesis step.
The netlist for emulation is then used in the same way as if just one synthesis step had been used.
The main difference between the proposed emulation method and simple methods that use a single synthesis step is that the netlist synthesized for the target technology, rather than the source HDL file, is the starting point of the emulation flow. The main advantage of the proposed method is that the structure of the emulated circuit corresponds exactly to the structure of the circuit after the final implementation, so faults can be injected into all nodes of the circuit and the fault list corresponds better to the real circuit. Furthermore, the method allows any components for fault injection to be inserted without any impact on the circuit structure. Thus both methods of fault injection, i.e. circuit instrumentation and partial runtime reconfiguration, can be used without obstacles. The proposed method does not require any human effort. It can be fully automated, and thus it is no more difficult to use than any simpler method.
PRESERVING CIRCUIT TIMING
This section describes a method for emulation of delays within the circuit. In addition to preserving the proper structure, which was described above, another goal of the method is to emulate the timing parameters of the circuit at a sufficient level of accuracy.
There are two approaches to circuit-timing emulation. The first approach  relies on physical parameters of the FPGA fabric. This approach is not considered in this paper due to its drawbacks.
The second approach, considered in this paper, is based on virtual emulation time. One FPGA-clock period represents a constant amount of virtual time (Time Unit, TU). The real length of the FPGA-clock period is not important from this point of view. The choice of the TU length influences the emulation accuracy and the overall emulation time: the shorter the TU, the higher the achievable accuracy, but the more time the emulation needs. The virtual period of the clock input of the emulated circuit is a parameter of the emulation experiment and is defined as an integer multiple of the TU. This means that the physical period of the clock input of the emulated circuit is longer than the physical period of the FPGA clock by the corresponding ratio. Logic gate delays and sequential element delays are properties of the target technology and are also defined as multiples of the TU. This means that the outputs of the gates and the sequential elements are delayed by the corresponding number of FPGA-clock periods.
An example is shown in Figure 2. As mentioned above, the period of the FPGA clock (signal FPGA_CLK) represents the basic time unit (TU) of virtual emulation time. The length of the TU is chosen by the user. The physical period of the clock input of the emulated circuit (signal DUT_CLK) is set according to the required virtual period of that clock input and the length of the TU. In this case the virtual period is 5 TU. The delay Tclk2q of the flip-flop output (signal DFF_Q) is set to 2 TU, and the delay of the gate is set to 3 TU. These are properties of the target technology. Since the inertial delay model is used for logic gates, the value at the gate input has to be stable for longer than the gate delay in order to be propagated from the gate input (signal A_IN) to the gate output (signal Y_OUT).
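The relation between physical delays, the TU, and the number of FPGA-clock cycles can be sketched in Python as follows. The picosecond values are hypothetical and chosen only so that a 100 ps TU reproduces the 5 TU, 2 TU and 3 TU values of the example; the rounding policy (nearest TU, at least one) is likewise an assumption, since the text only requires delays to be whole multiples of the TU.

```python
def to_time_units(delay_ps: int, tu_ps: int) -> int:
    """Quantize a delay to a whole number of TUs (nearest, at least one)."""
    if tu_ps <= 0:
        raise ValueError("TU length must be positive")
    return max(1, (delay_ps + tu_ps // 2) // tu_ps)


def fpga_cycles(dut_clock_cycles: int, dut_period_tu: int) -> int:
    """One TU costs one FPGA-clock cycle, so cycles scale with the period."""
    return dut_clock_cycles * dut_period_tu


# Hypothetical target-technology timing (not taken from the paper).
delays_ps = {"dut_period": 500, "t_clk2q": 200, "t_gate": 300}

results = {}
for tu_ps in (100, 50):
    in_tu = {name: to_time_units(d, tu_ps) for name, d in delays_ps.items()}
    results[tu_ps] = (in_tu, fpga_cycles(1000, in_tu["dut_period"]))
    print(tu_ps, results[tu_ps])
```

Halving the TU from 100 ps to 50 ps doubles every delay expressed in TUs and therefore doubles the number of FPGA-clock cycles needed for the same 1000 clock cycles of the emulated circuit, which is the accuracy versus emulation time trade-off described above.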
The delay in the emulated circuit is implemented through delay elements. These delay elements are sequential components that are able to delay an input event by a given number of FPGA-clock cycles (i.e. TUs).
There are two different delay types to emulate: inertial delay and transport delay. In the inertial delay model, an event on the input is propagated to the output with a given delay, provided that no subsequent event occurs before the delay time elapses. In the transport delay model, all events at the input are propagated to the output with a given delay, regardless of the interval between two consecutive events. The first delay model is suitable for logic gates, the second for signal nets. The two delay models place different requirements on the component that implements the delay in an emulated circuit. Schemes of the inertial and transport delay elements are shown in Figure 3.
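The difference between the two models can be shown with a cycle-level behavioral sketch in Python, where one call to `tick` corresponds to one FPGA-clock cycle (one TU). This is not the hardware of Figure 3: the classes, their names, and the boundary choice that an input stable for exactly `delay` ticks is propagated are assumptions made for illustration.

```python
from collections import deque


class TransportDelay:
    """Shift-register model: every input sample reappears `delay` ticks later."""

    def __init__(self, delay, init=0):
        self.pipe = deque([init] * delay, maxlen=delay)

    def tick(self, x):
        y = self.pipe[0]     # oldest sample leaves the register
        self.pipe.append(x)  # newest sample enters (maxlen drops the oldest)
        return y


class InertialDelay:
    """Counter model: the output follows the input only after the input
    has differed from the output for `delay` consecutive ticks."""

    def __init__(self, delay, init=0):
        self.delay = delay
        self.q = init
        self.count = 0

    def tick(self, x):
        y = self.q  # registered output visible during this tick
        if x != self.q:
            self.count += 1
            if self.count == self.delay:
                self.q = x
                self.count = 0
        else:
            self.count = 0  # input returned to the old value: pulse rejected
        return y


def run(element, samples):
    return [element.tick(x) for x in samples]


short_pulse = [0, 1, 0, 0, 0, 0, 0]
long_pulse = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]

transport_short = run(TransportDelay(3), short_pulse)
inertial_short = run(InertialDelay(3), short_pulse)
inertial_long = run(InertialDelay(3), long_pulse)
print(transport_short, inertial_short, inertial_long)
```

With a delay of 3 ticks, the one-tick pulse passes through the transport element shifted by three ticks but is filtered out by the inertial element, whereas the four-tick pulse passes through the inertial element with both edges delayed by three ticks.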
The advantage of this method is that it does not rely on physical parameters of the FPGA fabric. Insertion of delay elements also reduces the length of a critical path within the circuit, since all delay elements contain at least one flip-flop in the datapath. The shorter critical path allows running emulation using higher FPGA-clock frequency, which reduces the time required for the emulation.
This section presents experiments with the proposed emulation technique. The first experiment deals with the area and timing overhead of the technique when it preserves only the structure. The second group of experiments deals with emulation of circuit timing: it reports the improvement in emulation timing due to the insertion of delay elements and the implementation results for the delay elements.
Overhead of the Proposed Emulation Technique
The goal of the first experiment was to evaluate the overhead of the proposed emulation technique compared to the simple emulation flow that does not preserve circuit structure. First, the source VHDL description of the emulated circuit was synthesized directly for implementation in the FPGA; this result is used as a reference. Then the same VHDL description was processed by the proposed emulation flow (see Figure 1), which uses two synthesis steps. The first synthesis step was performed by tools for the final technology (ASIC). The second synthesis step prepared the netlist for implementation in the FPGA so that it could be emulated. The overhead in terms of size (LUTs) and maximum path delay was then evaluated.
We used the following settings: small combinational MCNC benchmarks as test circuits, an ASIC as the target technology, Xilinx Virtex-II Pro as the emulation technology, the Synopsys synthesis tool for the first synthesis step, and the Xilinx XST tool for the second synthesis step of the proposed emulation technique.
Table 1 shows the overhead of the proposed method. Column 2 shows the sizes of the netlists created by the proposed emulation technique. Column 3 shows the sizes of the reference circuits. Column 4 shows the area overhead.
Figure 2. Principle of the virtual time in FPGA
Figure 3. Implementation of the delay elements
Column 5 shows the maximum path delays of the netlist created as a result of the proposed emulation technique. Column 6 shows the maximum path delays of the reference circuit. The last column shows the ratios of these two values.
The area overhead is 61% on average. It stems from the differences between the target technology (ASIC) and the emulation technology (Xilinx FPGA). The ASIC netlist uses a larger number of smaller primitives, and in the FPGA each primitive of the netlist is implemented independently by one or more 4-input LUTs.
The maximum path delay in the netlist created by the proposed emulation method was about 2.2 times longer on average than in the reference circuit. This is due to the larger number of logic levels that results from having more, smaller primitives on the critical path. The maximum path delay limits the maximum frequency of the FPGA clock during emulation and thus the overall time needed for the emulation.
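The averages quoted above can be recomputed from the rows of Table 1 with a few lines of Python. The sketch assumes the arithmetic mean over the seven listed circuits and works from the rounded table values, so individual ratios may differ from the printed ones in the last digit.

```python
# (area EDIF [LUTs], area VHDL [LUTs], delay EDIF [ns], delay VHDL [ns])
table1 = {
    "al2":  (87, 55, 4.56, 2.05),
    "alu1": (19, 8, 2.07, 0.46),
    "alu2": (49, 27, 5.37, 3.20),
    "alu3": (47, 28, 6.57, 3.95),
    "apla": (60, 40, 5.33, 3.84),
    "b11":  (54, 42, 5.22, 2.01),
    "br1":  (47, 44, 6.47, 4.86),
}

# Area overhead of the structure-preserving flow relative to the reference.
area_overhead_pct = [
    (edif / vhdl - 1.0) * 100.0 for edif, vhdl, _, _ in table1.values()
]
# Ratio of maximum path delays (proposed flow / reference).
delay_ratio = [d_edif / d_vhdl for _, _, d_edif, d_vhdl in table1.values()]

mean_area_overhead_pct = sum(area_overhead_pct) / len(area_overhead_pct)
mean_delay_ratio = sum(delay_ratio) / len(delay_ratio)

print(f"mean area overhead: {mean_area_overhead_pct:.1f} %")
print(f"mean delay ratio:   {mean_delay_ratio:.2f}")
```

Under these assumptions the script gives values that round to the 61% and 2.2 stated in the text.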
Emulation of circuit timing
The second experiment deals with the emulation of circuit timing. The goal of the experiment was to evaluate the maximum path delay after the insertion of delay elements. The VHDL file corresponding to the netlist generated by the first synthesis step was modified so that one instance of a simple delay element was placed at the output of each primitive (gate). The results of the experiment are in Table 2. Column 2 shows the maximum path delays of the circuit modified by insertion of delay elements. Column 3 shows the maximum path delays of the netlist that preserves just the structure. The last column shows the ratios of these two values.
The maximum path delay was reduced to 41% on average compared to the maximum path delay in the netlist that preserves the structure only. This means that a higher FPGA-clock frequency can be used during the emulation, which reduces the time needed for the emulation.
This experiment compares the different delay elements. The delay elements described in the previous text were considered, i.e. the delay elements based on binary counters and LFSRs for inertial delays and the delay element based on shift registers for transport delays. The results are in Table 3. Elements with lengths ranging from 7 to 1023 were considered. Their size in terms of LUTs and flip-flops and their maximum path delay are shown.
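The effect on the usable FPGA-clock frequency can be estimated from the rows of Table 2 with the short Python sketch below. It assumes that the clock period is bounded only by the reported maximum path delay (f = 1 / delay); setup time, clock skew and routing changes in a real implementation are ignored.

```python
# circuit: (max path delay with delay elements [ns], structure only [ns])
table2 = {
    "al2":  (1.81, 4.56),
    "alu1": (1.52, 2.07),
    "alu2": (2.38, 5.37),
    "alu3": (1.89, 6.57),
    "apla": (1.95, 5.33),
    "b11":  (1.80, 5.22),
    "br1":  (1.97, 6.47),
}


def fmax_mhz(delay_ns: float) -> float:
    """Upper bound on the clock frequency for a given path delay."""
    return 1000.0 / delay_ns


ratios = {name: with_de / plain for name, (with_de, plain) in table2.items()}
mean_ratio = sum(ratios.values()) / len(ratios)

for name, (with_de, plain) in table2.items():
    print(f"{name}: {fmax_mhz(plain):6.1f} MHz -> {fmax_mhz(with_de):6.1f} MHz")
print(f"mean delay ratio: {mean_ratio:.2f}")
```

The mean ratio computed this way rounds to the 41% given in the text.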
Table 1. Overhead for the proposed emulation method
|Circuit ||Area, EDIF [LUTs] ||Area, VHDL [LUTs] ||Area overhead ||Max. path delay, EDIF [ns] ||Max. path delay, VHDL [ns] ||Delay ratio |
|al2 ||87 ||55 ||58% ||4.56 ||2.05 ||2.22 |
|alu1 ||19 ||8 ||138% ||2.07 ||0.46 ||4.55 |
|alu2 ||49 ||27 ||81% ||5.37 ||3.20 ||1.67 |
|alu3 ||47 ||28 ||68% ||6.57 ||3.95 ||1.67 |
|apla ||60 ||40 ||50% ||5.33 ||3.84 ||1.39 |
|b11 ||54 ||42 ||29% ||5.22 ||2.01 ||2.60 |
|br1 ||47 ||44 ||7% ||6.47 ||4.86 ||1.33 |
Table 2. Emulation timing improvement caused by insertion of delay elements
|Circuit ||Max. path delay with delay elements [ns] ||Max. path delay, structure only [ns] ||Ratio |
|al2 ||1.81 ||4.56 ||0.40 |
|alu1 ||1.52 ||2.07 ||0.73 |
|alu2 ||2.38 ||5.37 ||0.44 |
|alu3 ||1.89 ||6.57 ||0.29 |
|apla ||1.95 ||5.33 ||0.37 |
|b11 ||1.80 ||5.22 ||0.34 |
|br1 ||1.97 ||6.47 ||0.30 |
The delay elements based on shift registers use one slice flip-flop in all cases. The number of LUTs they consume grows linearly with the delay element length. The delay element is composed of SRL16 primitives. Since delay elements longer than 16 are composed of several SRL16 primitives connected in series, the timing is constant for all lengths above 16.
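The linear growth can be checked against the shift-register rows of Table 3 with a small Python sketch. It assumes that each SRL16 occupies one LUT and that the number of SRL16 primitives is the length divided by 16, rounded up; this formula is inferred from the table rows rather than stated in the paper, and the function name is illustrative.

```python
SRL16_DEPTH = 16  # maximum number of stages in one SRL16 primitive


def srl16_count(length: int) -> int:
    """Number of cascaded SRL16 primitives assumed for a given delay length."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return (length + SRL16_DEPTH - 1) // SRL16_DEPTH  # ceiling division


lengths = [7, 15, 31, 63, 127, 255, 511, 1023]
luts_table3 = [1, 1, 2, 4, 8, 16, 32, 64]  # shift-register LUT row of Table 3

estimated = [srl16_count(n) for n in lengths]
print(estimated == luts_table3)
```

For the eight lengths in the table the estimate matches the reported LUT counts.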
This paper has described a method of fault emulation that preserves the structure of the tested circuit according to the target technology. The method hides the properties of the emulation platform so that the properties of the target technology are retained. It also allows both types of fault injection techniques to be used, i.e. techniques based on partial runtime reconfiguration and techniques based on circuit instrumentation. An enhancement that preserves not only the structure but also the timing information during emulation has also been proposed.
The area and timing overhead due to preserving the circuit structure was evaluated by experiments on small combinational circuits. The proposed flow leads to an average area overhead of 61% and a critical path that is 2.2 times longer. The second group of experiments dealt with emulation of circuit timing. First, it was shown that the length of the critical path is significantly reduced by the insertion of delay elements. Second, the size and timing of different delay elements were evaluated so that the overall overhead of timing emulation can be estimated.
Table 3. Delay elements: size and maximum path delay
|Length ||7 ||15 ||31 ||63 ||127 ||255 ||511 ||1023 |
|Binary counter |
|Size [LUTs] ||10 ||11 ||14 ||15 ||16 ||16 ||19 ||20 |
|Size [FFs] ||8 ||9 ||10 ||11 ||12 ||13 ||14 ||15 |
|Delay [ns] ||2.37 ||2.63 ||3.06 ||3.06 ||3.09 ||3.60 ||3.82 ||3.82 |
|LFSR |
|Size [LUTs] ||8 ||8 ||9 ||9 ||10 ||9 ||11 ||11 |
|Size [FFs] ||8 ||9 ||10 ||11 ||12 ||13 ||14 ||15 |
|Delay [ns] ||2.38 ||2.59 ||3.02 ||3.00 ||3.07 ||3.60 ||3.82 ||3.82 |
|Shift Register |
|Size [LUTs] ||1 ||1 ||2 ||4 ||8 ||16 ||32 ||64 |
|Size [FFs] ||1 ||1 ||1 ||1 ||1 ||1 ||1 ||1 |
|Delay [ns] ||3.29 ||3.29 ||3.47 ||3.47 ||3.47 ||3.47 ||3.47 ||3.47 |
This work was supported by the National Research and Development Policy of the Czech Republic under project no. 1QS108040510.
 P. Civera, L. Macchiarulo, M. Rebaudengo, M. S. Reorda, and M. Violante, “Exploiting FPGA-based techniques for fault injection campaigns on VLSI circuits,” in DFT ’01: Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT’01). Washington, DC, USA: IEEE Computer Society, 2001, pp. 250–258.
 P. Civera, L. Macchiarulo, M. Rebaudengo, M. S. Reorda, and M. Violante, “New techniques for efficiently assessing reliability of SOCs,” Microelectronics Journal, vol. 34, pp. 53–61, 1 January 2003.
 A. Ejlali, S. G. Miremadi, H. Zarandi, G. Asadi, and S. B. Sarmadi, “A hybrid fault injection approach based on simulation and emulation cooperation,” vol. 00. Los Alamitos, CA, USA: IEEE Computer Society, 2003, pp. 479–488.
 J. Raik, P. Ellervee, V. Tihhomirov, and R. Ubar, “Improved fault emulation for synchronous sequential circuits,” in DSD ’05: Proceedings of the 8th Euromicro Conference on Digital System Design. Washington, DC, USA: IEEE Computer Society, 2005, pp. 72–78.
 R. Leveugle, L. Antoni, and B. Feher, “Dependability analysis: A new application for run-time reconfiguration,” in IPDPS ’03: Proceedings of the 17th International Symposium on Parallel and Distributed Processing. Washington, DC, USA: IEEE Computer Society, 2003.
 “Using run-time reconfiguration for fault injection applications,” in IMTC ’01: Proceedings of the 18th IEEE Instrumentation and Measurement Technology Conference, vol. 3, 2001, pp. 1773–1777.
 P. Kenterlis, N. Kranitis, A. Paschalis, D. Gizopoulos, and M. Psarakis, “A low-cost SEU fault emulation platform for SRAM-based FPGAs,” in IOLTS ’06: Proceedings of the 12th IEEE International Symposium on On-Line Testing. Washington, DC, USA: IEEE Computer Society, 2006, pp. 235–241.
 A. Parreira, J. P. Teixeira, and M. B. Santos, “Built-in self-test preparation in FPGAs,” in DDECS ’04: Proceedings of the 7th IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems, 2004.
 D. de Andres, J. C. Ruiz, D. Gil, and P. Gil, “Runtime reconfiguration for emulating transient faults in VLSI systems,” in DSN ’06: Proceedings of the International Conference on Dependable Systems and Networks (DSN’06). Washington, DC, USA: IEEE Computer Society, 2006, pp. 291–300.