A Method and Approach for Fast and Efficient Debugging at Emulation Level
By Naveen Tiwari & Nitin Jaiswal,
Samsung India Software Operation Private Limited
Bangalore, Karnataka, India
Abstract:
Advances in technology have turned large, complex circuit boards into small Integrated Circuits (ICs). ICs have surpassed circuit boards in every respect: smaller size, lower power consumption, lower cost, higher speed and better reliability. Growing functionality and shrinking geometries add complexity at every stage of the IC lifecycle, from Design to Tapeout. A multimillion-gate design requires a high level of verification effort. Simulation-based Functional Verification is slow and does not guarantee full system-level verification. Compared to Functional Simulation, Emulation-based verification and validation is faster and provides full system-level validation. Emulation also addresses real-time situations that are beyond the scope of functional simulation. However, Emulation has its own limitations. Unlike simulation, it offers little control over, or visibility of, the internal states of signals, which makes debugging tedious, difficult and time consuming. In this paper, we describe a method and approach that shortens the Debug Cycle. The paper also describes the Architectural Changes needed to support it.
I. INTRODUCTION
In recent years, SoC verification complexity has increased considerably. Simulation used to be regarded as the best way to verify a design. Bigger and more complex designs, however, lead to longer simulation times, and simulation-based verification does not cover system-level verification. Standard simulation-based techniques started falling behind. Emulation-based Functional Verification became popular owing to its inherent advantages: it is faster, covers full system-level verification, replicates real-time challenges and helps identify timing errors sooner. Emulation has its own disadvantages, though. Poor visibility of internal states makes debugging tough [1].
Figure 1 gives a brief idea of an Emulation-based Functional Verification Flow [2]. It is divided into 4 stages. At Stage A, the necessary modifications are made to the ASIC RTL so that it can be ported to the FPGA of the Emulation System. ASIC Memory Blocks are replaced with FPGA Memory Blocks. A Constraint file is created. It contains Timing, Placement, Routing, Mapping, Synthesis, Grouping, Logical and Physical Constraints [2] [3].
Fig.1 Functional Verification Flow
At Stage B, an FPGA Synthesis Tool synthesizes the modified RTL. It outputs a Logic Level Implementation generated through the optimization of the data path, memory, and controller components, individually or in a unified manner, through mapping to gate-level component libraries and limited resource sharing. In case of a Synthesis Error, the RTL is updated and the Synthesis process is repeated. At Stage C, the Placement and Routing Tool works on the Synthesized Netlist and performs Translation, Mapping, Placement and Routing of the design. The generated reports for Translate, Map, Place and Route, and Timing results are then reviewed. In case of failures, the properties, constraints, and RTL source are changed as necessary, and the design is re-synthesized and re-implemented. This process is repeated until the design requirements are met. This stage ends with the generation of the FPGA Programming File.
At Stage D, we program the target FPGA Device with the obtained file. Test Vectors are applied to the Target Design and the results are compared with the expected results. In case of an Error, we have to debug it. Debugging has never been an easy task. It requires experience, skill and time. Simulation-based debugging is easier than debugging in Emulation. Poor visibility of internal signals is a major problem in FPGA debugging, and bigger designs are more difficult to debug. To get closer to the failure point, we need to analyse the behaviour of the signals using an external Logic Analyzer. The selection of signals is limited by the number of debug points available on the Emulation Base Board. The engineer needs to identify signals and route them to the top level, repeating Process A (Fig. 1). The bigger the design, the longer the debugging cycle and the more time consumed. Xilinx, an FPGA vendor, offers the ChipScope™ Pro integrated logic analyser for FPGAs as a board and system-level diagnostic tool [4]. It employs three different cores for debugging, i.e. Integrated Logic Analyzer (ILA), Integrated Control (ICON) and Virtual Input-Output (VIO). The main limitation of this methodology is the size of the Block RAM needed to store the captured samples over a long capture window. For bigger designs, the FPGA resource utilization of the ILA, ICON and VIO cores is itself very high, leaving little for the main design.
In this paper we propose a method that makes it easy and fast to locate the failure point. The method also brings down the number of iterations of Process A (refer Fig. 1) needed to locate the failure point. The architecture described consumes fewer FPGA resources, leaving more for the original design. The parameterized architecture makes the design reusable.
Fig.2 Basic Design Architecture
Figure 2 describes a very basic, low-level internal architecture of a design. It consists of a Top Level Design module with a Bus Interface. The Top Level Module consists of Sub Blocks (A, B, C and D) and a Main Controller Block, together with a Special Function Register (SFR) set. The SFR set is a programmable block. The programmed SFR values control the flow as well as the functionality of the design.
A functionally verified Sub Block module can still behave abnormally when the control signals from adjacent modules do not meet the module's requirements. Some such cases are shown in Fig. 3, Fig. 4 and Fig. 5; many more are possible.
Figure 3 shows a case where a Control Signal SIG should be high for exactly 3 Clock Cycles. An Error condition occurs if the pulse is shorter or longer than 3 Clock Cycles.
Fig.3 Pulse Width Failure
Figure 4 describes a case where the Control Signal SIG must stay high for at least 3 Clock Cycles. An Error Condition occurs if the width is less than 3 Clock Cycles.
Fig.4 Minimum Width Failure
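The minimum-width rule of Fig. 4 can be expressed as a small synthesizable checker. The following is a minimal sketch and not the authors' RTL: the module name MIN_WIDTH_CHK, the parameter name COUNT and the port names are assumptions, chosen to resemble the width-check sample appended to the paper. It counts the clock edges on which SIG is sampled high and raises the error output for one cycle if SIG falls before COUNT samples have been seen.

```verilog
// Sketch of a minimum-pulse-width checker (hypothetical names throughout).
module MIN_WIDTH_CHK #(
  parameter BIT   = 7,   // width of the sample counter
  parameter COUNT = 3    // assumed name: minimum high time, in clock cycles
)(
  input      Clk,
  input      Rst,        // active-low asynchronous reset
  input      Sig,        // control signal under observation
  output reg Err_o       // one-cycle pulse after a too-short pulse on Sig
);
  reg           Sig_D;   // Sig delayed by one clock, for edge detection
  reg [BIT-1:0] CNT;     // clock edges on which Sig has been sampled high

  wire Neg_Det = ~Sig & Sig_D;   // falling edge of Sig

  always @(posedge Clk or negedge Rst) begin
    if (!Rst) begin
      Sig_D <= 1'b0;
      CNT   <= {BIT{1'b0}};
      Err_o <= 1'b0;
    end else begin
      Sig_D <= Sig;
      // On the falling edge, CNT still holds the length of the finished pulse.
      Err_o <= Neg_Det & (CNT < COUNT);
      if (!Sig)
        CNT <= {BIT{1'b0}};      // restart the count while Sig is low
      else if (!(&CNT))
        CNT <= CNT + 1'b1;       // count high samples, saturating at all ones
    end
  end
endmodule
```

With COUNT set to 3, a pulse that is high for one or two clock edges produces a single-cycle error, while pulses of three or more edges pass silently.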
Figure 5 shows a waveform where two signals, SIG_A and SIG_B, are inter-related. SIG_B should be a version of SIG_A delayed by a single Clock Cycle. Violations are detected at both rising and falling edges. An Error occurs whenever either the Rising Edge or the Falling Edge does not meet the one-clock-cycle latency criterion.
Fig.5 Inter-Related Signal Failure
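The inter-related check of Fig. 5 can be sketched in the same style. This is an illustrative sketch under stated assumptions, not the authors' implementation: the module name INTER_SIG_CHK, the parameter DELAY and the port names are hypothetical, with the two outputs playing the role of separate rising-edge and falling-edge error flags. The checker delays SIG_A through a shift register and flags any clock cycle in which an edge of SIG_B and the corresponding edge of the delayed SIG_A do not coincide.

```verilog
// Sketch of an inter-related signal checker (hypothetical names throughout).
module INTER_SIG_CHK #(
  parameter DELAY = 1        // assumed name: required Sig_A to Sig_B latency, >= 1 clock
)(
  input      Clk,
  input      Rst,            // active-low asynchronous reset
  input      Sig_A,
  input      Sig_B,
  output reg Err_Red_o,      // rising-edge latency violation
  output reg Err_Fed_o       // falling-edge latency violation
);
  reg [DELAY:0] A_Hist;      // A_Hist[k] holds Sig_A as sampled k+1 clocks ago
  reg           B_D;         // Sig_B as sampled one clock ago

  wire Exp_B   = A_Hist[DELAY-1];   // value Sig_B should have now
  wire Exp_B_D = A_Hist[DELAY];     // value Sig_B should have had one clock ago

  wire Exp_Rise = Exp_B  & ~Exp_B_D;
  wire Exp_Fall = ~Exp_B & Exp_B_D;
  wire B_Rise   = Sig_B  & ~B_D;
  wire B_Fall   = ~Sig_B & B_D;

  always @(posedge Clk or negedge Rst) begin
    if (!Rst) begin
      A_Hist    <= {(DELAY+1){1'b0}};
      B_D       <= 1'b0;
      Err_Red_o <= 1'b0;
      Err_Fed_o <= 1'b0;
    end else begin
      A_Hist    <= {A_Hist[DELAY-1:0], Sig_A};
      B_D       <= Sig_B;
      // An error is an expected edge without an actual edge, or the reverse.
      Err_Red_o <= Exp_Rise ^ B_Rise;
      Err_Fed_o <= Exp_Fall ^ B_Fall;
    end
  end
endmodule
```

One consequence of this design choice is that a mistimed edge is reported twice: once in the cycle where the edge was expected and once in the cycle where it actually occurred.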
In the proposed architecture, we append SFR set with DE-IF logic and Sub Blocks with Wrapper Logic (W) (Ref. Figure 6).
Fig.6 Proposed Design Architecture
We will describe these two blocks individually.
A. Wrapper Logic
Wrapper Logic is a piece of Synthesizable and Parameterized RTL code [5]. Its inputs are the control signals to be monitored, and it outputs an Error Signal. The number of Input and Output Signals depends on the functionality being checked. Internally, it has Checker logic which monitors the Input Control Signals and drives the Error Signal whenever it finds a violation on the control signals (refer Figure 7). Checker logic is very specific to the requirement. In the previous section, we explained three different failure conditions, i.e. Pulse Width Failure, Minimum Width Failure and Inter-Related Signal Failure. Checker logic is similar to the Assertions used in Functional Verification [6] [7]. Since it is parameterized RTL, it is reusable and can be extended to more checks by changing the parameter values. A sample RTL code for Width Check Logic has been appended at the end of the paper.
Fig.7 Wrapper Logic Architecture
B. DE-IF Block
The DE-IF (DEbug InterFace) Block is appended to the SFR set of the original design (refer Fig. 2). The Debug Enable SFR set and the Debug Status SFR set form part of the DE-IF Architecture (refer Fig. 8).
Fig.8 DE-IF Architecture
The Debug Enable SFR set is an Enable/Disable Register. The Host/Processor can perform Read/Write (R/W) operations on it. Each bit of this Register either enables or disables the corresponding mapped Wrapper Logic. The values programmed into it determine the validity of the corresponding Debug Status SFR values. The inter-related internal architecture is shown in Figure 9. The Debug Status SFR set can be read and written (R/W) internally by the Main Controller Block (refer Fig. 6). The Host/Processor can read this SFR set and determine the failing Sub Blocks [8].
III. MODE OF OPERATION
The DE-IF Block and Wrapper Logic, implemented in RTL [5], transform the Basic Design (refer Fig. 2) into the Proposed Design Architecture (refer Fig. 6) when integrated with it. The transformed Architecture goes through Stages A, B and C (refer Fig. 1). The resulting programming file is loaded into the FPGA. The Host/Processor drives the design by programming the SFR set through the Bus Interface.
The Main Controller Block drives control signals to the Sub Blocks A, B, C and D (refer Fig. 6). The control signals to be monitored go to the Wrapper Logic placed beside the Sub Blocks. At the same time, the control signals go to the internal logic of the Sub Blocks. The Wrapper Logic continuously monitors the incoming control signals. On seeing a violation on the control signals, the Wrapper Logic drives its output Error signal high.
Fig.9 Internal Controller Architecture
The Error signal internally gets ANDed with its corresponding value in Debug Enable SFR set and is reflected in the corresponding Debug Status SFR set (Refer Fig. 9).
Once the Debug Status SFR set is updated, an Interrupt is sent to the Host/Processor, which then reads the Debug Status SFR set. The value in the Debug Status SFR set points directly to the failing Sub Block. Once the Sub Block is identified, the user can map the related control signals to the available debug points by modifying the RTL and the Constraint files. Process A (refer Fig. 1) has to be repeated, and the behaviour of the control signals is monitored to determine the exact reason for the failure.
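The masking and status update described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' DE-IF RTL: the module name DE_IF, the parameter N and all port names are hypothetical. The paper specifies only the AND with the Debug Enable value, the update of the Debug Status SFR set and the interrupt; the sticky status bits, the host-driven clear input and the level-type interrupt are assumptions made here so that the example is complete.

```verilog
// Sketch of the Debug Enable / Debug Status path (hypothetical names throughout).
module DE_IF #(
  parameter N = 4                // assumed: one bit per Wrapper Logic instance
)(
  input              Clk,
  input              Rst,        // active-low asynchronous reset
  input      [N-1:0] Dbg_En,     // Debug Enable SFR value, programmed by the host
  input      [N-1:0] Err_i,      // Error outputs of the Wrapper Logic instances
  input              Clr_i,      // assumption: host clears the status after reading it
  output reg [N-1:0] Dbg_Sts,    // Debug Status SFR value, read by the host
  output             Irq_o       // assumption: level interrupt while any bit is set
);
  // Only errors from enabled wrappers are allowed to reach the status register.
  wire [N-1:0] Masked = Err_i & Dbg_En;

  always @(posedge Clk or negedge Rst) begin
    if (!Rst)
      Dbg_Sts <= {N{1'b0}};
    else if (Clr_i)
      Dbg_Sts <= Masked;             // clear, but keep errors arriving in this cycle
    else
      Dbg_Sts <= Dbg_Sts | Masked;   // sticky: a one-cycle error pulse is not lost
  end

  assign Irq_o = |Dbg_Sts;
endmodule
```

Making the status bits sticky lets a one-cycle error pulse from a checker survive until the host services the interrupt, at the cost of requiring an explicit clear.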
IV. EXPERIMENTAL RESULTS
The proposed design (refer Fig. 6) was simulated in ModelSim® from Mentor Graphics [9]. The Simulation Results for the Error conditions shown in Fig. 3, Fig. 4 and Fig. 5 are shown in Figure 10, Figure 11 and Figure 12 respectively.
In Fig. 10, the ERROR signal goes high whenever the width of the control signal SIG does not meet the 3 Clock Cycles condition.
Similarly, in Fig. 11, ERROR goes high whenever the minimum pulse width criterion of 3 Clock Cycles on the control signal SIG is violated.
Fig. 12 shows the violations between SIG_A and SIG_B. ERROR_RED goes high whenever there is a violation on the rising edges of SIG_A and SIG_B. A violation on the falling edges of SIG_A and SIG_B is reflected on ERROR_FED.
Fig.10 Pulse Width Failure Check
Fig.11 Minimum Width Failure Check
Fig.12 Inter-Related Signal Failure Check
Later, the design was synthesized with Xilinx ISE®, and Implementation (refer Fig. 1) was carried out using Xilinx® Place and Route technology, targeting the Virtex-II 8000 class of FPGA [10]. The implementation was tested on an ARM926EJ-S development board. FPGA utilization for the Pulse Width, Sequence Detection and Minimum Pulse Width Wrappers is tabulated in Table 1. (Count refers to the pulse width value in the pulse width wrapper logic and to the minimum pulse width value in the minimum pulse width wrapper logic. In the inter-related wrapper logic, it is the number of clock cycles of delay between the respective signals.) Utilization increases in proportion to the number of debug features.
Table 1: Utilization of the FPGA resource
V. CONCLUSION
We propose a Method and Approach for Efficient and Fast Emulation Debugging. The proposed Architecture consists of a DE-IF Block and Wrapper Logic. Wrapper Logic is a piece of parameterized and synthesizable RTL code. The functionality of the Wrapper Logic can be enabled or disabled through the Debug Enable SFR set of the DE-IF Block. The Error Status is updated in the Debug Status SFR set. The user can read the status and narrow the fault down to the failing Sub Block directly, without first routing candidate signals to debug points. The proposed scheme thereby reduces the number of debug iterations and the overall debug time, increasing productivity and efficiency.
REFERENCES
[1] Raj Mathur, "SoC Prototyping Requirements", in FPGA and Structured ASIC Journal, February 2004.
[2] Wen-Jon Fang, Peng-Cheng Kao and Allen C-H Wu, "A Multi-Level FPGA Synthesis Method Supporting HDL Debugging for Emulation Based Designs", IEEE proceedings.
[3] Xilinx, Constraint Guide.
[4] Kalil Arshak, Essa Jaffer and Christian Ibala, "Testing FPGA based digital system using XILINX ChipScope™ logic analyzer", in 29th International Spring Seminar on Electronics Technology, May 2006, pp. 355-360.
[5] Samir Palnitkar, "Verilog HDL - A Guide to Digital Design and Synthesis", Prentice Hall, Upper Saddle River, NJ 07458.
[6] Harry D. Foster and Adam C. Krolnik, "Creating Assertion-Based IP", Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA.
[7] Marc Boule, Jean-Samuel Chenard and Zeljko Zilic, "Assertions Checker in Verification, Silicon Debug and In-Field Diagnosis", in 8th International Symposium on Quality Electronic Design, March 2007, pp. 613-620.
[8] Yi-Ting Lin, Chien-Chou Wang and Ing-Jer Huang, "AMBA AHB Bus Protocol Checker with Efficient Debugging Mechanism", in IEEE International Symposium on Circuits and Systems, May 2008, pp. 928-931.
[9] ModelSim Software and User Guide.
[10] Xilinx ISE Software and User Guide.
Sample Verilog RTL for Width Check
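module WIDTH_CHK(
  input Clk ,          // clock; missing from the port list but used by the always blocks
  input Rst ,          // active-low asynchronous reset
  input Sig ,          // control signal to be checked
  output reg Err_o     // high for one cycle after a pulse of the wrong width
);
parameter BIT=7 ;      // width of the pulse counter
parameter COUNT=3 ;    // assumed name: required pulse width in clock cycles
reg [1:0] PR_ST ;      // present state: 2'b00 IDLE (assumed), 2'b01 CNT, 2'b10 CHK
reg [BIT-1:0] CNT ;
reg [1:0] NXT_ST ;
reg Sig_D, Sig_2D ;
wire Pos_Det, Neg_Det ;
assign Pos_Det = Sig & ~Sig_D ;   // rising edge of Sig
assign Neg_Det = ~Sig & Sig_D ;   // falling edge of Sig
always @(posedge Clk or negedge Rst)begin   // delay chain for edge detection
  if(!Rst) begin
    Sig_D <= 1'b0 ;
    Sig_2D <= 1'b0 ;
  end
  else begin
    Sig_D <= Sig ;
    Sig_2D <= Sig_D ;
  end
end
// State register, pulse counter and error flag. Only the state labels and the
// Err_o resets are from the original listing; the rest is an assumed reconstruction.
always @(posedge Clk or negedge Rst)begin
  if(!Rst) begin
    Err_o <= 1'b0 ;
    PR_ST <= 2'b00 ;
    CNT <= {BIT{1'b0}} ;
  end
  else begin
    PR_ST <= NXT_ST ;
    case(PR_ST)
      2'b00: // IDLE
        begin
          Err_o <= 1'b0 ;
          CNT <= Pos_Det ;   // the first high sample counts as 1
        end
      2'b01: // CNT
        begin
          Err_o <= 1'b0 ;
          if(!Neg_Det && !(&CNT)) CNT <= CNT + 1'b1 ;   // count high samples, saturating
        end
      2'b10: // CHK
        begin
          Err_o <= (CNT != COUNT) ;
        end
      default: Err_o <= 1'b0 ;
    endcase
  end
end
always @(*)begin   // next-state logic
  case(PR_ST)
    2'b00: NXT_ST = Pos_Det ? 2'b01 : 2'b00 ;
    2'b01: NXT_ST = Neg_Det ? 2'b10 : 2'b01 ;   // CNT
    default: NXT_ST = 2'b00 ;
  endcase
end
endmodule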