By Amit Nene, Texas Instruments
Swaminathan Ramachandran, Texas Instruments
Among the various pre-silicon solutions, software simulation platforms have always been the preferred environment due to their early availability, ease of use and access, lower cost compared to FPGAs, and higher speed compared to RTL simulators. At Texas Instruments we leverage pre-silicon software platforms for software benchmarking, architecture analysis, SOC validation and software development to meet time-to-market goals. These software platforms are mostly bit accurate and functionally accurate, and in some cases even timing approximate, compared to their hardware counterparts.
These System-on-Chip platform simulators are developed by integrating C/C++ based IP models, which are essentially the software variants of hardware IP blocks such as CPUs, caches, interrupt controllers, DMAs, hardware accelerators, external memory interfaces and serial peripherals. The functional and cycle accuracy of the software platform integration, as well as that of the individual IP models, is critical to the usefulness of these software platforms at an early stage of any chip program.
The IP models are usually hand-coded in C/C++ or in SystemC (a C++ library), or generated by ESL tools that emit C/C++/SystemC, and are then tested with directed tests in a standalone unit test bench. These C/C++ models work at higher abstraction levels and have different interfaces than the RTL. As a result, test vectors produced by dedicated Design Verification teams for the RTL IPs are not leveraged by the C/C++ IP models, nor are test vectors from other platforms in which the IP already exists reused for unit testing. Due to this limited coverage, each bug found in the platform simulator shrinks the window of opportunity available to the software teams to leverage the pre-silicon solution. Further, the time taken to fix a bug also depends on the simulation environment and on the availability of applications from customers. Even after the software is fully working on the simulator, RTL-to-C-model equivalence reports are required to increase confidence that the software will run as-is during the silicon bring-up phases.
Though ESL bodies and EDA tool vendors are trying to bridge these gaps by defining and leveraging standards for bus infrastructure and modeling, such as SystemC, SCV and TLM2, the current verification approach for a C/C++ IP model remains weak.
This paper discusses a "standalone" verification framework that is independent of any specific flow but still allows reuse of Design Verification and application tests to measure and debug the cycle accuracy as well as the functional accuracy of an IP. As an additional benefit, component validation speed and the turn-around time to fix bugs improve, since the full system simulation environment is not required.
The framework constitutes a standard test bench, a VCD (Value Change Dump) file reader with a signal database, the mirror interface of the C++ model, and the C++ model itself (also termed the DUT, or "Design Under Test"). This setup is generic and can be used for any C++ IP model. The DUT is connected to the test bench via bus and signal interfaces. The test bench uses the VCD reader to read VCD files and stores the necessary information (data and timing of signals/buses) in the signal database. This information is used to drive the data/signals into the input interfaces of the C++ model, and to compare the expected data with the data/signals received from the output interfaces of the C++ model. A log file is created for each VCD file, containing the pass/fail results for each test step with respect to the data and the timing of each signal.
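As a minimal sketch of the signal database just described (the type and function names here are illustrative, not the actual framework API), each VCD signal can be mapped to a direction and an ordered list of timed value changes:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One recorded change of a signal: the VCD timestamp and the new value.
struct SignalChange {
    uint64_t time;   // simulation time of the change
    uint64_t value;  // new value of the signal/bus
};

// Direction of a DUT pin relative to the C++ model.
enum class Direction { In, Out };

// The signal database maps each VCD signal name to its direction and
// to the ordered value changes read from the trace file.
struct SignalRecord {
    Direction dir = Direction::In;
    std::vector<SignalChange> changes;
};

using SignalDb = std::map<std::string, SignalRecord>;

// Record a value change parsed from the VCD file.
void addChange(SignalDb& db, const std::string& name, uint64_t t, uint64_t v) {
    db[name].changes.push_back({t, v});
}
```

IN-signal records are later replayed into the DUT, while OUT-signal records serve as the golden data for comparison.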
Thus, the framework can be used as a setup for Functional verification as well as Cycle accuracy verification of C++ IP models.
Design and verification flows for RTL and C/C++ models are typically different, as the same model needs to be available in different flavors: for architecture analysis (before specifications are defined), for software development (before micro-architecture specifications are defined), or for benchmarking (before cycle accuracy is known).
Though ESL and EDA tools try to bridge this gap by defining a "design by refinement" flow, this does not ensure equivalence of the models. Typically, the RTL developers are not the same as the C++ model developers. SystemC, which is based on C++, is the most popular language for writing models, but it cannot easily be refined to RTL, since not all of its constructs and semantics are synthesizable. Because performance, or speed, of these models is one of the most critical reasons for using early software SOC simulators, there is very little code reuse between the C model and the RTL.
Each IP model gets integrated into highly complex SOCs modeled by multi-core or multi-board simulators. Further, these simulators are built using home-grown tools or several third-party tools, and this environment is very difficult to replicate and use for IP-level debugging, as its purpose is to debug software. Much of the time, the environment is used by external customers, who decline to share their applications for fixing bugs in the IP models or IP integrations.
Problems faced by C++ IP model developers include:
- Limitations on the number of unit tests the developer can write to qualify the IP model.
- Limitations on the number of applications available for a new platform, as the software developers need the simulator to test their applications.
- No reuse of the Design Verification vectors for the IP, and no reuse of tests from previous platforms containing the IP, for unit-level testing.
- Platform-level debugging of, say, an issue in a Timer IP C++ model needs, in the extreme case, installation of a third-party multi-core/multi-board simulator.
The scale of the problem grows with the different configurations of a C++ IP model used across multiple platforms, and with the need to run all tests on all simulation platforms for every fix or feature. The IP developer needs to know all of these platforms as well as their environments.
The purpose of this paper is to:
- Introduce a trace based verification methodology for C/C++ IP models.
- Improve the overall quality of the pre-silicon software platform solutions by leveraging test vectors.
- Provide a solution for debugging an IP model in the absence of a customer application withheld for intellectual-property reasons.
- Provide a mechanism to debug an IP model standalone, without the need for any third-party environments or multi-core/multi-board platforms.
The objective is to provide, with a single trace-based solution, a C++ IP model verification methodology, equivalence with RTL, and an isolated debug environment.
3. PRIOR WORK
C++ IP models are developed for early pre-silicon activities. The current verification approaches are not very robust and leave many gaps: no exhaustive test coverage is carried out on these models compared to RTL. The models need to be pre-tested before software developers use them for development and benchmarking of applications. IP test coverage and equivalence are essential even for post-silicon usage of the simulators.
Functional Accuracy was verified by:
- Using a standalone test bench with directed unit tests written by hand. This is very limited and can only be considered a sanity check.
- Using path test cases written for the full platform, such as HAL tests, to exercise the IP component under test. This does not ensure coverage and cannot be used at the unit level.
- Running scenario tests to cover certain functionalities of the IP.
Cycle Accuracy was verified by:
- Manually comparing the design cycle counts with the simulation cycle counts for directed test cases at the platform level. This method is inaccurate and very tedious.
- Using traffic generators to drive the C/C++ model through various performance scenarios and comparing the results against datasheets from experts. This is an approximation method.
- Using a C/C++ architecture model at a different abstraction level, available from the design team, to compare against the RTL for specific benchmarks. This is not an exhaustive method.
The existing solutions are clearly not enough for qualifying these simulators.
4. PROPOSED SOLUTION
The solution consists of a generic test-bench that can read VCD files as test cases and playback the test steps “as is” on the DUT.
- If the tests are derived from RTL flows or reused from pre-existing regressions, the pass/fail criteria are part of the test case and the IP can be debugged at the particular test step that failed.
- If the tests are run to reproduce a bug in the DUT, the test can be used for debugging the IP model to find the root cause of the issue.
Test cases are reused from various sources as traces:
- Traces from RTL design/integration teams
- Traces from different platforms
- Traces for different IP configurations
- Traces generated from DV setup
Traces are played back on the C++ IP model:
- The trace file is in standard VCD format.
- The trace file has input signal values to be driven to the C++ IP model.
- The trace file has expected golden data for the output signals of the C++ IP model.
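For illustration, the two VCD body-line forms the playback needs, timestamps and scalar value changes, can be parsed as follows. This is a simplified sketch; real VCD traces also carry headers, $var declarations and vector changes ("b1010 <id>"), which are omitted here.

```cpp
#include <cstdint>
#include <string>

// One parsed event from the VCD body.
struct VcdEvent {
    bool isTime;    // true: the line advances simulation time
    uint64_t time;  // valid when isTime is true
    char id;        // one-character VCD identifier code
    char value;     // '0', '1', 'x' or 'z' for scalar changes
};

// Parse one VCD body line into an event. Returns false for lines this
// sketch does not handle (headers, $var declarations, vector changes).
bool parseVcdLine(const std::string& line, VcdEvent& ev) {
    if (line.empty()) return false;
    if (line[0] == '#') {                       // timestamp, e.g. "#125"
        ev.isTime = true;
        ev.time = std::stoull(line.substr(1));
        return true;
    }
    if (line.size() == 2 &&
        (line[0] == '0' || line[0] == '1' || line[0] == 'x' || line[0] == 'z')) {
        ev.isTime = false;                      // scalar change, e.g. "1!"
        ev.value = line[0];
        ev.id = line[1];
        return true;
    }
    return false;
}
```

Timestamp events advance the playback clock; value-change events either drive a DUT input or record a golden output value, depending on the signal's direction.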
Steps used by the test bench to run the tests:
- Test bench configures the IP for the platform from which the tests are reused
- Test bench drives the input signals of the C++ IP model by reading the Trace file
- Test bench receives the output signals from the C++ IP model
- Test bench compares with the expected output signal values in the trace file
- Test bench reports pass/ fail results
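The drive/compare steps above can be sketched as follows, assuming a hypothetical DUT interface with driveInput/clock/readOutput methods (illustrative names, not the actual framework API):

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical DUT interface: set inputs, advance one clock, read outputs.
struct IpModel {
    virtual void driveInput(const std::string& name, uint64_t value) = 0;
    virtual void clock() = 0;
    virtual uint64_t readOutput(const std::string& name) = 0;
    virtual ~IpModel() = default;
};

// One trace step: input values to drive and golden output values to expect.
struct TraceStep {
    std::map<std::string, uint64_t> inputs;
    std::map<std::string, uint64_t> expectedOutputs;
};

// Play one step on the DUT; return true if every output matched the
// golden value recorded in the trace.
bool runStep(IpModel& dut, const TraceStep& step) {
    for (const auto& [name, value] : step.inputs)
        dut.driveInput(name, value);
    dut.clock();
    bool pass = true;
    for (const auto& [name, expected] : step.expectedOutputs)
        pass = pass && (dut.readOutput(name) == expected);
    return pass;
}
```

The test bench iterates this over every step of the trace and logs the pass/fail result per step.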
The test bench is designed to be generic & contains infrastructure components that are common to any IP and interface components that are specific to each IP.
Each interface component can be further converted to an infrastructure component as the interface becomes standardized. The interface components can also be generated from an IP-XACT flow.
Each block in the framework is described in the table below:
For each test case, the following flow is used:
- Configure the test requirement in the test bench
- Required functional accuracy
- Required cycle accuracy
- Set up the test bench environment and connect it to the DUT (C++ model).
- Read the VCD file and populate the Signal-db (signal database), which contains the signal names and the initial values of the signals.
- Use the Complement-db (complement database) to map the VCD signal names to the mirror interfaces at the DUT boundary, thereby bridging the abstraction gap.
- Find the next signal change in vcd file & update the Signal-db with the new value of the signal and the corresponding clock. This operation is performed in a loop.
- Drive the value of each DUT IN signal
- For each DUT OUT-signal
- The test bench saves the signal value(s)
- Functional accuracy validation: The Test bench then checks the value(s) and compares it with the expected value from the VCD (golden) file.
- Cycle accuracy Validation: The Test bench saves the time corresponding to a signal read from VCD file and also the time when the DUT drives the signal (OUT type) into the Test bench. These values, saved in the Signal-db, are used to validate the cycle accuracy.
- For each DUT OUT signal, the following exceptions are handled:
- Timeout: The Test bench will advance the component by a fixed number of cycles to check for any response and if no response is detected during this period, then the Test bench exits with a timeout error.
- Cycle accuracy errors: If a previous response arrived some delta earlier or later than expected, the test bench applies a delta compensation when looking for inaccuracy at the next step. This greedy approach finds the maximum number of cycle accuracy issues in a single run.
- Generate reports
- Functional Accuracy reports
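The delta-compensation bookkeeping described under "Cycle accuracy errors" above can be sketched as a small helper (the names are illustrative): when a response arrives some cycles early or late, that offset is carried forward so only fresh deviations are flagged at the next step.

```cpp
#include <cstdint>

// Carries the accumulated early/late offset between trace and DUT so
// that each step reports only the deviation it newly introduces.
struct CycleChecker {
    int64_t carriedDelta = 0;  // offset carried from previous responses

    // Compare the expected (trace) and actual (DUT) arrival cycles of a
    // response; return the fresh deviation beyond the carried offset.
    int64_t check(uint64_t expectedCycle, uint64_t actualCycle) {
        int64_t delta = static_cast<int64_t>(actualCycle) -
                        static_cast<int64_t>(expectedCycle);
        int64_t fresh = delta - carriedDelta;  // new deviation this step
        carriedDelta = delta;                  // compensate the next step
        return fresh;
    }
};
```

With this compensation, a single late response does not cascade into a failure report for every subsequent step.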
Each test case (VCD) is broken into test steps, where each test step is one transaction from request to response on the bus. Each transaction corresponds to a set of signals. For each signal, the response (either logic or data bus) is compared between the DUT and the RTL. If the responses differ, the result for the test step is logged as FAILED; otherwise it is logged as PASSED.
The model is reported to be functionally accurate if all individual test steps have PASSED. The model developer can debug starting from the first test step that FAILED.
Total number of cycles taken by the RTL model:
r = ∑ Ri, where Ri = number of cycles taken by the RTL in the i-th transaction.
Deviation of the DUT:
d = ∑ |Di − Ri|, where Di = number of cycles taken by the DUT in the i-th transaction.
% cycle accuracy of the IP model is reported as 100 − (d × 100) / r.
The model developer can debug based on the first CA deviation that has been reported.
For a particular communication infrastructure chip, we had existing test vectors at the platform level for both the RTL and the simulator. The goal was to achieve up to 95% cycle accuracy on the most critical path, constituting a proprietary memory subsystem, the interconnect and DDR2. We were facing difficulty fine-tuning the cycle accuracy of each of the IPs in the critical path.
We used the trace-based verification approach separately for each IP by directly using the traces from RTL, fixing the functional and cycle accuracy gaps in each IP without having to worry about how a fix in one IP would affect the cycle accuracy at the platform level.
The unit test bench provides us the true picture of the Cycle Accuracy for a C++ IP model.
We observed 81% cycle accuracy at the unit level versus 91% cycle accuracy at the platform level for the External Memory Interface.
Similarly, we found 91% cycle accuracy at the unit level versus 94% at the platform level for the C++ model of the interconnect.
There is a risk of being biased by platform-level results, as a small tweak in the configuration of the IP models can lead to different results.
The time taken to get results in the IP-level test bench is shorter than running the test on the platform simulator, which involves all the other IPs as well.
The IP functional and cycle accuracy gaps can be fixed independently by directly procuring the RTL VCD trace files, and can be reported independently as well.
- The framework can be deployed on any C++ model to achieve a high level of functional coverage and cycle accuracy, thereby improving the quality of pre-silicon software platforms.
- The component can be run in a standalone manner with no dependency on availability of any environment, licenses or tools.
- Equivalence testing against the RTL is enabled by this approach.
- The component validation execution flow is much faster than the older methods.
- The approach is scalable to get to a regression setup for any component.
Currently, CPU blocks cannot be tested with this framework.
The complement database becomes more complex as the abstraction gap between the RTL model and the DUT grows.
8. FURTHER WORK
The framework will be aligned with the OSCI SystemC and TLM2 standards.
The framework can be enhanced by integrating with standard code coverage, memory-leak analysis and performance-measurement tools.
Value Change Dump format (IEEE 1364-1995)