by R. Deaves and A. Jones, SuperH, Inc., Bristol, United Kingdom
Abstract:
SuperH, Inc., develops and licenses 32-bit and 64-bit RISC CPU cores for use in multimedia System-on-Chip (SoC) devices. SuperH has developed an innovative set of tools that enhance the ability of its licensees to validate and verify highly optimized SoC devices within short timescales. These include the Modeling and Verification Package (MVP), which allows designers to generate an SoC model automatically. Verification is further supported through a mixed SystemC/RTL modeling capability.
The models may be enhanced with a comprehensive set of analysis capabilities that support software and bus optimization.
Crucially, we demonstrate not only that the use of the MVP can improve the quality of an SoC design but also how the time to market may be minimized.
This paper highlights the application of the MVP to a design example based on a multimedia ‘Encryption SoC’, which integrates a DES encryption module with the 32-bit SH4-202 CPU core (with 16K instruction and 32K data caches), a debug port and peripherals.
i) Introduction:
This paper describes the application of the Modeling and Verification Package (MVP) provided by SuperH. This package allows SuperH customers to develop their SoC designs with simulators based on SystemC and RTL, so that different aspects of SoC verification can be addressed.
SuperH is a world leader, with a proven track record in the design [1, 2] and implementation [3, 4] of high-performance RISC CPU cores. To ensure that customer requirements are met, SuperH provides a number of design kits. One of these is focused on SoC design and comprises a number of packages, including the MVP.
Major manufacturers, EDA vendors and IP providers now support modeling and verification techniques as a means of optimizing designs and reducing timescales and cost during SoC development. The MVP enables SuperH licensees to adopt this technology today and will evolve to support future developments. This paper presents results obtained by using the MVP in the Encryption SoC design project.
The remainder of this paper is organized as follows. Section ii provides an overview of the MVP. Section iii describes the rapid modeling capability of the package. The application of SoC software and bus analysis is covered in Section iv. Section v presents results on the performance of the package's verification capability. The work presented in the paper is discussed in Section vi, and conclusions are given in Section vii.
ii) MVP Overview:
The MVP delivers a standalone environment for the rapid creation of complete SoC simulation models based on SuperH CPU cores. The package includes a rich set of basic components and configuration tools together with several example SoC simulation models. The MVP enables rapid architectural evaluation of alternative system designs and provides a high-speed simulation platform that allows software development to start early in the design cycle. In addition, a powerful set of profiling tools for design analysis and optimization is provided. The MVP has been designed to work with a range of third-party EDA analysis tools, protecting users' existing investment in tools. Verification is a critical area of SoC design, and the MVP supports it through a range of modeling scenarios based on SystemC and RTL technology. The most widely used HDL simulators are supported, including the latest tools that enable SystemC and RTL to co-exist in a single simulator.
The MVP integrates with other SuperH packages and kits, with the design flows of manufacturers and with tools provided by EDA vendors.
iii) Rapid Modeling:
The MVP is built around the Automatic System Generation (ASG) tool. This is represented in Figure 1.
Figure 1: ASG Tool Overview
The ASG takes as its input a proprietary system specification file (ssf), an example of which is given in Figure 2. This text file provides a high-level description of the models to be included in the SoC. These models come from two sources:
- A palette of pre-defined models that includes timers, interrupt controllers, memory, a DMA engine, etc. The most important of these is the CPU model, a high-fidelity instruction set simulator (ISS) representation of the CPU that includes pipeline, cache and TLB modeling.
- A second set of models supplied by the licensee. Model templates are provided whose functions are populated to represent the licensee’s IP. A test harness is also provided so that the licensee’s model can be tested before it is included in the SoC design.
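The paper does not list the MVP's template API, so the following is only a minimal sketch of the idea of a model template whose functions the licensee populates, plus a stand-alone harness that exercises the model before integration. It is written in plain C++ rather than SystemC, and every class, register name and offset in it (`TargetModelTemplate`, `ToyCipherModel`, `KEY`/`DATA`/`RESULT`, `runHarness`) is hypothetical; the toy XOR block stands in for licensee IP and is not the paper's DES module.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a model template (plain C++, no SystemC).
// The licensee "populates" the read/write functions to describe their IP.
class TargetModelTemplate {
public:
    virtual ~TargetModelTemplate() = default;
    virtual uint32_t read(uint32_t offset) = 0;
    virtual void write(uint32_t offset, uint32_t value) = 0;
};

// Hypothetical licensee IP: a toy block that XORs a data register with a
// key register. Illustrative only; this is not the paper's DES module.
class ToyCipherModel : public TargetModelTemplate {
public:
    static constexpr uint32_t KEY = 0x0;
    static constexpr uint32_t DATA = 0x4;
    static constexpr uint32_t RESULT = 0x8;

    uint32_t read(uint32_t offset) override {
        switch (offset) {
            case KEY:    return key_;
            case DATA:   return data_;
            case RESULT: return data_ ^ key_;
            default:     return 0;  // unmapped offsets read as zero in this sketch
        }
    }

    void write(uint32_t offset, uint32_t value) override {
        if (offset == KEY) key_ = value;
        else if (offset == DATA) data_ = value;
        // Writes to RESULT or to unmapped offsets are ignored.
    }

private:
    uint32_t key_ = 0;
    uint32_t data_ = 0;
};

// Minimal stand-alone test harness: drives the model through its register
// interface, as would be done before adding it to the SoC description.
bool runHarness(ToyCipherModel& model) {
    model.write(ToyCipherModel::KEY, 0xA5A5A5A5u);
    model.write(ToyCipherModel::DATA, 0x12345678u);
    return model.read(ToyCipherModel::RESULT) == (0xA5A5A5A5u ^ 0x12345678u);
}
```

Keeping the register behavior behind a small read/write interface means the same model object can be driven either by a harness like this or, once integrated, by a router.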
The ASG tool takes the ssf and produces a SystemC representation of the design, with the models linked together by a functional router, and generates the corresponding compilation files (makefiles). Together, these automatically generated files are used to build an executable simulation of the SoC.
Several examples are provided with the ASG tool. These are of the form given in Figure 2, which represents part of a multimedia design:
Figure 2: System Specification File (ssf) Example
It should be noted that some of the models include a parameter list; for example, the memory model (EMI) includes start and size parameters. Licensee-defined models can have up to four parameters, which inform the ASG tool of:
- Whether the model acts as a target (t), an initiator (i) or a combined target-initiator (c).
- Its memory-mapped start address and size (if it is a target).
- The absolute or relative directory location of the model.
Additional ssf directives are provided for handling other signals such as interrupts. (A full description of these directives is outside the scope of this paper.)
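The start and size parameters are what a functional router needs in order to decide which target receives each access. As a minimal sketch of that address decoding, assuming a hypothetical in-memory form of an ssf entry (the real ssf syntax is proprietary and is not reproduced here), the router below maps each target to the window [start, start + size) and rejects entries whose windows overlap. The names `ModelEntry`, `Role` and `FunctionalRouter` are illustrative, not MVP or OSCI identifiers.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory form of one ssf entry (t, i or c role; start; size).
enum class Role { Target, Initiator, Combined };

struct ModelEntry {
    std::string name;
    Role role;
    uint32_t start;  // memory-mapped start address (targets only)
    uint32_t size;   // window size in bytes (targets only)
};

// Toy functional router: decodes an address to the target whose
// [start, start + size) window contains it.
class FunctionalRouter {
public:
    // Returns false for pure initiators (not address-mapped) or for a
    // window that overlaps a target already registered.
    bool addTarget(const ModelEntry& e) {
        if (e.role == Role::Initiator) return false;
        for (const auto& t : targets_) {
            // 64-bit arithmetic avoids overflow at the top of the address map.
            bool disjoint = uint64_t(e.start) + e.size <= t.start ||
                            uint64_t(t.start) + t.size <= e.start;
            if (!disjoint) return false;
        }
        targets_.push_back(e);
        return true;
    }

    std::optional<std::string> decode(uint32_t addr) const {
        for (const auto& t : targets_)
            if (addr >= t.start && uint64_t(addr) < uint64_t(t.start) + t.size)
                return t.name;
        return std::nullopt;  // no target claims this address
    }

private:
    std::vector<ModelEntry> targets_;
};
```

A linear search is adequate for the handful of targets in an example like Figure 2; a generated router for a larger map might instead sort the windows and use a binary search.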
Running the ASG tool on such ssf descriptions produces SystemC files that are hundreds of lines long. The ASG tool supports rapid development through:
- A short-form SoC description in the ssf that can be written in minutes.
- Methodical and quick inclusion of the licensee’s SystemC-based IP.
- Modification of the SoC in seconds.
It should be noted that a GUI-based tool for SoC modeling that uses bus cycle accurate (BCA) and Transaction Level Modeling (TLM) routers will be available in the near future.
iv) Analysis:
This section highlights the software and bus analysis capabilities of the MVP and the ability of its models to integrate with both open source and commercial third-party products.
Figure 3: SoC Design Project Model
Figure 3 represents part of the ‘Encryption SoC’, based on the SH4-202 CPU, an external memory interface (EMI), a keyboard controller and a DES encryption engine. For the purposes of this example, the Open SystemC Initiative (OSCI) TLM router is used to link the models together. Software and bus analysis capabilities are also included in the model.
Software Analysis: To illustrate the application of software analysis to the design, we consider two processes that need to be completed. The first checks the keyboard controller model to determine whether a key has been pressed. The second encrypts a section of memory. Here we examine the effect of controlling these processes serially and in parallel. The C code that runs on the SH4-202 CPU for these processes is listed in Figure 4.
Software analysis traces can be produced from the executables for this code together with the value of the program counter.
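The paper does not describe how the MVP attributes program counter values to C functions. One common approach, sketched here under that assumption, is to look each traced PC value up in the executable's symbol table and accumulate the cycles it accounts for against the enclosing function. The `Symbol` and `PcSample` records and the function names in the usage test are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical symbol-table entry, e.g. as extracted from the executable.
struct Symbol {
    std::string function;
    uint32_t start;  // first address of the function
    uint32_t end;    // one past its last address
};

// One trace record: an observed PC value and the CPU cycles it accounts for.
struct PcSample {
    uint32_t pc;
    uint64_t cycles;
};

// Attribute cycles to functions by looking each PC up in the symbol table.
// Symbols must be sorted by start address and must not overlap.
std::map<std::string, uint64_t> cyclesPerFunction(const std::vector<Symbol>& syms,
                                                  const std::vector<PcSample>& trace) {
    std::map<std::string, uint64_t> totals;
    for (const auto& s : trace) {
        // First symbol starting after pc; the candidate is the one before it.
        auto it = std::upper_bound(syms.begin(), syms.end(), s.pc,
                                   [](uint32_t pc, const Symbol& sym) { return pc < sym.start; });
        if (it != syms.begin() && s.pc < std::prev(it)->end)
            totals[std::prev(it)->function] += s.cycles;
        else
            totals["<unknown>"] += s.cycles;  // PC outside every known function
    }
    return totals;
}
```

Summing the per-function totals for each control scheme gives the kind of per-function cycle bars shown in Figure 5.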
Figure 4: Serial and Parallel Control Schemes
Figure 5: Series/Parallel Control Results
Results comparing the performance of the serial and parallel control schemes are shown in Figure 5. Here the y-axis represents the C function and the x-axis the number of CPU clock cycles taken to complete the function. The results indicate that the parallel control scheme completes in fewer clock cycles than the serial scheme: approximately 21.5k cycles for serial control and 16.5k cycles for parallel control. A more detailed analysis of the behavior of the CPU can also be obtained. This relates the CPU instructions to the occupancy of the CPU pipeline, which is important both for verifying that the design meets its specification and for exploring possible optimizations. An example of this trace is given in Figure 6, but it is not discussed further in this paper.
A very important aspect of the MVP is its ability to integrate with third-party tools. This allows the licensee (and the MVP) to capitalize on the benefits of commercial analysis tools. The CoWare ConvergenSC analysis capability has been integrated with this design example and was used to generate the Series/Parallel Control Results in Figure 7 (series: top plot; parallel: lower plot).
Figure 6: CPU Instruction Trace Example
Figure 7: ConvergenSC Display
Bus Analysis: CoWare has equipped the OSCI TLM bus with its ConvergenSC Bus Analysis capability. This can provide a variety of analysis results relating to latency, throughput and contention on the router. A trace of the router transactions is provided in Figure 8 (top plot). It indicates an architectural opportunity to reduce the number of clock cycles needed to complete the Encrypt process by removing the delay between storing the encrypted data and loading the next data for encryption. The effect of this change is shown in Figure 8 (lower plot).
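ConvergenSC's own metrics and trace format are not described in the paper, so the following is only a generic sketch of how latency, throughput and contention figures can be derived from a router transaction trace. The `BusTxn` record is hypothetical, and the idle-cycle count assumes a single shared bus on which one transfer occupies it from grant to completion; under that assumption, idle cycles are one way a gap such as the store-to-load delay described above would show up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical router transaction record (timestamps in clock cycles).
struct BusTxn {
    int initiator;       // id of the requesting initiator
    uint64_t requestAt;  // cycle the request was issued
    uint64_t grantAt;    // cycle the router granted it
    uint64_t doneAt;     // cycle the transfer completed
    uint32_t bytes;      // bytes transferred
};

struct BusStats {
    double avgLatency;     // mean cycles from request to completion
    double avgWait;        // mean cycles waiting for a grant (contention)
    double bytesPerCycle;  // throughput over the observed window
    uint64_t idleCycles;   // window cycles with no transfer in progress
};

BusStats analyze(const std::vector<BusTxn>& txns) {
    BusStats s{0.0, 0.0, 0.0, 0};
    if (txns.empty()) return s;
    uint64_t latency = 0, wait = 0, busy = 0, bytes = 0;
    uint64_t first = txns.front().requestAt, last = txns.front().doneAt;
    for (const auto& t : txns) {
        latency += t.doneAt - t.requestAt;
        wait += t.grantAt - t.requestAt;
        busy += t.doneAt - t.grantAt;  // assumes transfers never overlap
        bytes += t.bytes;
        first = std::min(first, t.requestAt);
        last = std::max(last, t.doneAt);
    }
    const double n = static_cast<double>(txns.size());
    const uint64_t window = last - first;
    s.avgLatency = latency / n;
    s.avgWait = wait / n;
    s.bytesPerCycle = window ? static_cast<double>(bytes) / window : 0.0;
    s.idleCycles = window > busy ? window - busy : 0;
    return s;
}
```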
This change reduces the number of clock cycles required to complete the Parallel Control scheme by approximately 5.5k cycles. These results are shown in Figure 9 (pre-architectural change: top plot; post-architectural change: bottom plot).
Figure 8: Bus Analysis
Figure 9: Updated Software Analysis Results
v) Verification:
The verification support within the MVP is focused on providing an environment that allows the integration of SystemC and RTL models. This enables the verification of the RTL portion of the design while taking advantage of the high simulation speed associated with the SystemC models.
An important aspect of this mixed modeling capability (SystemC-RTL) is the simulation speed achieved. In this paper we provide empirical results obtained for a number of different mixed modeling configurations.
The first of these configurations represents the whole model in RTL. In the second configuration, an IPC socket connects the SystemC simulation to the Programming Language Interface (PLI) of an HDL simulator. (For the work presented in this paper the HDL simulator employed is ModelSim by Mentor Graphics; Cadence’s NCSim and Synopsys’ VCS are also supported.) This configuration is shown in Figure 10.
Recent advances in HDL simulators make it possible to include the SystemC and RTL models in a single HDL simulator, without the need to run (and synchronize) two separate processes. Mentor Graphics provides this capability in the beta release of ModelSim version 5.8. This gives a more tightly coupled integration for SoC analysis. This scheme is shown in Figure 11.
Results are also presented for running the whole simulation in SystemC.
The final configuration is the standalone CPU model. This includes modeling of the ISS, floating-point unit, TLB and caches, and the CPU model has been compiled for simulation speed. However, it does not include a debugger interface, a POSIX console, licensee IP integration or any other peripherals.
Figure 10: SystemC/RTL Integration with PLI
Figure 11: Advanced SystemC/RTL Integration
The simulation speed results for the different SystemC/RTL simulation configurations are given in Figure 12. They were obtained by running the simulations on a Sun Fire 280R (750 MHz) machine running Solaris 5.8. It should be noted that no effort was made to maximize the simulation speeds presented in this paper. For the mixed SystemC/RTL modeling, a representation of the Encrypt module was placed in the HDL simulator as Verilog RTL, and the remainder of the design was placed in the SystemC domain.
Figure 12: Real-Time Simulation Cycles/Second
In Figure 12 the x-axis represents the modeling scheme employed and the y-axis the number of simulated clock cycles achieved per second of real (wall-clock) time. For clarity, the y-axis uses a logarithmic scale.
The pure RTL simulation provides the highest design fidelity of the schemes presented in this paper, allowing RTL-level verification of the design. The penalty for this high fidelity, however, is low simulation speed and long verification timescales.
The standalone CPU model provides the highest simulation speed but requires additional interfaces for SoC design and evaluation. Equipping the raw CPU model with the interfaces required for SoC design and development reduces its simulation speed. However, this model is useful for SoC software development (especially when taking advantage of the 5x speed-up provided on Linux-based machines).
The MVP aims to exploit the benefits of both of these pure modeling schemes by employing a mixed SystemC/RTL capability, which couples the simulation speed of SystemC with the high fidelity of RTL. This allows the design engineer to place the sub-systems of interest in the RTL domain for verification while using SystemC to represent the remainder of the design.
Historically, the MVP has provided a socket and PLI interface to enable this mixed modeling capability. However, recent developments by EDA vendors enable SystemC and RTL to co-exist in a single simulator. This provides a number of benefits, including:
- As the results of this paper show, the simulation speed can be increased by a factor of at least 5, mainly by removing the slow socket and PLI interfaces.
- Having the SystemC and RTL in the same simulator using a common IDE makes development easier.
vi) Discussion:
In general, the majority of changes to an SoC architecture take place early in the project, while relatively high-level abstraction models are being used. The rapid modeling aspect of the MVP enables design alternatives to be assessed quickly, both for the final design and for subsets of it.
The software analysis results provide an example of how the number of clock cycles required for the control tasks can be reduced by comparing two software sequences. Both sequences are functionally correct, and it could be argued that the serial scheme (with the larger number of clock cycles) is the more intuitive. The gain is achieved by exploiting concurrency, which depends on a number of issues relating to the independence of the concurrent processes. This suggests that such opportunities would be difficult to identify without modeling and analysis tools. Similarly, in the case of bus analysis, it would be difficult to detect the architectural opportunity to improve performance without a router equipped with data gathering and display capabilities. Furthermore, bus and software analysis are not independent, which makes optimizing the design difficult, especially when other constraints such as power and size are introduced. These factors motivate the use of analysis tools to ensure that the design meets its specification, without excessive over-engineering and within short timescales.
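The dependence of the gain on independence can be made concrete with a simple cycle-cost model. The sketch below assumes that the serial scheme programs the encryption engine, waits for it to finish and then checks the keyboard, while the parallel scheme checks the keyboard while the engine runs. The three cost parameters are hypothetical placeholders, not the paper's measurements, and the model ignores bus contention between the two activities.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical cycle costs for the two control tasks (not the paper's data).
struct TaskCosts {
    uint64_t setup;     // CPU cycles to program the encryption engine
    uint64_t encrypt;   // engine cycles to encrypt the memory block
    uint64_t keyboard;  // CPU cycles to check the keyboard controller
};

// Serial control: start the engine, wait for it, then check the keyboard.
uint64_t serialCycles(const TaskCosts& c) {
    return c.setup + c.encrypt + c.keyboard;
}

// Parallel control: check the keyboard while the engine runs. The overlap is
// only valid if the two tasks are independent; otherwise the schedule falls
// back to the serial one.
uint64_t parallelCycles(const TaskCosts& c, bool independent) {
    if (!independent) return serialCycles(c);
    return c.setup + std::max(c.encrypt, c.keyboard);
}
```

The saving is bounded by the shorter of the two overlapped activities, which is why, in this model, identifying which tasks are genuinely independent determines the benefit.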
The numbers of simulated clock cycles achieved per second of real time are comparable to those presented in other public domain documents. These results are useful in a number of ways, including allowing code run-times to be estimated from the empirical figures. This is an important factor for project management, allowing trade-off decisions relating to EDA licenses, computing hardware and timescales to be made. For example, the project manager may decide to run the final verification suite on two computers (requiring two EDA licenses). This would reduce the time taken to carry out the verification but would incur additional license and hardware costs.
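The run-time estimate described above reduces to dividing the number of clock cycles to be simulated by the measured simulation rate and by the number of hosts. A minimal sketch follows, assuming that the suite splits evenly across hosts and that every host achieves the same rate. The rate used in the usage test (1000 cycles/second) is a placeholder, not a value read from Figure 12; the factor of 24 is the speed-up over pure RTL reported in Section vii for the single-simulator configuration.

```cpp
#include <cassert>
#include <cmath>

// Estimated wall-clock hours to simulate `targetCycles` clock cycles at a
// measured `cyclesPerSecond`, split evenly over `machines` hosts.
// Assumptions: perfect parallelization, identical hosts, one simulator
// license per host.
double estimateHours(double targetCycles, double cyclesPerSecond, unsigned machines) {
    assert(cyclesPerSecond > 0.0 && machines > 0);
    return targetCycles / (cyclesPerSecond * machines) / 3600.0;
}
```

With these placeholder figures, 3.6 x 10^9 cycles take 1000 hours on one host and 500 hours on two (at twice the license count), while a configuration 24 times faster brings the single-host figure down to about 42 hours.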
The work documented thus far has recently been extended, including using the SystemC model to develop operating system (Linux) device drivers for the encryption hardware (the Data Encryption Standard, DES, module). This work culminated in a demonstration of application code running under the Linux OS using the DES module.
Figure 13: Model-to-Hardware Partitioning
In addition to the MVP work, a prototype system was developed in hardware using the MicroDev modular development board package from SuperH (Figure 13). This comprises a CPU board, which houses an SH4-202 evaluation chip (including EMI, Serial Communication Interface with FIFO (SCIF) and router), and an FPGA board. In this case the design was partitioned so that the DES and DMAC (and associated interconnect) were placed on the FPGA board to represent the Encrypt IP, with the remainder of the design on the CPU board. This provides a simple mapping to the SoC model of Figure 3. This work is outside the scope of this paper and will be fully documented in future papers.
The success of applying the MVP in this project was demonstrated by porting the DES subsystem onto the hardware prototyping boards within a few hours. This task comprised taking the HDL RTL and using the Quartus tool (by Altera) to implement the DES subsystem on the FPGA. The SH4-202 software needed no changes for this demonstration.
vii) Conclusions:
The paper has described three individual aspects of the MVP; namely rapid modeling, analysis and verification.
Rapid modeling is achieved through the use of the Automatic System Generation (ASG) tool, which takes a high-level textual representation of the SoC and generates the corresponding SystemC and compilation files. These automatically generated files can then be used to build an executable of the SoC model. Rapid development is supported because SoC models can be set up in minutes, licensee IP (SystemC) models integrated quickly, and SoC modifications made in seconds. As an example, the textual description of part of a multimedia SoC design was provided.
Two aspects of analysis were considered: software and bus analysis. Software analysis was used to investigate the performance of two control schemes applied to part of the multimedia design. The results showed which scheme was more appropriate and quantified the benefit: approximately 23% (from 21.5k to 16.5k cycles). Bus analysis highlighted an architectural opportunity for improvement in the design; implementing the design change reduced the number of clock cycles required to complete the control task by more than 30% (approximately 5.5k of 16.5k cycles). In addition, the compatibility of the MVP with tools provided by third parties was highlighted. This included using the free OSCI TLM and the commercial analysis tools provided by CoWare (ConvergenSC).
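The percentage improvements follow directly from the cycle counts reported in Section iv: 21.5k cycles for serial control, 16.5k cycles for parallel control, and a further saving of approximately 5.5k cycles after the architectural change. The short calculation below makes the arithmetic explicit; the constant and function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Cycle counts reported in Section iv, in thousands of cycles.
constexpr double kSerial = 21.5;
constexpr double kParallel = 16.5;
constexpr double kArchSaving = 5.5;  // saving from the bus-analysis change

// Percentage reduction going from `before` to `after` cycles.
constexpr double reductionPercent(double before, double after) {
    return 100.0 * (before - after) / before;
}
```

This gives about 23.3% for serial to parallel control, about 33.3% for the architectural change (16.5k to 11k cycles), and about 48.8% for the two combined relative to the original serial scheme.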
Verification in the MVP is focused on providing a mixed SystemC-RTL modeling capability. This allows verification of the RTL portion of the SoC while capitalizing on the simulation speed of SystemC. Empirical results indicate that using the MVP with a socket and PLI interface reduces the simulation time by at least a factor of 6 compared with pure RTL. Furthermore, using an advanced HDL simulator in which the SystemC and RTL models co-exist, the run-time can be reduced by a factor of 24 compared with RTL.
The discussion of the results shows how the three MVP aspects considered in the paper can contribute to minimizing the time to market of an SoC design. In addition, the MVP can support trade-offs involving both technical issues (for example, ensuring that the design meets its specification without excessive over-engineering) and project management issues (for example, how many HDL licenses and computers to employ against verification time). The paper also highlighted how the MVP links with other SuperH packages to provide a hardware prototype of the SoC.
Acknowledgements:
The authors would like to thank their colleagues at SuperH for valuable contributions to this work, especially Mark Jones, Mark Hill, Alan Alexander, Matt Fyles, Andy Bond and Stuart Ryan. Acknowledgements are also made to Matthew Elliott, Dave Upton and David Machin of CoWare, and Nigel Elliot and Gabriel Chidolue of Mentor Graphics. The authors are grateful to Benoit Clement and Bernard Caubet of STMicroelectronics for valuable discussions on applications of this technology area.
References:
[1] A. Hasegawa, M. Debbage, A. Sturges et al., “SH-5: The 64-bit SuperH Architecture”, IEEE Micro, July/August 2000.
[2] R. Curnow, M. Hill and A. Jones, “An Embedded Processor Architecture with Extensive Support for SoC Debug”, IP Based SoC Design’2002, Paper Id 53, October 2002.
[3] “SuperH Inc Homepage”.
[4] http://www.hitachisemiconductor.com/sic/jsp/japan/eng/products/mpumcu/32bit/superh/sh7751_e.html
[5] “SuperH Family – Processor Type”.
[6] R. Deaves and A. Jones, “An IP-based SoC Design Kit for Rapid Time-to-Market”, IP Based SoC Design’2002, Paper Id 52, October 2002.
[7] B. Clement et al., “Fast Prototyping: a system design flow applied to a complex SoC multiprocessor design”, DAC 99, New Orleans.
[8] R. Klein, “Hardware/Software Co-Verification”, Embedded Systems Conference, Class #361, 2003.
[9] Press Release, “CoWare and ARM Speed System Level SoC Designs”, http://www.arm.com/news.nsf/html/CoWare1208
[10] “CoWare Homepage and ConvergenSC Details”, http://www.coware.com
[11] “DES Standard”.
[12] B. Bailey, “Scalable Verification: A Comprehensive, Flexible, Methodology for Complex, Multimillion Gate Designs”, Functional White Paper, Mentor Graphics Corporation, 2003.