Update: ARM to Offer Cycle-Accurate Virtual Prototyping for Complex SoCs Through an Asset Acquisition from Carbon Design Systems (October 20, 2015)
by Bill Neifert, Carbon Design Systems
Boston, MA, United States
IP-based SoC designs are increasingly dependent upon ESL methodologies to balance the constantly increasing pressure on both schedule and design complexity. In order to meet these demands, system models need to be generated rapidly and delivered to the teams developing the architecture, coding the implementation and validating the software. This paper discusses a method for accelerating the development and validation of these complex system models.
It has long been obvious that the design methodologies and practices that were used to build successful ASIC programs were going to be insufficient for the next generation of complex system and system on chip designs. This constantly increasing complexity, coupled with tight time-to-market demands, has dictated a new methodology.
The electronics industry has responded to these needs by developing a myriad of technologies and methodologies which are lumped together under the singular title of Electronic System Level or ESL. Although exact definitions of what ESL actually means are varied, there is fairly wide acceptance that it involves describing the behavior of a system at a higher level of abstraction and then using that description to define the architecture of the system and enable early development of software. The path from this abstract system description to an implementable model is typically either achieved by behavioral synthesis or manual refinement.
This paper will describe a method which will accelerate the rate at which system and processor IP models can be developed and validated. An additional benefit of this method will be the addition of implementation-level accuracy to the system description. This accuracy can be used to better understand the ramifications of architectural tradeoffs and also to enable the early development of firmware and driver software.
2. IP Models
System on chip (SoC) designs are typically composed of a number of functional blocks connected to one or more central buses. One or more of these blocks is typically either a processor or DSP block. The remaining blocks generally implement various interfaces and functions which define the overall SoC behavior. In order to model the behavior of these blocks, it is typically necessary to hand-code a functional description. This coding is normally done either directly from a specification, derived from part of a larger algorithm or rewritten from a previously implemented block. The complexity of the block, the level of abstraction at which the functionality is modeled and the skill of the coder all help determine how long the coding process takes and also the accuracy of the model which is produced.
Once the IP block is coded it can be used as part of the overall ESL model to make architectural tradeoffs and as part of a platform for early software development. The value which can be derived from this model however is highly dependent upon how early in the design cycle it can be delivered and how closely it mimics the behavior of the block it is intended to model. A simplistic model of a complex block may be coded quickly but if it does not accurately model the intended behavior is this model truly beneficial? If the engineer spends the time to accurately model the behavior of the block it may be finished too late to add value for architectural exploration. A poorly modeled IP block can lead to inaccurate architectural decisions, bug-filled software and possibly even chip respins. In the tradeoff between model development time and model accuracy it is often difficult to strike the correct balance and many times the engineer finds out too late that the tradeoffs were incorrect. There is a strong need to adopt a methodology which gives the engineering team access to accurate ESL models much earlier in the design cycle.
3. Accelerated IP Model Generation
The exact number differs from design to design but the widespread consensus is that 80% of the typical SoC is either reused from a previous generation or purchased externally. While the behavior of this IP may be well understood in either its previous environment or a standalone simulation it is still necessary to incorporate an accurate model for the behavior of this reused IP into any new design which is created. In order to correctly model the overall behavior of the entire SoC it is generally necessary to have models for all of the active IP. Although software development can start with a simple processor model, accurate architectural decisions and final software development can only be done when all of the IP is present and correctly modeled.
Fig 1. Reused and external IP typically consume the majority of most SoCs
Reused and external IP can often prove to be the most difficult to model in a system environment. The reused IP block is often poorly documented or dependent upon designers who have since left the company. External IP is typically more difficult to model since its functionality may be documented only from an integration and high-level perspective and not with sufficient detail to accurately model all of its actual behavior. The designer tasked with modeling these legacy and external blocks is typically stuck becoming a human translator, laboriously poring over thousands of lines of RTL code to generate a usable system-level model. Once this model is generated the pain persists, as the model must be constantly validated against the behavior of the implementation model and this equivalence must be proven again any time either model is changed.
Carbon Design Systems has developed technology which addresses the model generation problem. The VSP product contains a compiler which takes Verilog and VHDL RTL as its input and generates a high speed software object as its output. This model is coupled with a versatile, high-speed API which allows for easy integration into diverse system platforms. Industry standard platforms such as SystemC are further supported by the automatic generation of wrappers which map this API directly into standard SystemC functions.
4. Model Compilation
IP models which are directly compiled from the RTL have a significant number of advantages over hand-coded models. Model development time is significantly shorter than hand-coding. The typical model can be compiled and validated in days whereas a hand-generated model generally requires multiple man-months. Compiled models are generated directly from the RTL and therefore model 100% of the logical functionality of the block as it will be implemented in silicon. The best hand-generated model will still fall well short of this metric and require substantially more development effort. This accuracy is necessary to correctly debug and diagnose system integration problems. The software designer or architect can confidently isolate problems since the behavior of the model will directly correlate to the behavior of the implemented silicon.
In order to simplify the inclusion of automatically generated models into ESL environments, an API is generated which provides visibility to all of the contents. This API allows the ESL environment to examine or modify any signal in the model. It also allows for easy debugging of hardware issues and integrated hardware/software profiling. In addition, the API allows callback functions to be attached to any signal to facilitate hardware breakpoints.
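The examine, modify and callback capabilities described here can be illustrated with a small sketch. The class below is a hypothetical stand-in written for this illustration; `SignalModel`, `examine`, `deposit`, `on_change` and the signal names are assumptions and do not represent the actual API generated by the compiler.

```python
from typing import Callable, Dict, List

# Callback signature: (signal name, old value, new value)
ChangeCallback = Callable[[str, int, int], None]


class SignalModel:
    """Hypothetical stand-in for a compiled model's signal-level API."""

    def __init__(self, signals: Dict[str, int]) -> None:
        self._values = dict(signals)
        self._callbacks: Dict[str, List[ChangeCallback]] = {}

    def examine(self, name: str) -> int:
        """Read the current value of any signal in the model."""
        return self._values[name]

    def deposit(self, name: str, value: int) -> None:
        """Modify a signal and fire its callbacks if the value changed."""
        old = self._values[name]
        self._values[name] = value
        if value != old:
            for callback in self._callbacks.get(name, []):
                callback(name, old, value)

    def on_change(self, name: str, callback: ChangeCallback) -> None:
        """Attach a callback to a signal, e.g. to act as a hardware breakpoint."""
        if name not in self._values:
            raise KeyError(name)
        self._callbacks.setdefault(name, []).append(callback)


# Usage: break (here, just log) whenever the hypothetical DMA busy flag changes.
hits = []
model = SignalModel({"top.dma.busy": 0, "top.dma.addr": 0})
model.on_change("top.dma.busy", lambda n, old, new: hits.append((n, old, new)))
model.deposit("top.dma.addr", 0x1000)  # no callback registered on this signal
model.deposit("top.dma.busy", 1)       # triggers the breakpoint callback
```

In a real integration the callback would typically pause the system simulation or hand control to a debugger rather than append to a list.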
The inclusion of RTL into the system design space is not a novel idea. Cosimulation options exist in all of the major RTL simulators. Most of these simulators can even run SystemC models as part of their executable. Cosimulation rarely meets the needs of the architect and software designer however. Simulation tools do an excellent job of representing the behavior of the RTL portion of the system but miss out on the profiling, system analysis and code debug tools which exist in most ESL solutions. In addition, the overall execution speed of a cosimulation approach is typically far too slow to satisfy the needs of all but the most patient designer.
5. Accelerated Execution
The IP models compiled by Carbon’s VSP tool overcome the speed limitations of traditional cosimulation using a variety of methods. The models themselves execute substantially faster than the same code executing in a traditional simulator. In addition, the high speed API is designed to allow one or more models to be integrated directly into system environments. This API allows the models to run much faster than would ever be possible with traditional simulation and makes integration much more straightforward.
Once the compiled IP model has been integrated into a system environment there are certain techniques which can be applied to further speed the execution of the entire system. The first technique employed by Carbon to speed the execution of IP models is called Replay.
Replay technology is modeled around the methods which software engineers use to debug their executables. While hardware engineers tend to debug using a monolithic waveform dump of all the signals in a block, software engineers tend to iterate the execution of the program multiple times to isolate the problem.
Fig 2. First run saves all pin I/O and checkpoints for RTL-based models
Replay technology executes the behavior of the IP block on only the first iteration. During this run the inputs and outputs of the model will be saved along with any registers being observed. Periodic checkpoints of the entire model will also be saved.
Fig 3. Subsequent iterations replace RTL model functionality with saved I/O from first run
Subsequent iterations of the system will use these stored pin and register transitions instead of re-executing the model behavior. If the inputs mismatch at any time, the replay of the saved data will be stopped and a saved checkpoint will be loaded into the IP model. The compiled model will then be used for subsequent cycles. Since it is unlikely that the mismatch will occur at the exact time a checkpoint was saved, the model will be brought to the correct timestamp in the background by playing the input vectors against the model to sync it correctly with the rest of the system. Once the model is in sync with the rest of the system, normal execution is resumed. Replay technology allows iterations of the system to run substantially faster and enables software problems to be isolated much more quickly.
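The record, replay and resynchronize sequence can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Carbon's implementation: `Accumulator` is a toy stand-in for a compiled model, and `Recording`, `record`, `replay` and the checkpoint interval are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class Accumulator:
    """Toy stand-in for a compiled RTL model: one step() call per clock cycle."""

    def __init__(self) -> None:
        self.total = 0
        self.executed = 0  # bookkeeping only, not part of the checkpointed state

    def step(self, data_in: int) -> int:
        self.executed += 1
        self.total = (self.total + data_in) & 0xFF
        return self.total

    def save(self) -> int:
        return self.total

    def restore(self, state: int) -> None:
        self.total = state


@dataclass
class Recording:
    inputs: List[int] = field(default_factory=list)
    outputs: List[int] = field(default_factory=list)
    checkpoints: Dict[int, int] = field(default_factory=dict)  # cycle -> state before it


def record(model: Accumulator, stimulus: List[int], interval: int) -> Recording:
    """First run: execute the model, saving pin I/O and periodic checkpoints."""
    rec = Recording(checkpoints={0: model.save()})
    for cycle, data in enumerate(stimulus):
        if cycle % interval == 0:
            rec.checkpoints[cycle] = model.save()
        rec.inputs.append(data)
        rec.outputs.append(model.step(data))
    return rec


def replay(model: Accumulator, rec: Recording, stimulus: List[int]) -> List[int]:
    """Later runs: serve saved outputs until the inputs diverge, then go live."""
    outputs: List[int] = []
    live = False
    for cycle, data in enumerate(stimulus):
        if not live and cycle < len(rec.inputs) and rec.inputs[cycle] == data:
            outputs.append(rec.outputs[cycle])  # no model execution needed
            continue
        if not live:
            # Divergence: load the nearest earlier checkpoint, then re-feed the
            # recorded inputs to bring the model up to the current cycle.
            start = max(c for c in rec.checkpoints if c <= cycle)
            model.restore(rec.checkpoints[start])
            for past in rec.inputs[start:cycle]:
                model.step(past)
            live = True
        outputs.append(model.step(data))
    return outputs


baseline = [1, 2, 3, 4, 5, 6, 7, 8]
recording = record(Accumulator(), baseline, interval=3)

changed = [1, 2, 3, 4, 9, 6, 7, 8]  # input differs from the recording at cycle 4
model = Accumulator()
outputs = replay(model, recording, changed)
```

In this sketch the model is not executed at all while the inputs match the recording. After the divergence at cycle 4, it runs one catch-up cycle from the checkpoint at cycle 3 and then executes normally. The sketch resynchronizes inline, whereas the text describes doing this in the background.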
6. Transaction Models
IP models which are compiled from RTL maintain all of the functionality of the original source including its cycle accuracy. This level of accuracy is ideal for correctly modeling the block's behavior but it can also make integration into transaction-based system models more difficult. In order to facilitate easier integration, a library of developed transactors can be used. Using this library of existing protocols it is simple to quickly integrate RTL models into system models written at varying levels of abstraction. These transactors can also be used as a source for additional performance optimizations.
Carbon’s On-Demand transactor technology takes advantage of the typical behavior of most SoC designs. In most systems the majority of the clock cycles are spent executing code in the embedded processor(s) and the other blocks in the design are dormant. The On-Demand technology executes the RTL-based blocks in the system only when they are the target of a transaction or generating a transaction. Using this approach, the overall system throughput can be substantially faster. In the above graph, the system with no RTL-based models runs at 10 MIPS. The RTL blocks total approximately 250 Kgates. The graph demonstrates the throughput of the system vs. the percentage of cycles which are executed solely in the processor. Since most SoCs spend >90% of the time executing code it is possible to achieve substantial system throughput even with a large percentage of the system automatically generated from RTL.
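The relationship between throughput and the fraction of processor-only cycles can be approximated with a simple time-weighted model. This is an assumption made for illustration, not the model behind the graph: it treats the run as a mix of work done at the processor-only rate and work done at a slower rate while RTL-based blocks are active. The 10 MIPS figure comes from the text; `RTL_RATE` is a purely hypothetical placeholder, since the source does not state the speed of the RTL-based blocks.

```python
def effective_throughput(cpu_only_fraction: float,
                         cpu_rate: float,
                         rtl_rate: float) -> float:
    """Time-weighted (harmonic) mix of two execution rates in the same units.

    cpu_only_fraction: share of the work executed with all RTL blocks dormant.
    cpu_rate: rate when only the processor model runs.
    rtl_rate: rate when RTL-based blocks must also execute.
    """
    if not 0.0 <= cpu_only_fraction <= 1.0:
        raise ValueError("cpu_only_fraction must be between 0 and 1")
    if cpu_rate <= 0 or rtl_rate <= 0:
        raise ValueError("rates must be positive")
    time_per_unit = (cpu_only_fraction / cpu_rate
                     + (1.0 - cpu_only_fraction) / rtl_rate)
    return 1.0 / time_per_unit


CPU_RATE = 10.0  # MIPS with no RTL-based models, as stated in the text
RTL_RATE = 1.0   # hypothetical placeholder, NOT a figure from the source

table = {f: effective_throughput(f, CPU_RATE, RTL_RATE)
         for f in (0.5, 0.9, 0.99, 1.0)}
for fraction, rate in table.items():
    print(f"{fraction:.0%} processor-only -> {rate:.2f} (same units as CPU_RATE)")
```

Under these assumptions, throughput approaches the processor-only rate as the dormant fraction approaches 100%, which is the trend the paragraph describes.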
7. Model Validation
Once the IP model has been generated, it is important to validate that the behavior of the model matches both the specification and the implementation. This is a traditional weakness for hand-generated models since there is no link between the model and the actual implementation. Even though generated models are compiled directly from the implementation model, it is still often desirable to validate the behavior of this model to prove its equivalence. There are a variety of methods by which this can be achieved.
The most straightforward method to prove equivalence is to link the compiled IP model back into the RTL verification environment. This way the same stimulus which is used to verify the behavior of the original RTL is used against the model. Carbon’s VSP compiler helps automate this process by automatically generating Verilog or VHDL wrapper code which matches the pin definition of the RTL. This code can be used directly in place of the source RTL in the verification environment. The wrapper code directly instantiates the IP model allowing it to easily be driven by the original RTL verification environment.
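The idea of driving the same stimulus against both the original and the compiled model can be shown with a lockstep comparison. This is a swapped-in illustration, not the wrapper-based flow described in this section, where the existing RTL testbench performs the checking. `first_mismatch` and `make_counter` are hypothetical helpers, and the toy counters stand in for the RTL and the compiled model.

```python
from typing import Callable, Iterable, Optional, Tuple

StepFn = Callable[[int], int]  # one input vector in, one output value out per cycle


def first_mismatch(reference: StepFn,
                   candidate: StepFn,
                   stimulus: Iterable[int]) -> Optional[Tuple[int, int, int]]:
    """Drive both models with identical stimulus; report the first divergence.

    Returns (cycle, expected, actual), or None if the outputs always agree.
    """
    for cycle, vector in enumerate(stimulus):
        expected = reference(vector)
        actual = candidate(vector)
        if expected != actual:
            return (cycle, expected, actual)
    return None


def make_counter(width_bits: int) -> StepFn:
    """Toy enable-gated counter that wraps at 2**width_bits."""
    state = {"count": 0}
    mask = (1 << width_bits) - 1

    def step(enable: int) -> int:
        if enable:
            state["count"] = (state["count"] + 1) & mask
        return state["count"]

    return step


stimulus = [1] * 20
same = first_mismatch(make_counter(4), make_counter(4), stimulus)
diff = first_mismatch(make_counter(4), make_counter(3), stimulus)
```

Here `same` is `None` because the two 4-bit counters agree on every cycle, while `diff` reports the first cycle at which the 3-bit counter wraps early.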
The methodology discussed in this paper has been adopted by a number of companies in various technology areas. Unfortunately, non-disclosure limitations prevent the discussion of exact design details, but the times consumed by the previous hand-written approach and by the automatic path using the VSP compiler are shown in the graph for a small but typical sampling of customers.
Fig 5. Hand-generated model development times vs. automatically generated
In every case, the amount of time saved for the automatic path is quite substantial. In each of the above examples the actual compilation of RTL into the IP model took several minutes. The time shown on the graph includes this compilation time and also the time the customer took to then validate the design in their RTL or system environment.
Of course, rapid model development is only a part of the value that comes from automatically generating an IP model. In the small sampling of designs above one user was able to find software issues related to pipeline stalls which had been impossible to find using an FPGA prototype. Another user was able to distribute accurate models to the end-customer well in advance of silicon to enable early integration. Yet another user was able to place system models in the hands of the entire firmware development team to complete system integration before silicon was back from fabrication.
The methodology discussed in this paper can be used to accelerate the development of IP models. This methodology has already delivered success at a large number of customers and can deliver an implementation-accurate platform well in advance of actual silicon. This platform can add significant value to the architect for making hardware/software tradeoffs. It can also be used to generate a development platform for the entire firmware development team.