by Remi Francard — ST Microelectronics
Mick Posner — Synopsys

Abstract
This white paper examines the verification methodology and tools used in ST Microelectronics’ GreenSIDE project. The paper focuses on the central memory architecture, the GreenSIDE Main Memory (GMM), and its standard AMBA interfaces. The GMM plays a key role in the overall design’s verification, because all the main system-on-chip (SoC) components interact with this central main memory. Fundamentally, the verification of critical paths starts and ends within the GMM block. This white paper explains how the ST team used Synopsys’ DesignWare® memory models, the DesignWare C-based memory core (memcore), and the Synopsys MemScope graphical memory debugging tool to verify the GMM in both standalone and SoC simulation environments.
Design Methodology Constraints
When selecting the verification methodology and tools to verify the GreenSIDE design, the following constraints were considered:
- Must be cost effective
- Must minimize risk of silicon re-spins
- Reduce overall time to market
- Improve overall quality of verification
- Enable reuse between projects
- Offer fast integration of external intellectual property (IP)
- Track down design defects and ensure they are resolved
- Secure the design’s features and system operability
- Guarantee design maturity
- Employ a fully self-checking regression environment
To achieve the stated goals, a layered approach to verification is required. The layered approach is applied to individual block-level verification, as well as at the subsystem and full system level. Each layer of tests builds on the one below it, so moving from layer to layer requires minimal effort. Each layer of tests is portable, so it can be reused at a subsequent layer or within a new verification project. In this verification methodology, there are three layers. Layer 1 tests target interface protocol verification. Layers 2 and 3 target application-specific logic verification using realistic data traffic generation.
Figure 1. Verification Layers
The goal of layer 1 is to test the physical bus interface and ensure that it does not violate bus protocols. The interface must adhere to the defined protocol. The layer 1 tests are a set of directed tests that ensure that all of the buses’ different cycles can be correctly executed. Individual tests are created to exercise specific areas of the buses’ protocol. Once all basic transactions have been covered, layer 2 tests can begin.
The goal of layer 2 is to generate transaction sequence tests that not only stress the bus interface logic but also target the application-specific logic. Layer 2 tests are structured to generate realistic design traffic. In the GreenSIDE project, layer 2 tests were considered to be an application service or a set of services. For example, a service could be the task of transferring data from external memory into internal memory. An example of a set of services is a configure, transfer, retrieve test: this configures a co-processor, transfers data, then retrieves the data, with each stage represented as a standalone service.

To fully achieve the layer 2 goals, constrained random techniques must be applied to the verification environment. The main benefit of a constrained random approach is that the first tests are easy to get running: with a couple of simple bus functional commands, bus cycles are generated. High bus cycle and functional coverage are achieved very quickly, and more corner cases are found. The coverage statistics will be far more complete than what could be achieved using directed tests alone. A constrained random environment is able to generate huge amounts of stimuli from a minimum of testbench code, and because it is constrained to the design requirements, simulation cycles are not wasted by inadvertently activating unnecessary sections of the subsystem. Constrained random traffic will stress the design block under verification far more than directed tests can. The realistic representation of the traffic also thoroughly tests the block’s application-specific logic in a manner much closer to how the physical silicon will act.
Tests at layer 3 are used to raise the overall confidence in the design’s stability. Full sign-off scenarios are run, which include system and application boot sequences. The software-to-software interfaces can be checked at this level. The final application programming interfaces (APIs) and drivers are tested and a more focused performance trial can be executed. This enables a full context validation of the design. An example of layer 3 tests is simple OS interaction, which creates realistic traffic and noise on the bus with tasks jumping in and interrupting the running services. The layer 3 tests target the higher-level functions of either the individual block or system. Testing at layer 3 typically uncovers bugs that traditionally have been uncovered only when the final product is already available in the lab.
The GreenSIDE Main Memory Architecture Overview
The GreenSIDE Main Memory, or GMM, is a 12-Mbit SRAM memory block. Its internal architecture was designed to optimize shared accesses and avoid conflicts between internal memory buses. It was designed to minimize DSP data access latency and to maximize the overall system data processing speed.
The GMM includes five access ports: two AHB ports, two ST proprietary RG/RG ports, and one APB port. The two AHB ports interface the memory using a subset of the AHB protocol. The two RG/RG ports are designed to interface with two ST DSPs. The APB port is devoted to the overall memory configuration.
Figure 2. Block Level View of the GMM
The following figure shows a breakdown of the GMM architecture.
Figure 3. Breakdown of the GMM Architecture
The Memory Core
The memory core architecture was designed to improve the sharing of the memory and reduce the chance of contention. To minimize read latency, memory accesses are 128-bit oriented. The central memory has a zero wait state interface.
AHB burst writes are assembled in packets of four words (128 bits) before being transferred to the GMM in one cycle. For an AHB burst read, 128 bits are read in one cycle from the GMM before being split into four AHB words. AHB single transfers take one GMM cycle per word. The RG/RG interfaces are 128 bits wide. A write access is immediately transferred to memory in one GMM cycle. For a read transfer, 128 bits are read in one GMM cycle and transferred to the RG/RG bus.
The Memory Core is composed of 24 sub-elements (or cuts) of 512 Kbits (16K x 32) each. These 24 cuts are arranged in two banks of 6 Mbits (12 cuts per bank). The individual bank cuts are arranged in an interleaved memory architecture, so each cut column serves one of the following sets of low address nibbles: {0,4,8,c}, {1,5,9,d}, {2,6,a,e}, {3,7,b,f}.
The following figure shows the memory organization.
Figure 4. Memory Organization
BANK1 and BANK2 can be accessed simultaneously. A memory line is composed of four 32-bit words (word3, word2, word1 and word0) meaning that 128 bits can be written to memory or read from memory in one cycle. Each cut is individually accessible. Byte access is supported for each cut, using the bit write option.
Using the Custom DesignWare Memory Model Interface to Gain Debug Visibility into the GMM
One of the initial problems encountered with the simulations of the GMM was that traditional HDL simulation models of the embedded SRAM blocks did not provide any debug visibility. These simple HDL models did not support pre-loading or dumping of the memories’ data during simulation, which was required to create a self-checking simulation environment. This critical feature is necessary to jump start simulations without lengthy wasted simulation cycles where the embedded core loads the program code or data image from external memory into the GMM. Dumping the contents after simulation and back-door accesses that allowed the verification engineer to read and write without advancing simulation time were also required.
The GreenSIDE team’s solution was to use the Synopsys DesignWare memcore. The existing HDL embedded memory models were converted to use this C-based memory core. This simple inclusion provided support for the advanced testbench-to-memory debug capabilities that were required.
In simplistic terms, memcore replaces both the HDL memory array declarations and the reads and writes from these arrays. The model’s function is kept in the HDL, and all error, warning, and note messages are still generated from the HDL. The only real change is that the data for a read or write is stored in the C-based memcore rather than in an array in the HDL code.
The GreenSIDE team chose this solution based on the ease with which the verified HDL model could be modified to include the memcore. The HDL models were fully verified by ST Central R&D, and because the enhanced models’ functional code had not been touched, the models were treated as still fully verified by the GreenSIDE team. The following examples show how this was done.
Replacing HDL Memory Arrays with the Synopsys Memcore
Example: HDL declarations for a 4K x 32-bit memory
type mem_array is array (4095 downto 0) of std_ulogic_vector(31 downto 0);
signal mem : mem_array;
Synopsys (replaces memory array declarations)
inst_handle = memcore_instance_ext(12, 32, model_id);
Where 12 defines the addressable range (12 address bits, or 4K locations), 32 defines the data width, model_id is a unique user-defined identifier for the memcore, and inst_handle is the returned unique identifier used in subsequent memcore accesses.
Replacing HDL Memory Array Reads and Writes with the Memcore Functions
Example: Read from memory
data = mem[address];   (Verilog)
data <= mem(address);  (VHDL)
Synopsys (replaces read from array)
memcore_read_ext(inst_handle, address, data, A_read_cycle);
Where inst_handle is the unique identifier returned by the memcore_instance_ext command, address is the location being read from, data is the returned data value, and A_read_cycle is a user-defined cycle type that is captured for memory debug purposes.
Example: Write to memory
mem[address] = data;   (Verilog)
mem(address) <= data;  (VHDL)
Synopsys (replaces write to array)
memcore_write_ext(inst_handle, address, data, A_write_cycle);
Where inst_handle is the unique identifier returned by the memcore_instance_ext command, address is the location being written to, data is the data being written, and A_write_cycle is a user-defined cycle type that is captured for memory debug purposes.
Once the above changes were made, the new memcore-enabled models were re-validated against the original HDL models’ testbenches. This quick test confirmed that the models still simulated exactly like the original HDL models, including all data storage and messaging. The task to convert the models took less than one day to complete.
Applying the Layered Verification Approach to the GMM Block
As the GMM was a key component in the overall SoC, it had to be thoroughly verified in a standalone environment before the system could be constructed. To do this, the individual block was verified in a standalone environment and tests were generated at each of the layers. Each layer’s tests stimulated an increasing amount of the GMM’s functionality. This captured fundamental functional errors before subsystem integration began.
At the block-level verification stage, the layer 1 tests check that the block can be accessed via the defined AMBA interface or the ST proprietary RG/RG bus. The example applies the verification process to the AMBA AHB bus, but the same methodology was applied to the ST RG/RG bus. The block must conform to the AMBA protocol before subsystem integration can start. All bus cycles must be checked to ensure that the block will function when connected to other blocks in the AMBA system. The best way to achieve this in the shortest time and with maximum coverage is to drive the user logic block with verification IP.
Figure 5. AMBA Master and Monitor Used During Layer 1 Tests
In this example, the AHB master verification IP is used to generate the directed read and write tests that will fully exercise the AHB bus interface on the user-defined block. A monitor is used to check that none of the AMBA protocol is violated, as well as to capture bus cycle coverage information. Monitoring the coverage lets the verification engineer know how much of the bus interface has been tested. A goal of 100 percent bus coverage at layer 1 using directed tests may be impossible to achieve. A more realistic target at layer 1 should be 100 percent bus cycle or transaction coverage. This ensures that the block can respond to most of the AMBA cycles and is achievable in a minimum amount of time.
To achieve higher functional coverage, the layer 1 test environment can be quickly expanded to support the layer 2 tests. At layer 2, more realistic bus traffic data is required to test the block’s functionality. To fully achieve the layer 2 goals, constrained random techniques must be applied. Creating random traffic at layer 2 is very important because it stresses the bus and reveals corner cases that may not have been considered. Generating directed tests to do this would not only take a great amount of time, but would not mimic a real AMBA environment. These sequences of transactions begin to test the application-specific functionality of the user-defined block.
Heavy AMBA accesses can be performed and block-to-block data integrity can be checked to test the user-defined block in a more complete fashion. Sequences of transactions should be generated to check the user-defined block’s functionality in the context of a more complete system.
The goal of the layer 1 tests, conformance to the AMBA protocol, is still valid at layer 2. The layer 2 tests will generate a more complete coverage of the AMBA AHB protocol. The constrained random stimuli will produce tests that get closer to the 100 percent bus transaction coverage goal.
Layer 3 application-specific tests can be generated using this same setup. Tests can be created that mimic the real application’s functionality, such as cache access, DMA transfers, and configuration. The only limiting factors to running boot and application configuration tests are simulation cycles and wall-clock time. Tests executed at layer 3 using a traditional full functional CPU model would require millions of clock cycles to complete. Using the AHB master verification IP reduces the required clock cycles and makes running the layer 3 block tests manageable. The overall advantage of the layer 3 tests is that they stress the block in a fashion that is closer to its final application.
Example of Constrained Random Stimuli Applied to Simple AMBA AHB Cycles
Constrained Random Transactions (CRT) functionality enables a spread of cycles to be generated in a simple and quick fashion. For a given AHB transaction, the transfer type, address, size, and burst type can be constrained.
Figure 6. Example of Constrained Random Transaction Generation
For example, an AHB master could generate a mix of read and write cycles of varying sizes and burst widths. Combinations can quickly be constructed to generate read transfers 10 percent of the time, write transfers 70 percent of the time, and idle states the remaining 20 percent. Coding this in a traditional HDL is very difficult.
A fully constrained random environment is defined as a set of transactions, with a layer of sequences above that, a layer of choices above that, and a final layer of transaction constraints. The payload is fed into the system, creating an autonomous stimulus generator. Individual transactions are joined together to create a sequence; sets of sequences are joined together to create a choice; and sets of choices produce a wide variety of transaction cycles and responses.
Integrating the GMM block into the SoC subsystem
The layered verification approach was applied to each block before it was integrated into the overall subsystem. Each level of the layered verification approach was applied to the subsystem environment in the same way that it was for block-level verification. The goals of each layer are the same: protocol checking for layer 1, transaction sequence generation for layer 2, and application-specific tests for layer 3. Incorporating each block into a subsystem tests not only the individual block, but also the subsystem construction itself.
The layer 1 tests can be re-used to quickly check the subsystem connections. The block-level constrained random transaction layer 2 tests can also be reused. The same AHB master constrained random tasks can be executed to generate tests within the subsystem. Very quickly, this subsystem verification environment will generate huge amounts of design data that will thoroughly test both the interfaces of each block and the overall application.
Coverage at this level is important. As the overall design under verification gets larger, generating tests to achieve the coverage goals becomes increasingly difficult. At the subsystem level, it is almost impossible to code directed tests that will stimulate all possible configurations of even the simplest GMM configuration on the AMBA AHB subsystems. Constrained random transaction generation is the only way that users can generate enough subsystem tests to achieve coverage goals.
The subsystem level is the first place where the engineer gets to monitor how the user-defined block interacts with the other subsystem components. Both interface logic and application-specific logic bugs are uncovered. The layer 2 and 3 tests will flush out these bugs very quickly. The layer 1 tests will notify the user of protocol violations and the layer 2 and 3 tests will check to make sure all subsystem components react correctly. The layer 3 application-specific tests run at the block level and can be expanded to cover multiple components within the subsystem.
Verifying the Subsystem Using Logical Address Map Data
An extension to the standard layered verification approach was required that was portable between the system, RTL, gate, and real silicon verification tasks. The only real constant across all of these was the logical memory map itself. At the system level, the memory was modeled as a transaction-level block, and simple read and write functions were used to access it. At this level, the system memory matches the logical address map. At the RTL and gate level, these blocks take the shape of the physical embedded and external memory architecture, which has a defined shape and interface protocol; the logical address map is now split across many individual sections. Even with the system memory split into many sections, the logical address map is still consistent. The GreenSIDE architecture implements the logical address map using the physical devices in an interleaved architecture, as described in detail in a previous section.
During the system-level tests, scenario data in the system memory was captured before and after the scenario had been executed. These images could be viewed as the initial buffer contents and the final buffer contents after a service has been executed. The strategy was to use the initial image again at the RTL and gate level and treat the final image as a golden successful image. Both the system-level image and the final golden image are single data files.
Figure 7. Representation of System Buffer Data
An issue arose where this image needed to be loaded back into the memory models in the RTL and gate-level simulations before the scenario could be executed. The actual implementation architecture of the memory, however, is not a single memory—it is the interleaved architecture of the embedded memory plus the external memory devices.
The solution was to use the DesignWare memory model feature called logical address mapping. This capability allows the models to load, dump, peek, and poke the memory images with a user-defined logical address map rather than being constrained to the physical implementation. Because the embedded memory models used the memcore, the logical address-mapping feature was also available. This was the key to enabling an almost fully automated verification flow from the system level all the way through to real silicon. The logical address mapping enabled the input buffer image to be directly loaded, and the final memory data image could then be dumped at the end of simulation in any logically defined manner. This final image could then be compared with the system-level golden image.
Figure 8. Loading and Dumping Images Using Logical Address Maps
Defining the logical address maps for the memory models was done using the MemScope graphical user interface (GUI). MemScope allows post process and interactive debugging of the memory models and can also be used as the logical address map editor. Device-based, sliced, masked, interleaved, and byte sliced interleaved logical address maps are supported.
Figure 9. MemScope Logical Address Maps
For the GreenSIDE project, many logical address maps were defined to map both the embedded memory and the external memory devices into a single logical view. The process was automated so that the correct logical address map was selected based on the scenario that was being tested. This enabled a full self-checking regression environment to be created that worked at the system level all the way down to gate-level verification. These same checks were portable to the final silicon.
Software Tests and MemScope as a SPY
In a traditional SoC verification flow, a SPY block is used to gain debug visibility into tests that are generated by the execution of software. Software tests are written and executed on the ARM Digital Simulation Model (ARM DSM), a full functional model of the ARM core in the SoC that can execute ARM code. On the real silicon these tests execute very quickly, but in simulation they typically run for a very long time. The problem is that there is no easy way to check the progress of the simulation or quickly verify which tests passed and which failed.
Typically, a block called the tube or spy is used to add visibility. The tube is a memory-mapped block that can interpret data as ASCII characters. During simulation, the ARM core can be instructed to write to a specific address that has been defined for the tube. The data is then translated to ASCII characters and displayed on the user’s monitor. Typical messages include the status of the test, as well as any pass or fail information. This flow has proved very effective. The problem with the tube block is that it does not have a physical implementation—it is just a block during simulation. The tube simulation block also requires the design to be modified, because it needs a dedicated address location to function: the address decode logic has to be modified to get the tube to work. This breaks the verification flow, because even though the tests can be re-used on the real silicon, no additional debug information will be available there. In addition, the real silicon will not match the modified simulation version because of the changes required to use the tube block. This was not acceptable to the ST GreenSIDE project team: changes to the RTL within the system simulation to support the tube could potentially introduce bugs.
Instead, the ST GreenSIDE team used Synopsys’ DesignWare MemScope to provide that same SPY functionality without any design modifications. Rather than having to modify the physical design implementation to provide the tube simulation block with an address location, the software tests were set up to write debug data into the external memory devices. MemScope was then used to monitor that location.
Figure 10. MemScope as a SPY
MemScope then provides the same functionality as the tube block without any modifications to the design. This removes the risk of introducing bugs when changing the design to support the tube block. MemScope was used to interactively monitor the status of long tests using the dynamic data exchange between the memory models and MemScope. The results of a test could also be loaded in a post process fashion as the complete memory model history could be captured during simulation.
With the debug information being sent to the external memory devices rather than to a fictional tube simulation block, the debug visibility was available on the prototype board as well. A normal hardware debugger could then be used to read the memory data, which could be viewed as the same ASCII debug data. Using MemScope as the SPY block enabled a debug flow that again could be used from the system level all the way down to the real silicon.
Modeling The External Memory Devices
The GreenSIDE team used the large catalogue of pre-verified DesignWare Memory Models downloaded from the Synopsys website. Once a physical device for each of the interfaces had been chosen, the task was to simply download the corresponding model from the Synopsys DesignWare Memory Central website: http://www.memorymodels.com, install, and simulate with it. These models act as the real device would with respect to protocol and timing. They perform full protocol and timing checks and flag violations of the interface or device timing. The pre-verified DesignWare Memory Models support all of the testbench-to-memory debug capabilities that the embedded memory models used.
The GreenSIDE project also required a memory controller that enabled communication to and from different external memory devices. This memory controller was a verified IP block, so it did not require extensive testing, but it was needed during the verification task. The memory controller supported interfaces to SRAM, SDRAM, and Flash. A configuration of these three interfaces was chosen to match what the real prototype board would contain. Identical tests from the RTL verification could then be executed on the prototype board, which substantially sped up the real silicon verification process.
Using the Synopsys DesignWare Memory Models and tools streamlined the verification process for the ST Microelectronics GreenSIDE project. The simple task of adding the Synopsys memcore to the embedded memory models was easy and yielded immediate model feature enhancements. These new features enabled a clean verification process to be set up from system- to gate-level verification and included the real silicon validation. Using MemScope as a SPY enhanced the way that software test monitoring was done and cut down the risk of differences between the software system model and the real silicon. This verification environment met the goals outlined at the beginning of the project. It provided an accurate, cost-effective, reusable solution that fit into the existing design flow and reduced time to market. It will serve as the model for future verification projects.