Abhinav Gaur, Gaurav Jain, Ankit Khandelwal (Freescale Semiconductors)
Applications for modern SoCs keep growing in size and therefore need large amounts of storage. This storage can be on-chip or off-chip, depending on the system requirements. SoCs today include various on-chip and off-chip memories, such as SRAM, DRAM, flash, and ROM, which serve a multitude of applications. For example, software code can be placed in different memories in a system, incoming camera data can be stored in different memories, and computation-intensive peripherals use different kinds of memories for their operations. Some use-cases involve data processing by multiple masters and more than one memory type in the complete data path (for storing intermediate data), so data sanity must be assured at every step. With execution timelines already tight, it becomes necessary to devise methods that save verification time. This paper discusses the need for backdoor loading and comparison of processed data (in a digital verification environment) and explains a method to implement such a scheme, which saves a lot of simulation time while maintaining data integrity.
Why Memory backdoor loading/comparison is required
Memories are used by different masters to store code and data. In a real application, these masters use the different memories present in the system according to the use-case, writing data to and reading data from them. Over time, the amount of data that needs to be stored and processed keeps increasing. In simulation, loading the data and comparing the processed data through the masters is virtually impossible for large data sets (say, on the order of megabytes). The general practice is to load the initial data, which the masters will process, via backdoor. After processing, a few chunks of data at the start, middle and end of the memory are checked for sanity: the core on the SoC reads each of those locations and compares it with the expected data. This method is very inefficient for the following reasons:
- Takes a lot of simulation time which tends to increase with increasing data size.
- Can mask any possible issue which might exist for uncovered memory locations.
For example, suppose we want to load 2 KB of data (taken here as 2000 bytes) via a core running at, say, 100 MHz. Assuming that the core takes 4 cycles per transfer on a 64-bit data bus (8 bytes per transfer), the time taken to load this data is (2000/8) transfers * 4 cycles per transfer * (1/100) us per cycle = 10 us. Simulating 10 us translates to considerable test time, spent simply on loading the data. If we load the data via backdoor instead, we save this entire 10 us of simulation time. In short, backdoor loading is far more efficient than front-door loading for test cases where multiple memories are to be loaded or compared.
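The arithmetic above can be captured in a small helper. This is only a sketch: the function name and its parameters are illustrative and not part of the original flow.

```c
#include <stdint.h>

/* Front-door load time in nanoseconds (hypothetical helper).
 * bytes:               payload size in bytes
 * bus_bytes:           bytes moved per bus transfer
 * cycles_per_transfer: core cycles needed for one transfer
 * clk_mhz:             core clock frequency in MHz */
double frontdoor_load_time_ns(uint64_t bytes, uint32_t bus_bytes,
                              uint32_t cycles_per_transfer, double clk_mhz)
{
    /* Round up so that a partial last transfer is still counted. */
    uint64_t transfers = (bytes + bus_bytes - 1) / bus_bytes;
    double period_ns = 1000.0 / clk_mhz;

    return (double)transfers * cycles_per_transfer * period_ns;
}
```

With the numbers from the text, frontdoor_load_time_ns(2000, 8, 4, 100.0) gives 10000 ns, that is 10 us. Under the same assumptions, 1 MB (1048576 bytes) would need about 5.2 ms of simulated time, which shows how the cost scales with data size.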
Hence, we need a flow that provides a much faster and more accurate method of backdoor loading and comparing the data. However, preloading so many different kinds of memories, each with a different kind of simulation model, can be difficult to manage in a testbench. This paper presents an approach based on user-friendly tasks that not only makes life simpler for the verification engineer but also gives good confidence in the correct functionality of the complete data path of the system. These tasks load the data, and compare the final processed data with the expected data, in zero simulation time regardless of the data size, thus saving a lot of simulation time.
Building blocks for memory loading/comparison
This section explains the basic setup for memory loading/comparison. The first step is to assign unique IDs to the various memory chunks present in the design. Let us understand this with the help of an example in which we take the different memories present in a system and develop different tasks for them.
A task is created which takes an address as input and returns the memory ID. Let us name this task “return_mem_id”. The address here is the memory-mapped address.
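A minimal sketch of such a lookup is shown here in C for illustration. The memory IDs, base addresses and sizes are assumptions made up for the example; in a real testbench they come from the SoC memory map, and the task would normally live on the testbench side.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory IDs; MEM_NONE means "address not in any memory". */
typedef enum { MEM_NONE = 0, MEM_FLASH, MEM_SRAM1, MEM_SRAM2, MEM_DDR0 } mem_id_t;

typedef struct {
    mem_id_t id;
    uint32_t base;  /* memory-mapped base address (assumed value) */
    uint32_t size;  /* size in bytes (assumed value) */
} mem_region_t;

static const mem_region_t mem_map[] = {
    { MEM_FLASH, 0x00000000u, 0x00100000u },
    { MEM_SRAM1, 0x20000000u, 0x00010000u },
    { MEM_SRAM2, 0x20010000u, 0x00010000u },
    { MEM_DDR0,  0x80000000u, 0x10000000u },
};

/* Returns the ID of the memory that contains the memory-mapped address. */
mem_id_t return_mem_id(uint32_t addr)
{
    for (size_t i = 0; i < sizeof mem_map / sizeof mem_map[0]; i++) {
        /* Unsigned subtraction wraps for addr < base, so one compare
         * checks both the lower and the upper bound of the region. */
        if (addr - mem_map[i].base < mem_map[i].size)
            return mem_map[i].id;
    }
    return MEM_NONE;
}
```

Keeping the map in a single table means that adding a memory is a one-line change, and an address outside every region is reported explicitly instead of being loaded into the wrong model.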
Another important factor is the width of each memory. Different memories have different widths depending on the system requirements. For this, another task can be created, say “return_mem_width”, which takes the memory ID as input and returns the width. Also, the memory-mapped address of a location differs from the actual physical address of the memory cells. For backdoor loading we need the physical address of the memory and not its memory-mapped address. A task is required for this translation: it takes the memory ID (together with the memory-mapped address) as input and returns the physical address by which that memory must be addressed for backdoor loading. Let us name this task “return_physical_address”.
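The two helpers might look as follows. This is a sketch under stated assumptions: the widths and base addresses are invented for the example, the physical address is assumed to be a simple word index into the memory array, and the address translation is assumed to receive the memory-mapped address alongside the memory ID. Real memories may use banking or interleaving, which would change the translation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory IDs for this example. */
typedef enum { MEM_NONE = 0, MEM_SRAM1, MEM_DDR0 } mem_id_t;

/* Data width of each memory in bits (assumed values). */
uint32_t return_mem_width(mem_id_t id)
{
    switch (id) {
    case MEM_SRAM1: return 64;
    case MEM_DDR0:  return 32;
    default:        return 0;   /* unknown memory */
    }
}

/* Memory-mapped base address of each memory (assumed values). */
static uint32_t mem_base(mem_id_t id)
{
    switch (id) {
    case MEM_SRAM1: return 0x20000000u;
    case MEM_DDR0:  return 0x80000000u;
    default:        return 0;
    }
}

/* Converts a memory-mapped byte address into the word index used to
 * address the memory array directly. */
uint32_t return_physical_address(mem_id_t id, uint32_t mapped_addr)
{
    uint32_t bytes_per_word = return_mem_width(id) / 8;

    assert(bytes_per_word != 0);          /* reject unknown memory IDs */
    return (mapped_addr - mem_base(id)) / bytes_per_word;
}
```

For instance, byte offset 0x10 falls in word 2 of the 64-bit memory but in word 4 of the 32-bit memory, which is why the width lookup has to come before the address translation.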
With the basic tasks for calculating the memory ID, width and physical address in place, we can build the “memory_write” task. This task takes the memory ID, physical address and write data as input. The width of the data to be written depends on the memory width. Depending on the memory ID, the task performs the backdoor load for that particular memory. Each memory has its own way of backdoor loading. For example, if Denali models are used for external interfaces such as flash or DRAM, there are special PLI tasks for loading the data. The exact mechanism can be found by studying the simulation model of that memory.
Similar to the “memory_write” task, a “memory_read” function can be created which takes the memory ID and physical address as input and returns the read data.
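The dispatch structure of these two routines can be sketched as follows. The C arrays are only stand-ins for the storage inside the simulation models, and the memory IDs, depths and widths are assumptions. In an actual testbench each case would instead use a hierarchical reference into the memory model or the model's own backdoor routine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory IDs for this example. */
typedef enum { MEM_SRAM1 = 1, MEM_DDR0 } mem_id_t;

#define SRAM1_WORDS 1024u
#define DDR0_WORDS  4096u

/* Stand-ins for the storage arrays inside the simulation models. */
static uint64_t sram1_array[SRAM1_WORDS];  /* assumed 64 bits wide */
static uint32_t ddr0_array[DDR0_WORDS];    /* assumed 32 bits wide */

/* Writes one word directly into the selected memory. */
void memory_write(mem_id_t id, uint32_t phys_addr, uint64_t data)
{
    switch (id) {
    case MEM_SRAM1:
        assert(phys_addr < SRAM1_WORDS);
        sram1_array[phys_addr] = data;
        break;
    case MEM_DDR0:
        assert(phys_addr < DDR0_WORDS);
        ddr0_array[phys_addr] = (uint32_t)data;  /* truncate to memory width */
        break;
    default:
        assert(!"unknown memory ID");
    }
}

/* Reads one word directly from the selected memory. */
uint64_t memory_read(mem_id_t id, uint32_t phys_addr)
{
    switch (id) {
    case MEM_SRAM1:
        assert(phys_addr < SRAM1_WORDS);
        return sram1_array[phys_addr];
    case MEM_DDR0:
        assert(phys_addr < DDR0_WORDS);
        return ddr0_array[phys_addr];
    default:
        assert(!"unknown memory ID");
        return 0;
    }
}
```

Because every memory-specific detail is confined to one case of the switch, callers only ever deal with a memory ID, a physical address and a data word.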
Many memories also support ECC or parity. For example, 8 bits of data may have 5 bits of ECC associated with them. The “memory_write” task can be enhanced to calculate the ECC each time it loads data into a memory location. Similarly, “memory_read” can be enhanced to read the data from each memory location, feed it to an ECC calculation task, and compare the result with the ECC read from the same location. This also helps to check that the ECC is stored correctly in the memory at various stages of the data flow in a use-case.
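As an illustration of the 8-bit data, 5-bit ECC case, the sketch below uses a textbook Hamming code with an extra overall parity bit (SECDED). This is an assumption made for the example: the actual ECC scheme and bit ordering are design-specific and must be taken from the memory specification. The function names are hypothetical.

```c
#include <stdint.h>

/* Returns a 5-bit check code for 8-bit data:
 * bits [3:0] are Hamming check bits, bit 4 is the overall parity.
 * Illustrative scheme only; a real design defines its own ECC. */
uint8_t calc_ecc5(uint8_t data)
{
    /* 1-based code-word positions of the data bits in a Hamming(12,8)
     * layout; positions 1, 2, 4 and 8 are reserved for the check bits. */
    static const uint8_t pos[8] = { 3, 5, 6, 7, 9, 10, 11, 12 };
    uint8_t check = 0;
    uint8_t parity = 0;

    for (int i = 0; i < 8; i++) {
        if ((data >> i) & 1u) {
            check ^= pos[i];   /* XOR of positions yields the check bits */
            parity ^= 1u;      /* running parity of the data bits */
        }
    }
    for (int i = 0; i < 4; i++)
        parity ^= (check >> i) & 1u;   /* fold in the check bits */

    return (uint8_t)(check | (parity << 4));
}

/* Returns 1 when the stored ECC equals the ECC recomputed from the data. */
int ecc_matches(uint8_t data, uint8_t stored_ecc)
{
    return calc_ecc5(data) == (stored_ecc & 0x1Fu);
}
```

The write path would store calc_ecc5(data) next to each data word, and the read path would call ecc_matches() on every location it reads back.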
These functions mentioned above can be stitched together to provide a seamless flow for backdoor loading and comparison of different memories of the SoC.
Testbench Infrastructure for loading/comparison of memories through command line (as simulation arguments)
1. Loading of data
For loading of memories, the user simply needs to provide:
- The memory-mapped address at which the data is to be loaded.
- The format of data (16 bit, 32 bit, 64 bit, etc.).
- The simulation time (in terms of timescale unit of the testbench) at which loading is to be done.
- The complete file path which contains the data to be loaded.
A top-level task processes all this information. Depending on the format of the data, it parses each line of the data file and loads the data at the corresponding memory locations using the tasks mentioned earlier (memory_write, etc.). The top-level task repeats this process for every memory passed on the command line for loading. The task can be made user-friendly so that, if the user does not give the format of the data, it works the format out by itself by parsing the lines of the data file.
2. Comparison of data
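The argument handling can be sketched as below. The colon-separated argument layout, the struct and the function names are all hypothetical, since the exact command-line syntax is not specified here. The data file is assumed to hold one hexadecimal word per line, so the format can be inferred from the number of hex digits.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* One load request as passed on the command line (hypothetical layout). */
typedef struct {
    uint32_t mapped_addr;       /* memory-mapped start address */
    unsigned format_bits;       /* 16, 32, 64, ...; 0 means "infer" */
    unsigned long load_time;    /* in testbench timescale units */
    char path[256];             /* data file to load */
} load_request_t;

/* Parses "addr:format:time:path", e.g. "0x20000000:32:100:/tmp/img.hex".
 * Returns 1 on success, 0 on a malformed argument. */
int parse_load_request(const char *arg, load_request_t *req)
{
    unsigned long addr;

    if (sscanf(arg, "%lx:%u:%lu:%255s", &addr, &req->format_bits,
               &req->load_time, req->path) != 4)
        return 0;
    req->mapped_addr = (uint32_t)addr;
    return 1;
}

/* Infers the data format from one line of the data file by counting
 * its leading hex digits (an optional "0x" prefix is skipped). */
unsigned infer_format_bits(const char *line)
{
    unsigned digits = 0;

    if (line[0] == '0' && (line[1] == 'x' || line[1] == 'X'))
        line += 2;
    while (isxdigit((unsigned char)line[digits]))
        digits++;
    return digits * 4;
}
```

A top-level loader would call parse_load_request() once per argument and fall back to infer_format_bits() on the first data line when format_bits is 0.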
For comparison of data present in memories, a C-side function is created which can be called at any point in the testcase for memory comparisons. The memory comparison information that needs to be passed through the command line is:
- The address from where the comparison needs to begin.
- The format of data (16 bit, 32 bit, 64 bit, etc. ).
- The path of file which contains the data for comparison.
The C-side function sends a trigger to the testbench side, which starts a top-level memory comparison task. This task parses the data file one entry at a time (according to the format of the data) and compares each entry with the actual read data returned by the “memory_read” task. As with memory loading, if the user does not give the format of the data, the task can work it out by parsing the lines of the data comparison file.
For example, a sample C-code containing memory comparisons at various stages looks like this:
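The core of the comparison task can be sketched as a loop over every expected word. The names below are hypothetical, and demo_mem with demo_read stand in for the real memory and its “memory_read” routine. The expected data is passed as an array here, whereas the flow described above parses it from a file.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Signature of a backdoor read routine: physical address in, word out. */
typedef uint64_t (*mem_read_fn)(uint32_t phys_addr);

/* Stand-in memory and read routine used only for illustration. */
static uint64_t demo_mem[16];
static uint64_t demo_read(uint32_t phys_addr)
{
    return demo_mem[phys_addr % 16];
}

/* Compares count words starting at start_addr with the expected data.
 * Every location is checked; the number of mismatches is returned. */
size_t memory_compare(mem_read_fn read_word, uint32_t start_addr,
                      const uint64_t *expected, size_t count)
{
    size_t mismatches = 0;

    for (size_t i = 0; i < count; i++) {
        uint32_t addr = start_addr + (uint32_t)i;
        uint64_t actual = read_word(addr);

        if (actual != expected[i]) {
            fprintf(stderr,
                    "MISMATCH at 0x%" PRIx32 ": expected 0x%" PRIx64
                    ", got 0x%" PRIx64 "\n", addr, expected[i], actual);
            mismatches++;
        }
    }
    return mismatches;
}
```

Reporting every mismatch, rather than stopping at the first one, makes it easier to tell a single corrupted word from a systematic problem such as an address offset or an endianness swap.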
/* Input data fetched from a source and stored at a location in SRAM1 */
memory_comparison(1); /* Compare the data at the location, in the format, etc. given by the 1st memory comparison argument */
/* Fetch this data, process it, and store it in SRAM2 */
memory_comparison(2); /* Compare the data at the location, in the format, etc. given by the 2nd memory comparison argument */
/* Process the data again and place the final data in DDR0 */
memory_comparison(3); /* Compare the data at the location, in the format, etc. given by the 3rd memory comparison argument */
/* End of test */
The method described above has been implemented and has helped uncover many issues in various SoCs during the design phase. The main advantages of using this scheme in a verification environment are:
- It saves a lot of simulation time by loading and comparing the memory data via backdoor.
- It simplifies loading and comparing memory data for the end user, who only has to provide the information on the command line. The user does not have to call separate tasks for different memories, since the command line arguments in turn drive a series of tasks (described in the building blocks and testbench infrastructure sections).
- The scheme is user friendly, since it can automatically detect the memory type from the address passed and the memory width/format by parsing the lines of the data loading/comparison files.
- It also takes care of the ECC or parity feature of the memories by calculating and loading it during memory loading and checking its correctness during memory reading (or comparison).
- It indirectly serves as a method of checking the endianness of the system since the memories are loaded and compared as per the endianness defined for the system.
- For memory comparison, the suggested scheme has added advantages:
  - It saves the memory space used by a test case. In the legacy approach, the core would first write the comparison data into a section of memory and then read it back word by word while comparing it with the final data.
  - It keeps the comparison data out of the test case itself. Embedding it there would increase the code size, which in turn takes up more memory space.
With this scheme we achieve faster data integrity checks without any software overhead. This is very useful from a digital verification perspective, because simply waiting for software (usually running on the core) to load and compare the data takes a lot of simulation time. Such smart methodologies are the need of the hour; as someone rightly said, “The more you value your time, the more value it will bring”.