Bohumir Uvacek, Toshiba TAEC San Jose, USA
The SoC content is defined by market needs and changes every six months. The SoC methodology is driven by silicon technology and changes every 16 months. This quick pace of change demands that most of today’s SoCs start from IP. IP has matured to mean digital-IP, analog-IP, bus-IP, DFT-IP, debug-IP, verification-IP, RTOS-IP, applications-IP, legacy chip-IP, ... and is available on IP markets such as D&R.
If interfaces had been standardized, we would live in an efficient world of module-assembled SoCs. Until then, any serious SoC design team needs to develop wrappers, bridges and software APIs to come close to this ideal, so that any IP can quickly be glued to any other IP to form a market-centered SoC. See figure 4.
Based on this vision, a methodology has been developed, and 3 SoC chips with applications and product prototypes were built over the last 3 years. Here is the experience after silicon, see figure 1:
- IP/SoC design flow, IP-based ASIC platforms
- IP/SoC qualification, emulation and prototyping
- ESL for IP/SoC
- Internet/web-based collaborative design, IP/SoC design data management
Finally, we will mention the critical aspects of the design process.
2 Open Soft SoC Platform
The aggressive goal is to take an SoC from block diagram to tape-out in 6 months. Chosen IP blocks need to be integrated within 1-2 months. New IP developments have at most 3 months to be integrated.
During pre-sales of ASIC projects, customers already want to start writing software. They can do so by connecting their software debuggers over the Internet to our FPGA, which is loaded with a chip design similar to the one the customer wants. The real chip design can then start without pressure. See figure 5.
IP comes from providers, including support for adapting the RTL to the design. The contracts take the longest time in this delivery process. Incoming inspection runs the unit-level testbench. For communication peripherals, the biggest part of the verification work is extracting the data traffic generators from the unit testbench and putting them into the system testbench. IP that does not have the correct bus interface gets a wrapper, that is, bus translation logic, see figure 4.
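The wrapper idea can be illustrated with a small behavioral sketch. It is not the RTL used in the projects: `NativeTimer`, `BusWrapper`, the register layout and the base address are all hypothetical, and a real wrapper would be bus translation logic in RTL rather than Python.

```python
class NativeTimer:
    """Hypothetical IP block with its own word-indexed register interface."""

    def __init__(self):
        self.regs = [0] * 4  # four 32-bit registers

    def reg_write(self, index, value):
        self.regs[index] = value & 0xFFFFFFFF  # registers are 32 bits wide

    def reg_read(self, index):
        return self.regs[index]


class BusWrapper:
    """Translates byte-addressed system-bus accesses into the IP's native calls."""

    def __init__(self, ip, base_addr, size_bytes):
        self.ip = ip
        self.base = base_addr
        self.size = size_bytes

    def _index(self, addr):
        # Reject accesses outside the block or not aligned to a 32-bit word.
        offset = addr - self.base
        if not 0 <= offset < self.size or offset % 4:
            raise ValueError(f"bad address 0x{addr:08X}")
        return offset // 4

    def write(self, addr, data):
        self.ip.reg_write(self._index(addr), data)

    def read(self, addr):
        return self.ip.reg_read(self._index(addr))


# The system side only sees byte addresses on the (assumed) system bus.
timer = BusWrapper(NativeTimer(), base_addr=0x4000_0000, size_bytes=16)
timer.write(0x4000_0008, 0x1_2345_6789)
readback = timer.read(0x4000_0008)  # upper bits beyond 32 are dropped by the IP
```

The point of the sketch is that the IP block itself stays untouched; only the thin translation layer knows about the system bus addressing.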
The system bus nowadays is an IP block itself, and modern buses come with a generator that assembles the IP-RTL into a system-RTL around the bus. If a chip contains several bus subsystems, we first assemble the IP blocks around each subsystem bus, then insert a bus bridge, and call the result a subsystem-IP. The top-level system bus then assembles the complete design from the subsystem-IP blocks.
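A minimal sketch of this hierarchical assembly, assuming each block and subsystem is described only by a name, an offset within its parent bus and a size. The `Block` and `Subsystem` types and all addresses are invented for illustration; a real bus generator emits RTL, not an address list.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    offset: int  # offset within the parent bus
    size: int    # size of the address window in bytes


@dataclass
class Subsystem:
    name: str
    offset: int  # where the bridge maps this subsystem in the parent bus
    blocks: list = field(default_factory=list)  # Block or Subsystem entries


def flatten(node, base=0, path=""):
    """Yield (hierarchical name, absolute address, size) for every leaf block."""
    here = f"{path}/{node.name}" if path else node.name
    addr = base + node.offset
    if isinstance(node, Block):
        yield here, addr, node.size
    else:
        for child in node.blocks:
            yield from flatten(child, addr, here)


# Assemble IP blocks around a subsystem bus first, then place the
# subsystem (behind its bridge) on the top-level system bus.
periph = Subsystem("periph", 0x1000_0000, [
    Block("uart", 0x0000, 0x100),
    Block("usb", 0x1000, 0x1000),
])
chip = Subsystem("chip", 0, [Block("sram", 0x0000_0000, 0x8000), periph])
address_map = list(flatten(chip))
```

Because a subsystem is just another entry on its parent bus, the same structure can be dropped into a different chip at a different offset.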
Once the system testbench and system RTL are assembled, the verification software that initializes and tests all IP and the system is assembled from existing libraries or written from the ground up. Some short software runs in RTL simulators, but most of the software runs in FPGA emulation, where the real software debugger is connected as well.
We end up with some 10,000 design files, and by the end of the project simulations will have filled 300-600 gigabytes of disk space.
All verified IP and subsystems can be reused in future chips with very high confidence in correct functionality. The difference from hard IP and hard subsystems is only in clock speed and power; soft systems can be re-synthesized for low or high speed and power. Hard systems have one maximum speed and one fixed power consumption.
3 Expandable SoC Tool-Chain
A growing body of tools for SoC design is available. Still, a few areas are not covered and improvisation is necessary.
Traditional verification concentrated on RTL simulation, with new verification languages and constructs that make quick automatic checking of RTL integrity possible. These turn out to be great tools for the original IP developers, who have all the detailed know-how of their IP designs.
SoC design teams are turning to FPGA emulation instead of accelerated RTL simulation, and to higher-abstraction ESL tools that use C++ models to speed up architecture exploration. IP-RTL assembly will be helped by SPIRIT XML compliant IP-platform tools that handle the many RTL files correctly for each chip design. These tools also assemble the C++ system models, testbenches, the test software and the documentation. See figure 2.
The platform assembly tools do not simulate. RTL co-simulation (like Seamless) or C++ co-simulation is a second step in the tool chain. FPGA chip emulation and product prototyping is a third step. According to market surveys, most completed SoC chips worked, but many did not have enough data bandwidth or computational performance. That can be fixed only during architectural exploration using system simulation at the beginning, or by using flexible/adjustable buses like OCP that do not need much upfront simulation. See figure 3.
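Before any system simulation, a first-order bandwidth budget already shows whether a bus is close to its limit. The sketch below is only such a back-of-envelope check: the bus width, clock, efficiency factor and the per-master demands are assumed example values, not figures from the three projects.

```python
def bus_utilization(bus_width_bits, clock_hz, efficiency, demands_bytes_per_s):
    """Return (usable capacity in bytes/s, fraction of that capacity requested)."""
    capacity = bus_width_bits / 8 * clock_hz * efficiency
    return capacity, sum(demands_bytes_per_s.values()) / capacity


# Assumed sustained demands in bytes per second (illustrative only).
demands = {
    "usb": 60e6,        # 480 Mbit/s line rate
    "ethernet": 12.5e6,  # 100 Mbit/s line rate
    "lcd": 37e6,
    "cpu": 120e6,
}

# Assumed 32-bit bus at 100 MHz with 60 % usable efficiency.
capacity, load = bus_utilization(32, 100e6, 0.6, demands)
```

With these assumed numbers the requested traffic is about 96 % of the usable capacity, which is the kind of result that should trigger a closer look in architectural exploration.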
Since most activity is in software, the connection of debuggers to the design is important. FPGA offers a perfect connection to debuggers over JTAG, but in ESL simulation no solutions are available that would allow debuggers of choice to be connected. The question marks in figure 3 depict the missing tools.
The ESL abstraction level suffers from missing C++ models for the RTL-IP on the market. They can be ordered, at a cost and time penalty. Any update of the IP requires an update of these models, which introduces a need for even more verification. However, C++ simulation is fast and cheap, and data bandwidth is clearly visible at this level.
4 Design Process and Goals
SoC-chip success is defined as the start of volume production. Therefore the focus of an SoC-chip project is, from the beginning, on product prototyping including application software. See figure 5.
The 20 to 40 IP blocks would take a long time to check using just traditional RTL-level simulation. Therefore the main focus in verification has shifted to high-speed execution on FPGA, using software executed on the SoC core processor.
Quite a few changes are made by hand before FPGA synthesis is possible; e.g. memories, analog blocks and clock generation are replaced. So the main goal is to keep the RTL in the FPGA and the RTL in the chip synchronized. Checking system connectivity and functionality with software on the FPGA took 3 months once the FPGA prototype was available.
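One simple bookkeeping aid for keeping the two RTL trees synchronized is to compare file digests and report every difference that is not on a list of intended FPGA replacements. This is a sketch under assumptions: the directory layout, the `*.v` file pattern and the `allowed` list are hypothetical, and a file-level comparison is not an equivalence check.

```python
import hashlib
from pathlib import Path


def digest_tree(root):
    """Map each Verilog file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*.v"))
    }


def unexpected_differences(chip_root, fpga_root, allowed):
    """List files that differ or exist on one side only, minus the allowed ones.

    `allowed` names the files replaced on purpose for the FPGA build,
    e.g. memories, analog block stubs and clock generation.
    """
    chip, fpga = digest_tree(chip_root), digest_tree(fpga_root)
    differing = {f for f in chip.keys() | fpga.keys() if chip.get(f) != fpga.get(f)}
    return sorted(differing - set(allowed))
```

A call such as `unexpected_differences("rtl_chip", "rtl_fpga", ["mem/sram.v"])` (paths invented here) would return an empty list when only the intended replacements differ.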
The verification software consists of an initializing part for each IP in the simple boot code and a connectivity test part for each IP. This set of software is executable in RTL simulation and even on the synthesized gate level in a last checking effort before tapeout. See figure 4.
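The split into an initializing part and a connectivity test part per IP can be sketched as a table of small routines. Everything named here is hypothetical: `FakeBus` stands in for real memory-mapped access, and the register offsets, base addresses and the presence of a scratch register are assumptions for the example.

```python
class FakeBus:
    """Stand-in for memory-mapped register access on the real target."""

    def __init__(self):
        self.mem = {}

    def write32(self, addr, value):
        self.mem[addr] = value & 0xFFFFFFFF

    def read32(self, addr):
        return self.mem.get(addr, 0)


def uart_init(bus, base):
    bus.write32(base + 0x0C, 0x1)  # assumed enable register


def timer_init(bus, base):
    bus.write32(base + 0x00, 0x0)  # assumed control register, timer stopped


def scratch_connectivity(bus, base, scratch_offset=0x1C):
    """Write walking-ones patterns to a scratch register and read them back."""
    for bit in range(32):
        pattern = 1 << bit
        bus.write32(base + scratch_offset, pattern)
        if bus.read32(base + scratch_offset) != pattern:
            return False
    return True


# One entry per IP block: name, base address, init part, connectivity part.
IP_TABLE = [
    ("uart0", 0x4000_0000, uart_init, scratch_connectivity),
    ("timer0", 0x4000_1000, timer_init, scratch_connectivity),
]


def run_boot_checks(bus, table):
    """Initialize every IP, then record whether its connectivity test passed."""
    results = {}
    for name, base, init, check in table:
        init(bus, base)
        results[name] = check(bus, base)
    return results


results = run_boot_checks(FakeBus(), IP_TABLE)
```

Keeping the routines independent of each other is what lets the same pieces be recombined into a different table for the next chip.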
A second set of system verification software will test all critical application modes and the communication through the peripherals. This bigger software is run mainly on FPGA and connects to real world hardware devices, preferably the ones used in the final product to have an interoperability proof. See figure 5.
Finally, the application is run in this product-like FPGA prototype environment as a test of the true chip traffic and computational load before tapeout. The FPGA system is delivered to the customer after encryption of the sensitive RTL, and RTL updates are sent to the customer in encrypted form for loading. Alternatively, the FPGA remains with the RTL design group and the customer's software people connect to the debugger tools over the Internet. Feedback from LED lights and moving parts of the prototype is communicated to the customer over a high-quality web-cam.
5 Verification Statistics From Three Projects
In general the more detail we want the slower the simulation. To identify system errors we use execution of lots of system software code on FPGA. Verification coverage is a balance between time and confidence. The table explains the time increase in simulation of the same software code in different environments.
|Environment |Runtime |Slowdown vs. chip |
|Timed Gates |200000 sec |(2.8M x) |
|Gate Level |17920 sec |(260 000x) |
|Verilog RTL |8960 sec |(13 000x) |
|Seamless |900 sec |(1 300x) |
|FPGA |3.5 – 10.5 sec |(5-15x) |
|Chip |0.7 sec |(1x) |
Table: Simulation Runtime Comparison of Tools
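The slowdown factors can be recomputed from the listed runtimes by dividing by the 0.7 sec chip runtime. The sketch below does only that arithmetic on the values from the table. For Verilog RTL, Seamless and FPGA the result matches the quoted factors after rounding; for the two gate-level rows the computed ratio is about ten times smaller than the quoted factor, and the source does not indicate whether the runtime or the factor is the intended value.

```python
CHIP_SECONDS = 0.7  # runtime of the software on the real chip, from the table

# Runtimes in seconds as listed in the table (FPGA uses its lower bound).
runtimes = {
    "Timed Gates": 200000,
    "Gate Level": 17920,
    "Verilog RTL": 8960,
    "Seamless": 900,
    "FPGA": 3.5,
}

# Slowdown relative to the chip, rounded to the nearest integer.
slowdown = {env: round(sec / CHIP_SECONDS) for env, sec in runtimes.items()}

for env, factor in slowdown.items():
    print(f"{env:12s} {factor:>8d}x")
```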
The first strategic decision is to leave IP-verification to IP-providers and concentrate on system verification. The second decision is to verify most in FPGA, less in RTL and little in gate level simulation. The third decision is to run the final application software before tape-out to concentrate on the operation modes of the IP and system activity that will be really used.
The statistics show that an IP block is 30 times less error-prone than the system assembly: on average we make one change per IP block and 30 changes in the system.
Remaining errors can only emerge from a mismatch of RTL between FPGA and chip, or from misunderstandings of the customer specification in topics skipped during application testing. DFT insertion that disturbs functionality is a possible error source, since complete verification in gate-level simulation is not economical. SoCs based on 3rd-party IP have also been found not well suited for functional manufacturing tests, because they lack a complete reset of all states: gate-level simulations often stop with undefined X-states.
6 Reusable SoC Items
IP reuse must be understood as “reuse with modifications”. Good IP is prepared for the most common changes, such as different bus speed, FIFO sizing, bus interface changes, memory sizing, reduced modes of operation, selection of data width, … and is more an RTL generator than fixed RTL code. Fixed-RTL IP needs the support of the IP provider to make a change and then adapt the testbench and the documentation as necessary.
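What "more an RTL generator than fixed RTL code" means can be shown with a toy generator that emits a parameterized Verilog instantiation. The module name `sync_fifo`, its parameters and its port list are hypothetical and do not refer to any particular provider's IP.

```python
def fifo_instance(name, data_width=32, depth=16):
    """Return Verilog text instantiating a hypothetical `sync_fifo` module."""
    # The address width is derived from the depth, so depth must be 2**n.
    if depth < 2 or depth & (depth - 1):
        raise ValueError("depth must be a power of two >= 2")
    addr_width = depth.bit_length() - 1
    return (
        f"sync_fifo #(\n"
        f"  .DATA_WIDTH({data_width}),\n"
        f"  .ADDR_WIDTH({addr_width})\n"
        f") {name} (\n"
        f"  .clk(clk), .rst_n(rst_n),\n"
        f"  .wr_en({name}_wr_en), .wr_data({name}_wr_data),\n"
        f"  .rd_en({name}_rd_en), .rd_data({name}_rd_data),\n"
        f"  .full({name}_full), .empty({name}_empty)\n"
        f");\n"
    )


# FIFO sizing and data width are chosen per chip without touching the IP source.
rx_fifo_rtl = fifo_instance("rx_fifo", data_width=8, depth=64)
print(rx_fifo_rtl)
```

A generator of this kind also gives a natural place to regenerate the matching testbench parameters and documentation, which fixed-RTL IP leaves to manual adaptation.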
New chips can start with the original IP deliverables or use the adapted/modified IP blocks as a starting point. Even whole chip systems can be reused in a derivative design that deletes a few IP blocks and adds a few new ones. CVS serves as a reusable database of IP and systems. If used correctly, CVS can store UNIX environments, test scripts and original IP alongside the latest design versions. The new SPIRIT XML platforms will be a better database.
The verification software to test individual IP is highly reusable since it is written in a modular manner. For a new chip just a new assembly of the code pieces for different IP-blocks is necessary. See figure 4.
The design-unspecific FPGA system “Swordfish” has been reused 3 times, and the discrete stackable boards that mimic the analog IP blocks, memories and external PHYs (USB, Ethernet) have been reused as well.
The DFT insertion is chip specific and happens mostly on the gate level. That means just the DFT concepts can be reused in a new chip. Some DFT parts are in RTL, called test-collars, and can be reused.
The part that is not reusable is the specific system-bus traffic-test and the application. And of course timing closure and backend work has to be repeated.
HW/SW co-development is essential to speed up complex SoC verification. FPGA emulation is essential for HW/SW prototyping including software based hardware debugging and for the application testing before tapeout.
Efficiency was optimized by demanding that every line of RTL or software was final and did not need to be rewritten during the project for another environment.
The developed SoC methodology worked very well and delivered 3 chips of 3.5-4.5 Megagates with different purposes, such as high data bandwidth, lowest power consumption, or dual cores for highest computational performance.
8 Missing Pieces and Outlook
Hardware-software co-development tools are indispensable in SoC design. At this time the best environment for it is FPGA emulation. ESL promises to deliver on the SW/HW co-development needs. Yet several holes must be closed first.
The main new issue in ESL is the availability of clock-precise yet abstract C++ models of IP for the system and for the testbench. Missing are peripheral traffic generators in C++ for standard protocols like USB, SDIO, Ethernet, PCI, ... And ESL will need to include the missing API interfaces to software debuggers of choice.
FPGA, on the other hand, lacks observation of hardware activity in a form presentable to software people. Full access to all nets, as in RTL accelerators, destroys the speed and, more importantly, is not suitable for software people, who want bus traffic analysis, interface signals and direct register access.
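The kind of view software people ask for can be as simple as a per-master summary of captured bus transactions. The sketch below assumes a trace of `(cycle, master, address, bytes)` records; that record format and the sample data are invented for illustration and do not describe the output of any particular bus analyzer.

```python
from collections import Counter


def bytes_per_master(trace):
    """Sum transferred bytes per bus master from (cycle, master, addr, nbytes) records."""
    totals = Counter()
    for _cycle, master, _addr, nbytes in trace:
        totals[master] += nbytes
    return totals


# Hypothetical capture of four transactions.
trace = [
    (0, "cpu", 0x1000, 4),
    (1, "dma", 0x8000, 16),
    (3, "cpu", 0x1004, 4),
    (4, "dma", 0x8010, 16),
]

totals = bytes_per_master(trace)
share = {m: n / sum(totals.values()) for m, n in totals.items()}
```

A summary like `share` answers the question of who is using the bus without exposing individual nets.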
One solution is to include on-chip system-debug-IP with a bus analyzer and with IP-interface observers, such as MED (Multi-Core Embedded Debug), at least in FPGA emulation and, if economy allows, also on-chip. Without an on-chip MED the application software practically cannot be optimized or tuned, especially in heterogeneous multi-core chips.
One open area is how to assure that the FPGA RTL and chip RTL are equivalent. The RTL for one FPGA can be compared by formal-equivalence tools, but if the design is spread over several FPGAs no automatic tools exist yet. Therefore human diligence is used instead.
A last idea, to also cover the DFT insertion with faster system simulation, is to run the gate-level chip netlist in FPGA. This is good for a final functional verification, yet the chip timing closure of course cannot be verified in this way.
 SoC and End Product System Design Using Hardware Prototyping, IQ Information Quarterly, ARM Journal , Vol4 Num2 2005, Jim Tobias, William Wu, White Eagle Technology
 HPCA-11 MIT Workshop, Feb 2005, Experiences with Multi-Core SoC Designs with FPGA Prototyping, William Wu, Jim Tobias, White Eagle Technology, Will@whiteeagle.us
 ARM Developers Conference, Oct 19, 2004, Santa Clara, presentations:
C.Y. Pei: Co-Development Challenges with Benchmark Results
Ashish Parikh: A Software Abstraction Layer Facilitates HW/SW Co-Development
Steve Williams: An FPGA Emulation of Dual ARM Core Mixed-Signal SoC.
Mark Mazur: From Block Diagram to Custom ARM-Core Based SoC in 6 Months
Bob Uvacek: On-Chip Debug Instrumentation for Custom SoC
 Multi-Core Embedded Debug, IQ Information Quarterly, ARM Journal, Vol3 Num5 2004, Dr. Neal Stollon, MIPS
 Software Abstraction Layers Facilitate HW/SW Co-Development, IQ Information Quarterly, ARM Journal , Vol3 Num5 2004, Steve Williams
 GSPx, Sep 2004, Santa Clara, SoCMosaic Custom Chip™ with OCP, Richard Tobias et al.
 DesignCon Feb 2004, MED Multi-Core Embedded Debug for ASIC Systems, Dr. Neal Stollon, First Silicon Solutions
 Link on TAEC Website, Feb 2003, SoCMosaic Custom™ Chip, http://www.toshiba.com/taec/
Figure 1. Putting It All Together
Figure 2. Organized SoC Library and Data Structure
Figure 3. SoC Tools
Figure 4. SoC Soft IP Platform and SoC Software APIs
Figure 5. Swordfish and Real World Rapid Prototyping