Grant Martin, Tensilica, Santa Clara, California, U.S.A.
Abstract:
The design of embedded systems realised in System-on-Chip (SoC) form is increasingly turning to the use of configurable and extensible processors. Such designs start at a high level of abstraction, with algorithms or application programmes that are expressed in C/C++ or even higher level forms. Configuring and extending the processor(s) is best done at the “Electronic System Level” (ESL). But the current state of ESL tools is inadequate for the multi-processor SoC (MPSoC) design methodology required. This paper briefly surveys ESL and IP-based design, outlines the requirements for supporting design with multiple configurable, extensible processors, and sketches the characteristics of possible solutions.
INTRODUCTION
Increasingly, the design of embedded systems and SoCs is turning to the use of configurable, extensible processors [1,2] both to achieve time-to-market goals and to provide flexibility and programmability for post-fabrication reuse of complex platforms. Extensible processors in particular are proving to be more flexible and application-adaptable substitutes for HW blocks in many designs, while achieving acceptable performance and power.
As a result of this shift to an increasingly processor-centric and multi-processor-centric design style, the need for architecting and designing embedded system platforms at higher levels of abstraction than traditional methods allow grows ever more urgent. Design methods at what is called ESL (electronic system-level) are growing in importance with the increasing adoption of processor-based design approaches.
But although some useful ESL-based tools and methods are beginning to emerge, what is available falls short of the requirements for processor-based SoC design. There are important gaps in capabilities that must be urgently filled. By considering those gaps, we can begin to see some indications as to how this may be possible.
THE STATE OF ESL AND IP-BASED DESIGN
We can divide the general area of ESL into five areas of emphasis:
- Algorithmic design and implementation
- Behavioural synthesis
- SoC construction, simulation and analysis
- Virtual system prototyping
- Function-architecture codesign
Algorithmic design and implementation tools allow the capture of algorithms, both control-oriented and dataflow, simulation, analysis, the modelling of external environments, and potentially code generation into both hardware and software implementation flows. Examples are tools offered by the Mathworks (Matlab and Simulink), CoWare (SPW) and Synopsys (SystemStudio). Some recent EDA offerings (e.g. Accelchip, Catalytic) offer some automation in the flow to implementation. Most of these tools are oriented to dataflow or data-intensive algorithm modelling and implementation. But there are tools such as iLogix Statemate, and the Mathworks StateFlow, that allow finite-state-machines for control logic implementation to be captured, simulated, and used to generate C code for either software implementation or possible hardware implementation via a C-to-gates hardware flow. Some of these tools have also been used to create hardware more directly via VHDL or Verilog generation.
Behavioural synthesis tools are a new generation of offerings that synthesise RTL primarily from C/C++ or SystemC subsets. They are arguably more appropriately attuned to the needs of designers (often algorithmic and software designers, not hardware designers) than the first generation of commercial tools that were designed for hardware designers. Examples include Forte Cynthesizer, Mentor Catapult, Celoxica and Bluespec (using certain SystemVerilog constructs). Some of these tools have been developed internally at large systems and semiconductor companies, NEC’s Cyber being a notable example. In addition to their use for implementation, these tools have been used as part of a whole “C-based” design and implementation flow, where the use of the same input C/C++ model for both implementation via synthesis and simulation at a rate 10-1000x faster than RTL is a significant enhancement in design productivity. It also opens up a path to hardware design for the systems, software and algorithm specialists who have been outside the hardware community.
SoC construction, simulation and analysis environments offer graphical capture of SoC platform structure using traditional buses, with a library of standard embedded processors (e.g. ARM or MIPS), and other components (memories, special hardware blocks, and peripherals). They then allow simulation of the resulting design, usually using SystemC or C/C++-based Instruction Set Simulator (ISS) models and SystemC models of other components, and some system-level analysis of design characteristics such as bus loading, contention, memory accesses, and processor loading. These tools are offered by CoWare, ARM (Axys), Synopsys, Prosilog, and Summit, for example. They are most effective when the system architecture is known, the major choice of IP blocks has been made, and the system requires some micro-architectural tuning, and detailed verification.
Virtual system prototyping tools, from vendors such as VaST, Virtio and Virtutech, offer simulation models of single or multi-processor SoC platforms that execute at speeds of tens of MHz and higher, as opposed to the slower speeds of normal ISSs. These are useful for software developers who want to execute their embedded SW on models which are close to the actual implementation but still reasonable in performance. Systems architects who need to test, modify and characterise the performance of complex applications on an intended target platform, and need to run many test scenarios, can also make use of virtual system prototypes for this purpose as long as the major characteristics of their target platform are fixed.
None of these tools offers designers an early ability to work out their basic system architecture, determine the number and kind of processors for their system, design their on-chip communications architecture (beyond classical hierarchical buses), partition their software applications into multiple tasks, map them to processors, and explore the design space. This kind of “function-architecture” codesign was tried in the late 1990s with projects such as VCC, but is not offered in today’s commercial ESL tools. The SoC “constructor/assembler-simulator” tools are mainly useful when the basic architectures are known and the software has already been mapped to fixed processors.
The early attempts at “function-architecture” codesign such as VCC failed for many reasons. The target architectures of the mid to late 1990s were often single processor plus hardware platforms, in which the processor choice was usually pre-ordained, and the processor itself had a fixed Instruction Set Architecture (ISA) that could not be varied. Thus there was no need to explore a rich space of design alternatives with such designs, and the alternative approaches using hand-crafted models, ISSs and HDL simulators were usually adequate. Abstract modelling styles such as the “virtual processor model” approach of VCC were neither well understood nor necessary for the relatively small software applications of the time, and they did not offer cycle-accurate simulation, which is regarded as a fundamental necessity for any ESL offering, at least as a base. The availability of processor and IP models was an issue at the time, and the lack of any standards for system level modelling was a real barrier to their emergence.
Today we find a different situation in ESL and IP-based design. At a fundamental level, SystemC has definitely emerged as the de facto ESL modelling language for complex IP blocks and SoC platforms and designs, although there is still significant use of other modelling approaches, especially in the virtual system prototyping space. Nevertheless, if one wishes to contemplate the exchange of potentially interoperable IP models, SystemC, together with the theoretical concept of transaction level modelling, is the right basis for it.
A number of different approaches have been tried over the last decade to automate the generation of instruction set simulators for complex processors, whether fixed ISA or configurable and extensible processors, and such models, which can interoperate with a SystemC modelling environment, are now widely available. In addition, the use of C-based behavioural synthesis and C-based modelling styles for hardware makes more compatible IP models for hardware blocks available. EDA companies such as Tenison and CarbonDA are offering tools that allow legacy RTL blocks to be translated into models that interoperate at the C/C++/SystemC level, and with improved speed (around 5-10X) when compared with the RTL alternatives, thus making higher level simulation of mixed processor and hardware platforms more feasible.
Finally, today’s target architectures are much more complex than the simpler processor plus hardware ones of the late 1990s. From the smallest and simplest handheld wireless devices, which often contain at least a baseband SoC incorporating a control RISC, a DSP for voice processing, and considerable hardware, through to the now standard cell phone with additional audio and image processing capabilities, to the most complex consumer and infrastructure devices, current SoCs and systems exploit the technology made possible by sub-100 nanometer IC processes to incorporate many processors, many memories, complex on-chip communications buses and networks, and many hardware blocks, organised into a hierarchy of co-operating subsystems. Software content has risen inexorably and even the simplest devices incorporate hundreds of thousands of lines of code, with millions of lines of code becoming the norm. The design and reuse of such architectures and platforms is becoming increasingly intractable using traditional methods, necessitating the rise of ESL design approaches. And when we move from fixed-ISA processors to configurable and extensible ones, the design space explodes further.
USING CONFIGURABLE AND EXTENSIBLE PROCESSORS
The design of Application-Specific Instruction-set Processors (ASIPs) is increasingly becoming an important part of embedded systems design. Methodologies and tools for configuring and extending ASIPs have begun to emerge from academia [7,8] and the IP industry, and there are examples from the commercial ESL tool industry for processor and co-processor synthesis from suppliers such as CoWare (Lisatek), Synfora, and Critical Blue. Some of these tools can be thought of as a non-classical form of behavioural synthesis, in that if driven by C/C++ code, and resulting in a synthesisable processor description, they provide a kind of “C to gates” transformation capability, albeit with a processor instruction set architecture as an intermediate form. Tensilica’s XPRES tool can also be thought of in this context, although its intermediate form consists of instruction extensions generated in the Tensilica TIE language, which designers can use to further manually optimize a specific configuration of the processor.
If SoC designs were limited to single processors, perhaps with some accelerating hardware, then these tools and methodologies might suffice. However, this is not the case. Already many SoCs utilize at least two processors (a control RISC and DSP) and the next generations are leading towards 6 to 10 processors and beyond. For such multi-processor designs, the available design methods and tools are distinctly lacking in their support.
BUILDING MPSOC SYSTEMS WITH CONFIGURABLE PROCESSORS
There are several key questions involved in designing a multi-processor SoC, especially one involving configurable processors. These include:
- How many processors for an application? (or set of applications)
- How should they be configured/extended?
- Should they be homogeneous or heterogeneous?
- How should they communicate? Standard buses, Network on Chip (NoC), point to point or a combination?
- What is the right concurrency model? Pipelined dataflow, or multi-threading?
- How can designers extract concurrency from applications? How do they partition them to match?
- How do they explore the design space offered by configurable processors, multi-processor SoC, new communications architectures and memory choices?
- How do you scale from 10 to 100 to 1000 processors in sub-90 nm technology?
ESL tools that help answer such questions are not available from the commercial EDA tools industry. Given that the number of SoC architects is relatively small, and the tools may need to be highly IP-specific, especially in the case of configurable processors, this may remain the domain of IP providers.
Such tools may help designers and architects by providing an integrated flow with the following steps:
- Start with applications/algorithmic code
- Decompose into multiple concurrent processing tasks
- Map tasks to multiple optimised processors with an idealised communications network
- Iterate processor definition and mapping
- Analyse communications network requirements
- Design concurrency control and scheduling model
- Design communications network (including memories, buses, queues)
- Analyse results and iterate/experiment with alternative configurations
- Iterate until a balanced MPSoC system definition is reached
- Pass specification on to detailed HW/SW implementation
This can be described as an application-driven, “top-down” design flow, which may be suitable in some application spaces for MPSoC subsystems. In particular, where brand new functionality is required, or previous subsystem designs are inadequate for new standards in an application domain, starting again with a “white-sheet” design space, and driving the architecture of the subsystem from the characteristics of the application code, may produce an optimal solution.
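As a rough illustration of the "map, analyse, iterate" loop in the flow above, the following Python sketch maps tasks with estimated cycle costs onto a growing number of identical processors, stopping when the busiest processor fits a cycle budget. The task names, cycle estimates, budget and the greedy longest-task-first heuristic are all assumptions made for illustration; a real flow would also iterate the processor configurations and the communications design, which this sketch ignores.

```python
from typing import Dict, List, Tuple

def map_tasks(task_cycles: Dict[str, int], n_procs: int) -> List[List[str]]:
    """Greedy longest-task-first mapping onto n_procs identical processors."""
    loads = [0] * n_procs
    mapping: List[List[str]] = [[] for _ in range(n_procs)]
    # Place the most expensive tasks first, each on the least-loaded processor.
    ordered = sorted(task_cycles.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, cycles in ordered:
        target = loads.index(min(loads))
        mapping[target].append(name)
        loads[target] += cycles
    return mapping

def max_load(task_cycles: Dict[str, int], mapping: List[List[str]]) -> int:
    """Cycle count of the busiest processor under a given mapping."""
    return max(sum(task_cycles[t] for t in proc) for proc in mapping)

def explore(task_cycles: Dict[str, int], cycle_budget: int,
            max_procs: int = 16) -> Tuple[int, List[List[str]]]:
    """Add processors until the busiest one fits within cycle_budget."""
    for n in range(1, max_procs + 1):
        mapping = map_tasks(task_cycles, n)
        if max_load(task_cycles, mapping) <= cycle_budget:
            return n, mapping
    raise ValueError("no mapping meets the budget within max_procs")

# Hypothetical per-frame cycle estimates for a decomposed application.
TASKS = {"parse": 40_000, "idct": 120_000, "motion": 150_000,
         "deblock": 90_000, "output": 30_000}

n_procs, mapping = explore(TASKS, cycle_budget=200_000)
```

With these assumed numbers the loop rejects one and two processors and settles on three. Changing the budget or the per-task estimates (for example after an instruction extension speeds up one task) simply reruns the exploration.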
Alternatively, “platform-based” design ESL flows may be appropriate where systems and semiconductor suppliers must provide more generic MPSoC-based platforms, for which the application domain may be known but the specific applications and code are not available to drive the configuration of all the platform processors and other components.
Such platform-based design flows can reuse many of the same design capabilities as are required to support a top-down, application-driven methodology as described above. In particular, the ability to specify the structure and architecture of a platform subsystem, to simulate and analyse it, and to iterate on the number and types of processor(s), and the structure of the associated memory hierarchy and communications subsystems, is useful. What is missing here is being able to drive the platform design with the actual end applications code for the particular product. In place of the end applications code, either similar or related code or code kernels drawn from the general application domain (for example, audio or video encoding or decoding algorithms from a previous generation) may be useful in determining some of the architectural requirements and processor optimisations. Another possibility is to look at more artificial code sequences that can generate traffic patterns and consumption of processing and communications resources, and use such “traffic generation” code to characterise the capabilities of the platform under design.
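One possible shape for such “traffic generation” code is sketched below in Python: a seeded pseudo-random generator that emits read and write transactions for one master, with irregular gaps and variable burst lengths. The record fields, parameter names and the choice of exponentially distributed gaps are assumptions for illustration only, not a description of any particular tool.

```python
import random
from typing import List, NamedTuple

class Transaction(NamedTuple):
    cycle: int       # issue time in cycles
    master: str      # hypothetical name of the issuing processor
    is_write: bool
    address: int     # byte address of the first word
    burst_len: int   # number of words transferred

def generate_traffic(master: str, n: int, mean_gap: float, write_ratio: float,
                     base: int, span_words: int, max_burst: int,
                     seed: int = 0) -> List[Transaction]:
    """Produce n pseudo-random transactions for one master.

    Requires span_words >= max_burst so every burst fits in the region.
    """
    rng = random.Random(seed)   # seeded, so a run can be repeated exactly
    cycle = 0
    trace = []
    for _ in range(n):
        # Irregular, exponentially distributed gaps; always at least 1 cycle.
        cycle += 1 + int(rng.expovariate(1.0 / mean_gap))
        burst = rng.randint(1, max_burst)
        # Choose a start word so the whole burst stays inside the region.
        word = rng.randrange(0, span_words - burst + 1)
        is_write = rng.random() < write_ratio
        trace.append(Transaction(cycle, master, is_write, base + 4 * word, burst))
    return trace

def offered_load(trace: List[Transaction], word_bytes: int = 4) -> float:
    """Average number of bytes requested per cycle over the whole trace."""
    total_bytes = sum(t.burst_len for t in trace) * word_bytes
    return total_bytes / trace[-1].cycle
```

Feeding several such traces, one per master, into a platform model gives a first indication of how much contention the communications architecture can absorb before real application code is available.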
Once a platform has been defined using such a flow, it could be passed on to systems and software applications specialists and be the target for applications mapping, possibly using abstractions for communications between tasks on different processors or any supporting multi-threading models. The combination of the multi-processor platform together with communications abstractions would provide a programming model for applications software development in the derivative design process.
CAPABILITIES OF AN MPSOC ESL SOLUTION
The capabilities required in an MPSoC ESL design approach include features in an integrated development environment (IDE), in modelling of systems, in abstraction, mapping and design space exploration, and, especially for configurable extensible processors, in the processor configuration environment and features.
An integrated development environment (IDE) of some kind is extremely desirable as part of the MPSoC ESL capabilities. Recently, the Eclipse development environment, an open-source IDE that started for Java development but has been extended with a C/C++ development toolkit (CDT), has become an extensible IDE used by a variety of IP and EDA tool developers. In such an IDE, it is possible to extend existing software tool development, targeting, launching and debugging capabilities to support many aspects of MPSoC design at a more abstract level. For example, it is possible to create a processor configuration graphical user interface (GUI) within this environment, and use it to capture language-based descriptions of instruction extensions that can then be targeted to be compiled by a specialised external compiler. Since such extensions compile into HDL code at the RTL level, in this sense the IDE is being used for specifying and implementing large parts of a HW-SW system.
The IDE software project editing capabilities could be used to capture, edit, target and revise application software tasks, middleware and other execution software for the MPSoC target system. Specialised editors could be created within an IDE to allow system definition consisting of configured processors, memories, communications interfaces, queues, buses and bus interfaces, and dedicated hardware processing blocks and peripherals.
Extensions to the IDE capabilities could include the ability to generate system level simulation models, to launch processor ISS models as single-processor simulations, and to launch whole subsystem simulations; to trace either statically or dynamically the transactions that occur at the system level between processors, peripherals, hardware blocks, buses, arbiters, routers, memories and the like; to trace the individual processor execution streams; to post-process such trace data; to display Gantt-style activity charts for the use of system resources; and to compute statistics on system level performance parameters. These could be displayed graphically within the IDE, and facilitate advanced analysis of system considerations such as transaction latency, contention for resources, processor stalling, memory occupancy and activity, processor load balancing, and many other characteristics. Single and multi-processor debugging, using a variety of mechanisms including breakpointing, single and multiple stepping, complex multi-processor event synchronisation and the like, is possible through extension of the IDE debugging capabilities.
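The trace post-processing mentioned above can be quite simple in principle. The Python sketch below assumes a hypothetical trace format of (resource, start cycle, end cycle) records and derives per-resource utilisation plus a text-only Gantt row; a real IDE would of course render this graphically and work from a much richer trace.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

class Event(NamedTuple):
    resource: str   # e.g. a bus, memory or processor (names are hypothetical)
    start: int      # first busy cycle
    end: int        # one past the last busy cycle

def utilisation(events: List[Event], total_cycles: int) -> Dict[str, float]:
    """Fraction of total_cycles for which each resource was busy.

    Assumes events on the same resource do not overlap in time.
    """
    busy: Dict[str, int] = defaultdict(int)
    for e in events:
        busy[e.resource] += e.end - e.start
    return {name: cycles / total_cycles for name, cycles in busy.items()}

def gantt_row(events: List[Event], resource: str, total_cycles: int) -> str:
    """Render one resource's activity as text: '#' is busy, '.' is idle."""
    row = ["."] * total_cycles
    for e in events:
        if e.resource == resource:
            for cycle in range(e.start, min(e.end, total_cycles)):
                row[cycle] = "#"
    return "".join(row)

# A tiny hand-written trace standing in for simulator output.
TRACE = [Event("bus", 0, 4), Event("cpu0", 2, 8), Event("bus", 6, 8)]

for name in ("bus", "cpu0"):
    print(f"{name:5s} {gantt_row(TRACE, name, 10)}")
```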
A second capability linked to the capture and editing of system structure and the generation of system simulation models is the use of a standard format for the temporary storage of system structure and IP parameters (the IP “meta-data”). Recently, XML-based formats such as SPIRIT, used by tools such as Mentor Graphics’ Platform Express and others, have become popular. These may not be elegant, but XML-based formats and schemas can be quickly extended, parsed and generated and are an interesting way to store system structure and parameters.
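To show how little machinery is needed to generate and parse such meta-data, the following Python sketch writes and reads a small system description using the standard library XML module. The element and attribute names (`system`, `component`, `param`, `connection`) are invented for this example and are not the SPIRIT schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

def build_system(name: str,
                 components: List[Tuple[str, str, Dict[str, object]]],
                 connections: List[Tuple[str, str]]) -> ET.Element:
    """Build an XML tree from (instance, kind, params) and (from, to) tuples."""
    root = ET.Element("system", name=name)
    for inst, kind, params in components:
        comp = ET.SubElement(root, "component", name=inst, kind=kind)
        for key, value in params.items():
            ET.SubElement(comp, "param", name=key, value=str(value))
    for src, dst in connections:
        # "from" is a Python keyword, so pass attributes as a dictionary.
        ET.SubElement(root, "connection", attrib={"from": src, "to": dst})
    return root

def load_system(xml_text: str):
    """Parse the XML back into plain dictionaries and tuples."""
    root = ET.fromstring(xml_text)
    comps = {}
    for c in root.findall("component"):
        params = {p.get("name"): p.get("value") for p in c.findall("param")}
        comps[c.get("name")] = (c.get("kind"), params)
    conns = [(c.get("from"), c.get("to")) for c in root.findall("connection")]
    return root.get("name"), comps, conns

# Hypothetical two-component subsystem.
tree = build_system(
    "demo",
    [("dsp0", "processor", {"mac_units": 4}),
     ("mem0", "memory", {"size_kb": 64})],
    [("dsp0", "mem0")],
)
xml_text = ET.tostring(tree, encoding="unicode")
```

Note that parameter values come back as strings; a fuller format would carry type information alongside each parameter.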
With system structural information and the right level of ISS models in standard formats, it is possible to generate simulation models that are suitable for both simulating many operating scenarios in a system, and tracing and analysing the operating conditions. Many ISS models exist that can be wrapped to operate in a SystemC environment, so that they can interoperate with models of buses, memories, hardware blocks, peripherals, and other MP subsystem resources. To do this effectively, the models should be C/C++/SystemC based, offer cycle accuracy as a basic capability (even if they also offer faster functionally accurate simulation modes), and interoperate at a transaction level. Transaction level modelling is an extremely important evolution in the ESL methodology; however, there are currently no standards for interoperable transaction level models. In early 2005, the Open SystemC Initiative (OSCI) created a first level “foundation” for transaction level modelling (TLM), but this is too primitive and limited a standard to be actually useful in ESL modelling. It is limited to the transport layer of very basic memory-mapped data read and write transactions only, and lacks support for the richer real-world transactions used by real processors and on-chip buses, including block or burst read and write transactions, conditional transactions, support for unconventional communications interfaces such as point to point FIFO queues, simulation semantics to allow sequencing of transaction request generation by masters and transaction response satisfaction by slaves, transaction data widths beyond conventional typing, and the kinds of simulation loading and debugging transactions (poking, peeking) essential to allow efficient system simulation.
In the absence of a workable OSCI TLM standard, and given the opaque nature of OSCI’s workings and its lack of a development roadmap, system simulation in the ESL world must continue to rely on the ad-hoc creation of adaptors between every IP provider’s and user’s particular notions of ‘transactions’.
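The kind of ad-hoc adaptor referred to here can be pictured with a small Python sketch. It assumes a slave model that understands only single-word reads and writes, and a master that wants burst transactions; the adaptor splits each burst into word accesses. Debug-style peek and poke accesses are shown on the memory model itself and bypass the access count. All class and method names are hypothetical and do not correspond to the OSCI TLM interfaces.

```python
from typing import Dict, List

class SimpleMemory:
    """Word-only slave model; untouched locations read as zero."""

    def __init__(self) -> None:
        self.words: Dict[int, int] = {}
        self.accesses = 0            # counts simulated (non-debug) accesses

    def read(self, address: int) -> int:
        self.accesses += 1
        return self.words.get(address, 0)

    def write(self, address: int, data: int) -> None:
        self.accesses += 1
        self.words[address] = data

    # Debug transactions: no side effects on the simulated activity count.
    def peek(self, address: int) -> int:
        return self.words.get(address, 0)

    def poke(self, address: int, data: int) -> None:
        self.words[address] = data

class BurstAdaptor:
    """Presents burst transactions on top of a word-only slave."""

    def __init__(self, slave: SimpleMemory, word_bytes: int = 4) -> None:
        self.slave = slave
        self.word_bytes = word_bytes

    def burst_read(self, address: int, length: int) -> List[int]:
        # One word access per beat of the burst, at consecutive addresses.
        return [self.slave.read(address + i * self.word_bytes)
                for i in range(length)]

    def burst_write(self, address: int, data: List[int]) -> None:
        for i, word in enumerate(data):
            self.slave.write(address + i * self.word_bytes, word)

mem = SimpleMemory()
bus = BurstAdaptor(mem)
bus.burst_write(0x100, [1, 2, 3])
```

Every pairing of differing transaction models needs its own adaptor of this kind, which is the cost that a workable interoperability standard would remove.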
Fast functional simulation, sometimes also called the creation of ‘virtual system prototypes’, is a useful complementary capability to cycle-accurate TLM models. The latter allow detailed and timing-accurate (by cycles) analysis of system activity, loading, errors and performance issues; the former allows the creation of a very fast functionally correct simulator for a system, particularly desirable for software development and validation. Where a cycle-accurate TLM system simulation might run at hundreds of thousands or a few million cycles per second, a fast functional simulation may run at many tens of millions of cycles per second and may still be able to provide cycle count estimates that are extremely accurate (certainly within +/- 5% error) for MP systems operating under ‘normal’ conditions.
In order to allow efficient design space exploration (DSE) of various MP architectures for a particular application, it is extremely useful to offer abstract programming models which allow the various software tasks to be mapped to processors, scheduled for execution, and to communicate without constantly modifying the source code. Every need to modify the source code to explore an alternative mapping of task(s) to processor(s) limits the amount of design space exploration that can be done. Unfortunately, there are no standards yet for abstract programming models that allow this to be done efficiently. Pipelined dataflow is one attractive and reasonably simple model that has been studied for years, and interesting communications API models such as Philips TTL have begun to emerge. Simultaneous multi-threading approaches are also attracting interest, especially for homogeneous clusters of processors with hardware support for thread context-switching and scheduling. Use of communications abstractions allows re-mapping and effective DSE.
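A minimal sketch of why a communications abstraction helps is given below in Python. Tasks in a pipelined dataflow see only `put` and `get` on an abstract channel, and the task-to-processor mapping lives in a separate table, so an alternative mapping can be evaluated without touching task code. The `Channel` class, the two example tasks, the per-item cycle costs and the processor names are hypothetical, the pipeline is run sequentially rather than concurrently, and the API is not that of Philips TTL.

```python
from collections import deque
from typing import Dict, List

class Channel:
    """Abstract FIFO: tasks see only put/get, not how it is implemented."""

    def __init__(self) -> None:
        self._items = deque()

    def put(self, item: int) -> None:
        self._items.append(item)

    def get(self) -> int:
        return self._items.popleft()

    def empty(self) -> bool:
        return not self._items

def scale(inp: Channel, out: Channel) -> None:
    while not inp.empty():
        out.put(inp.get() * 2)

def offset(inp: Channel, out: Channel) -> None:
    while not inp.empty():
        out.put(inp.get() + 1)

# (task name, task function, assumed cycles per item)
PIPELINE = [("scale", scale, 30), ("offset", offset, 10)]

def run(pipeline, data: List[int]) -> List[int]:
    """Functional, stage-by-stage execution of the pipeline."""
    chan = Channel()
    for item in data:
        chan.put(item)
    for _name, task, _cycles in pipeline:
        nxt = Channel()
        task(chan, nxt)
        chan = nxt
    result = []
    while not chan.empty():
        result.append(chan.get())
    return result

def processor_load(pipeline, mapping: Dict[str, str], n_items: int) -> Dict[str, int]:
    """Estimated cycles per processor for a given task-to-processor mapping."""
    load: Dict[str, int] = {}
    for name, _task, cycles in pipeline:
        proc = mapping[name]
        load[proc] = load.get(proc, 0) + cycles * n_items
    return load

# Two candidate mappings; the task code above is unchanged in both cases.
one_proc = processor_load(PIPELINE, {"scale": "p0", "offset": "p0"}, 3)
two_procs = processor_load(PIPELINE, {"scale": "p0", "offset": "p1"}, 3)
```

Only the mapping dictionary changes between the two experiments, which is the property that makes broad design space exploration affordable.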
Configurable, extensible processors are the backbone of a high-performance MPSoC subsystem, and the availability of automatic techniques to derive instruction extensions allows the final configuration process to be delayed till almost the end of the DSE process. By automatically generating a configuration and extended ISA early, a lower bound performance envelope can be established for the initial task mappings onto a processor, which can be quickly redone whenever the task mapping changes. Then, manual improvements done at the end of DSE to the final mapped processors can establish extra performance headroom and allow last minute processor load balancing and power optimisation through the strategic reallocation of a few tasks and the slowing down of the processor frequency, preferably accompanied by voltage reduction. Automated software tool chain generation (compiler, ISS, debugger and IDE extensions) allows every modification to the processor to be reflected in the generated system.
This is but a brief sketch of the ESL capabilities that support effective MPSoC subsystem design, but all of them fill gaps in, or complement, the currently available ESL tool portfolios from commercial vendors.
CONCLUSION
The design of complex embedded systems with multiple configurable, extensible processors demands new ESL tool capabilities that go well beyond current offerings. These are more likely to come from IP vendors than from the commercial EDA industry, although it may be possible to build IP-specific flows using generic ESL and IP-specific tools. In this paper we have outlined some of the key problems that such flows should offer design solutions for and sketched some of the characteristics of the required tools.
REFERENCES
[1] Chris Rowen and Steve Leibson. Engineering the Complex SOC. Prentice-Hall PTR, 2004.
[2] Steve Leibson and James Kim. “Configurable processors: a new era in chip design”. IEEE Computer, July 2005, pp. 51-59.
[3] K. Wakabayashi, “C-based synthesis experiences with a behavior synthesizer, ‘Cyber’”, DATE 1999, March 1999, pp. 390-393.
[4] Graham Hellestrand, “The engineering of supersystems”, IEEE Computer, Volume 38, Issue 1, January 2005, pp. 103-105.
[5] S. Krolikoski, F. Schirrmeister, B. Salefski, J. Rowson and G. Martin, “Methodology and Technology for Virtual Component Driven Hardware/Software Co-Design on the System Level”, paper 94.1, ISCAS 99, Orlando, Florida, May 30-June 2, 1999.
[6] Thorsten Groetker, Stan Liao, Grant Martin and Stuart Swan, System Design with SystemC, Kluwer/Springer, 2002.
[7] Matthias Gries and Kurt Keutzer (editors). Building ASIPs: The MESCAL Methodology. Springer, 2005.
[8] Makiko Itoh, Shigeaki Higaki, Yoshinori Takeuchi, Akira Kitajima, Masaharu Imai, Jun Sato, and Akichika Shiomi, “PEAS-III: An ASIP Design Environment”, ICCD 2000, pp. 430-436.
[9] David Goodwin and Darin Petkov. “Automatic Generation of Application Specific Processors”. CASES 2003, San Jose, CA, pp. 137-147.
[10] Henry Chang et al., Surviving the SOC Revolution: a guide to platform-based design, Kluwer/Springer, 1999.
[11] P. van der Wolf, E. de Kock, T. Henriksson, W. Kruijtzer, and G. Essink, “Design and programming of embedded multiprocessors: an interface-centric approach”, CODES+ISSS 2004, pp. 206-217.
[12] Pierre Paulin, C. Pilkington, E. Bensoudane, M. Langevin and D. Lyonnard, “Application of a multi-processor SoC platform to high-speed packet forwarding”, DATE 2004, Volume 3, pp. 58-63.