SoC-based designs challenge traditional design flows
By Ata Khan, Director of Innovation, Microcontrollers Business, Philips Semiconductors, San Jose, Calif., Clive Watts, CPU Product Manager, ARM Ltd., Cambridge, UK, EE Times
August 30, 2002 (11:59 a.m. EST)
When the semiconductor division of Philips undertook the ambitious task of designing a family of microcontrollers that would remove all the impediments to migration from 8- to 32-bit architectures, our primary objective was to eliminate the cost barriers. At the same time, we had to ensure deterministic, real-time performance and adequate bandwidth in a package small enough to be deployed in a variety of small-footprint embedded control applications.
This objective required that we find ways to minimize real estate, memory requirements, power usage, development time, and manufacturing costs. Achieving all of this was extremely challenging, but by bringing together several advanced technologies our designers were able to develop a family of high-performance, 32-bit, RISC-based standard embedded microcontrollers that we expect to deploy toward the end of the year.
Achieving our ultimate goal required three basic design decisions, all equally important. First, we knew that the single biggest cost reduction could be achieved by creating a standard, open architecture. Such an architecture would spare our customers the duplicated design effort and expense of creating a new ASIC for every application, and would allow reuse of software and hardware components instead.
Second, further cost savings could be achieved by using a new process technology (CMOS18) to build a 32-bit RISC microcontroller on an SoC with sufficient on-chip Flash memory to support the more demanding applications. On-chip Flash together with SRAM eliminates the need for external memory, much of the cost of a multi-chip solution, and many of the delays that affect a controller's ability to respond to real-time events in a deterministic manner.
Our third major decision was to use a synthesizable core, the ARM7TDMI-S, to allow optimum flexibility in placing components on the die.
Key to allowing us to achieve our goals was the use of an advanced CMOS technology, the CMOS18 Embedded Flash process, that would allow the integration of the core processor and all of the necessary peripheral control functions onto a single die.
The main benefit was a significant reduction in die size. The core represents a 20 to 30 percent smaller die than what would have been realized in a 0.25-micron design, and offers a potential 25 to 30 percent reduction in manufacturing costs. And of course, any time you reduce geometry you get faster processing speeds, higher bandwidth, and lower power consumption.
This size reduction was possible because the process technology we used allowed memory densities approaching one megabit per square millimeter of silicon and ultra-low power consumption. The combination reduced peripheral circuit complexity so that a complete 16-Mbit Flash memory can be implemented in less than 19 square millimeters of silicon. Because CMOS18 reduces the memory size by half over the 0.25-micron standard, Philips' engineers were able to offer customers Flash sizes of up to 256 Kbytes on-chip, which is a major factor in migrating them to 32-bit microcontrollers.
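The density and area figures quoted above can be cross-checked with a little arithmetic. The sketch below is illustrative only. It assumes binary units (1 Mbit = 2^20 bits, 1 Kbyte = 1024 bytes) and assumes that a 256-Kbyte array would scale at the same average density as the 16-Mbit example, which the figures above do not actually state. The function names are hypothetical.

```python
# Back-of-the-envelope check of the Flash density figures quoted in the text.
# Assumptions (not from the article): binary units, and the 256-Kbyte array
# scaling at the same average density as the 16-Mbit example.

MBIT = 2**20  # bits per Mbit (assumed binary convention)


def density_mbit_per_mm2(capacity_mbit, area_mm2):
    """Average storage density in Mbit per square millimeter."""
    return capacity_mbit / area_mm2


def flash_area_mm2(kbytes, density):
    """Area needed for `kbytes` of Flash at the given average density."""
    capacity_mbit = kbytes * 1024 * 8 / MBIT
    return capacity_mbit / density


# "Less than 19 square millimeters" makes this a lower bound on density...
min_density = density_mbit_per_mm2(16, 19.0)
# ...and therefore an upper bound on the area of a 256-Kbyte array.
max_area_256k = flash_area_mm2(256, min_density)

print(f"density of at least {min_density:.2f} Mbit/mm^2")
print(f"256 Kbytes in at most {max_area_256k:.2f} mm^2")
```

Under these assumptions the density works out to roughly 0.84 Mbit per square millimeter, consistent with "approaching one megabit", and a 256-Kbyte array would occupy a little under 2.4 square millimeters. A small array probably carries proportionally more peripheral-circuit overhead than a 16-Mbit one, so the real figure may be higher.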
The use of a synthesizable core for a high-volume standard product goes against convention, especially in microcontroller applications. Traditionally, design teams use a full-custom solution for this market. However, from our point of view, hard cores have fixed layouts with the process technology, voltages, and other parameters predetermined, limiting design flexibility.
A synthesizable core allowed us to change the shape of the core to fit around other SoC components, such as memory blocks, to make the most efficient use of limited space on the die. The designer can optimize a layout based on the process technology, power/performance ratio, die size, and amount of memory desired, making tradeoffs to achieve the best combination for the market being targeted.
A synthesizable core also allows any design to be extended by adding application-specific IP blocks, either from legacy designs or from third-party vendors. Software, device drivers, and development tools can be reused. We chose the ARM7TDMI-S synthesizable core in particular not only because it gave us the low power and small die area we required, but also because of the advantages we saw in the ARM Thumb 16/32-bit instruction set, ARM's new PrimeCell Vectored Interrupt Controller (VIC), and the industry-standard AMBA on-chip interconnect.
In traditional RISC processor designs, when an interrupt from a device is received, the CPU saves its current working registers and starts an ISR (interrupt service routine) to determine the source of the interrupt. It then transfers execution to a specific ISR designated to process interrupts from that source. Because there can be frequent interrupts from many sources, the cumulative time lost can be significant and can lead to many problems in a real-time system, where interrupts must be serviced within specific time limits in order to preserve real-time operation of the system.
In a vectored interrupt controller, however, there is a list of vectors (code addresses) for ISRs that are associated with each interrupt source. When an interrupt is received, the VIC can pass the exact location of the associated code over the AMBA Advanced High-Performance Bus (AHB) to the processor immediately so that the processor can access the code without delay, thus preserving real-time operation.
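The difference between the two dispatch schemes can be sketched with a toy software model. It is not the PrimeCell VIC's register interface, and all source names, handlers, and class names below are hypothetical. It shows only why a per-source vector lookup avoids the source-identification step of a common ISR.

```python
# Toy model contrasting non-vectored and vectored interrupt dispatch.
# Source names and handlers are hypothetical; real hardware differs.

def uart_isr():
    return "uart serviced"


def timer_isr():
    return "timer serviced"


def adc_isr():
    return "adc serviced"


SOURCES = ["uart", "timer", "adc"]  # order in which the common ISR polls
HANDLERS = {"uart": uart_isr, "timer": timer_isr, "adc": adc_isr}


def polled_dispatch(pending):
    """Non-vectored: a common ISR tests each source in turn."""
    checks = 0
    for source in SOURCES:
        checks += 1
        if source in pending:
            return HANDLERS[source](), checks
    return None, checks


class VectoredController:
    """Vectored: the controller keeps one handler address per source."""

    def __init__(self, vectors):
        self.vectors = dict(vectors)

    def vector_for(self, source):
        # Modeled as a single table lookup, with no polling of other sources.
        return self.vectors[source]


def vectored_dispatch(controller, source):
    """The CPU jumps straight to the handler supplied by the controller."""
    return controller.vector_for(source)(), 1


vic = VectoredController(HANDLERS)
polled_result = polled_dispatch({"adc"})
vectored_result = vectored_dispatch(vic, "adc")
```

In this model the last source in the polling order costs three checks before its handler runs, while the vectored path always costs one lookup. That constant cost is what makes the worst-case latency easier to bound.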
The AMBA interconnect is a key element in our choice of the ARM7TDMI-S core because it enables reuse of IP and allows designers to easily extend and adapt the architecture to generate new derivatives using ARM PrimeCell peripherals and Philips IP blocks as well as a host of third-party IP solutions. We are able to reuse existing 8- and 16-bit peripherals by interfacing them to a low-speed AMBA interconnect. This is key to cost-effective migration from 8- and 16-bit designs, since code developed by earlier users of our previous-generation microcontrollers for these peripherals does not have to be rewritten.
Our team found that by making use of both the VIC and the AMBA interconnect in a single architecture, we were able to integrate components tightly to achieve the determinism required in real-time systems.
In spite of the dramatic reduction in memory that can be achieved with 0.18-micron technology, the fact that 32-bit code requires much more storage space than 8- or 16-bit code still presents a challenge to embedded controller developers and users.
The ARM core's Thumb execution mode allows compression of 32-bit instructions into 16-bit operation codes (op codes). The resulting code density, while not as high as that of a standard 16-bit controller, allows users of our design to fit 30 percent more instructions in the same memory than the uncompressed 32-bit ARM mode.
On execution, these 16-bit instructions are decompressed transparently to full 32-bit ARM instructions in real time without performance loss. The designer can use either or both of the 16-bit Thumb and 32-bit ARM instruction sets for subroutines, trading off performance against code size according to the requirements of the application. The designer has the best of both worlds at 16- or even 8-bit system cost.
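To give a feel for what the density gain means in practice, the sketch below applies the 30 percent figure to a 256-Kbyte on-chip Flash. It rests on an interpretation rather than a stated fact: the 30 percent is read as "30 percent more ARM-equivalent instructions in the same memory". The function names are hypothetical, and real gains depend on the compiler and the code.

```python
# Rough capacity estimate for ARM versus Thumb code in a fixed-size Flash.
# Assumption: the article's 30 percent figure means 30 percent more
# ARM-equivalent instructions fit in the same memory when using Thumb.

ARM_INSTRUCTION_BYTES = 4  # every ARM-mode instruction is 32 bits wide


def arm_instruction_capacity(flash_kbytes):
    """Number of 32-bit ARM instructions that fit in the Flash."""
    return flash_kbytes * 1024 // ARM_INSTRUCTION_BYTES


def thumb_equivalent_capacity(flash_kbytes, gain_percent=30):
    """ARM-equivalent instruction count when the same code is built as Thumb."""
    return arm_instruction_capacity(flash_kbytes) * (100 + gain_percent) // 100


arm_slots = arm_instruction_capacity(256)
thumb_slots = thumb_equivalent_capacity(256)
print(arm_slots, thumb_slots)
```

Halving the op-code width does not double capacity, because a Thumb routine typically needs more instructions than its ARM counterpart to do the same work. The 30 percent figure is the net gain after that overhead.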
The use of the ARM7TDMI-S core in our design allowed us to take advantage of a number of power-saving techniques such as localized clock gating and selective stopping of the clock at a block level. Stopping the clock at block level gates off the clock for an entire functional block.
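Block-level clock gating can be pictured with a small behavioral model. This is a sketch under stated assumptions, not a description of the actual clock tree. The block names are hypothetical, and the count of clock edges delivered to a block stands in very loosely for its dynamic power.

```python
# Behavioral sketch of block-level clock gating.
# Block names are hypothetical; delivered clock edges are used as a crude
# proxy for dynamic power, which is an assumption of this model.

class ClockGatedBlock:
    def __init__(self, name):
        self.name = name
        self.clock_enabled = True
        self.edges_seen = 0

    def tick(self):
        # A gated block never sees the clock edge, so nothing inside toggles.
        if self.clock_enabled:
            self.edges_seen += 1


def run(blocks, cycles):
    """Deliver `cycles` system clock edges to every block."""
    for _ in range(cycles):
        for block in blocks:
            block.tick()


uart = ClockGatedBlock("uart")
timer = ClockGatedBlock("timer")

uart.clock_enabled = False      # stop the clock for the whole UART block
run([uart, timer], cycles=100)

uart.clock_enabled = True       # re-enable when the peripheral is needed
run([uart, timer], cycles=10)

print(uart.edges_seen, timer.edges_seen)
```

In this run the gated block receives no edges during the first 100 cycles, while the ungated block sees all of them.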