Flexible cores optimize architecture
By Jim Turley, Vice President of Marketing, Arc Cores Ltd., San Jose, Calif., EE Times
August 14, 2000 (3:05 p.m. EST)
The growth of the Internet and the emergence of new applications such as 3G wireless communication and digital broadcasting are creating an exponential increase in the data-processing requirements of many embedded applications. As a result, traditional microprocessors on their own can no longer meet the performance and power requirements of many new system-on-chip-based devices.
In the past the only practical option for meeting extreme performance requirements without compromising cost or power consumption goals was to design a custom hardware function using an ASIC. However, hard-wired solutions are not always appropriate because of the rapid evolution of standards occurring in today's communications and consumer markets.
This has led to a requirement for a solution that can offer greater performance than traditional microprocessor cores while retaining the flexibility of software programmability.
One technology that has emerged in response to this requirement is configurable microprocessor cores. Before detailing some attributes of configurable cores, it is necessary to distinguish between configurable and synthesizable cores. While configurable cores are also synthesizable, they provide a tool that enables easy customization of a processor and its associated test suites, without having to hand-modify hardware description language (HDL) code.
Much of the attention on configurable cores has been focused on the ability to extend the instruction set with application-specific instructions. While this is an important element in implementing high-performance programmable solutions, an overlooked advantage of configurable processors is the architectural flexibility they provide to designers. This flexibility enables designers to architect an optimal solution rather than simply throw MIPS at a problem.
The microprocessor cores used in today's systems-on-chip (SoCs) were originally designed to be CPUs in desktop computers. As a result, they define a system bus architecture that is essentially the same as that used for board-level designs. The system bus is used to handle instruction and data traffic between the processor and main memory, peripheral I/O and DMA operations.
This approach has several drawbacks. Because multiple devices must compete for access to the bus, congestion on the bus reduces overall system performance since each device has to wait its turn. In addition, the bus timing and protocol specifications require complex glue logic to interface a peripheral device with the bus. This increases the area and power requirements of the SoC and lengthens design and test times.
A further complication for SoC designers is that many SoC applications use multiple processors, such as a microprocessor and a digital signal processor. Since each of these processors is optimized to work with its own bus architecture, the typical solution for interprocessor communication is to add bridges between the different buses. The combination of multiple buses and bridges often further saps system performance, increases the number of gates, and introduces additional bugs.
Configurable processors are, by nature, easily adapted to traditional system bus architectures. However, their configurable bus architectures provide designers with alternative approaches. For example, in the architecture we developed, up to four independent buses can be configured.
Like a typical Harvard architecture, the core has separate instruction-fetch and data (load/store) buses, which can be combined via an arbiter into a von Neumann arrangement, if that's desired.
In Arc Cores' architecture, the third optional bus performs an auxiliary function, which is to provide single-cycle access to the CPU's auxiliary register space. For any peripheral, the input, output and control registers can be added to the auxiliary register map. Since this peripheral interface has no complex protocol or timing considerations, the processor and its peripherals can be integrated without extensive design work or the addition of many gates. Peripheral I/O is handled on a separate bus, so there is no longer contention between peripherals and the processor for access to the memory system or bus. With multiple buses, the processor can fetch an instruction, perform a load or store, and access peripheral data all in the same cycle. In actual designs, the auxiliary bus has often been used to provide the processor with a direct high-speed interface to dedicated coprocessors or application-specific hardware that provide functions like pattern matching or on-the-fly encryption. A fourth optional bus provides direct access to the primary register space.
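As a rough illustration, the register-mapped peripheral interface described above can be modeled in plain C. Everything here is hypothetical: the register numbers, the UART layout and the aux_read/aux_write helpers merely stand in for what, on real hardware, would be dedicated auxiliary-register access instructions.

```c
#include <stdint.h>

/* Hypothetical slice of the auxiliary register map: a UART whose
 * input, output and status registers sit at fixed register numbers
 * (the numbers are illustrative, not real ARC assignments). */
#define AUX_UART_TX     0x100  /* write: byte to transmit          */
#define AUX_UART_RX     0x101  /* read: last received byte         */
#define AUX_UART_STATUS 0x102  /* bit 0: TX ready, bit 1: RX full  */

/* Software model of the auxiliary register file.  On real silicon,
 * each access would be a single-cycle transaction on a dedicated
 * bus, with no arbitration against instruction or data traffic. */
static uint32_t aux_regs[0x200];

static void aux_write(uint32_t regnum, uint32_t value) {
    aux_regs[regnum] = value;
}

static uint32_t aux_read(uint32_t regnum) {
    return aux_regs[regnum];
}

/* Send one byte: spin until the (modeled) TX-ready bit is set,
 * then write the data register. */
void uart_send(uint8_t byte) {
    while ((aux_read(AUX_UART_STATUS) & 0x1u) == 0)
        ;                       /* wait for TX ready */
    aux_write(AUX_UART_TX, byte);
}
```

Because the model is just an array, it shows only the programming interface; the point on real silicon is that each of these accesses completes in one cycle without competing with the memory system.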
This is primarily intended for debugging purposes, but it has also been used with the auxiliary registers for interprocessor communication in multiprocessing configurations.
In many applications it may be necessary to process multiple streams of data in parallel. Rather than multitasking these with a single fast, power-hungry processor, configurable processors provide an option of dedicating a single processor to each stream. An important advantage is that each processor configuration can be stripped down to include only what is necessary to process a particular data stream. This reduces power and area requirements, and enables more processor cores to be used on a single chip.
Integrating multiple cores into a SoC is also simpler with a configurable core. Since configurable cores can support both DSP and general-purpose processor functions, the design team needs to learn only one architecture and one set of development tools, even when the application requires several special-purpose cores.
An important goal of any multiprocessor design is to minimize the communication overhead between the different processors. If this overhead becomes too high, it greatly reduces the impact of adding more processors. Configurable processors are able to improve interprocessor communication efficiency in several ways. The common architecture provides a further advantage, since there is no requirement for the bridges between different buses that may exist when multiple processor architectures are used in the SoC. In addition, each core can be configured to include the buses that provide direct access to the standard and auxiliary registers. This enables each core to see and access the registers of the other cores in the system, providing fast communication between cores and avoiding cluttering the memory bus with additional traffic.
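One common way to use that cross-core register visibility is a simple mailbox: one core writes a word and a valid flag into registers the other core can see. The sketch below is a software model under our own assumptions (the mailbox_t layout and function names are hypothetical, not an ARC API); on hardware, the two fields would be registers reached over the register-access bus rather than memory.

```c
#include <stdint.h>

/* Software model of a two-register mailbox exposed through one
 * core's register space and directly visible to a second core.
 * The layout and names are illustrative, not a real ARC interface. */
typedef struct {
    volatile uint32_t data;   /* message word                    */
    volatile uint32_t full;   /* 1 = data valid, 0 = slot empty  */
} mailbox_t;

/* Sender: returns 0 on success, -1 if the slot is still occupied.
 * On real silicon this write lands on the register-access bus, so
 * it adds no traffic to the shared memory bus. */
int mailbox_send(mailbox_t *mb, uint32_t word) {
    if (mb->full)
        return -1;
    mb->data = word;
    mb->full = 1;
    return 0;
}

/* Receiver: returns 0 and stores the word if one is waiting,
 * then marks the slot empty so the sender can reuse it. */
int mailbox_recv(mailbox_t *mb, uint32_t *out) {
    if (!mb->full)
        return -1;
    *out = mb->data;
    mb->full = 0;
    return 0;
}
```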
Once an application has been partitioned between different processing units, it will often be necessary to find ways to accelerate the operations performed by a specific processing unit. In addition to the traditional approaches of increasing clock speed or adding dedicated hardware, configurable processors enable algorithms to be accelerated with additional instructions. Essentially, they give the designer the best of the ASIC, RISC and CISC approaches: a compact instruction set augmented by a few powerful instructions tailored to the application being executed. This limits power consumption and preserves code density while providing a highly optimized, yet programmable, solution.
Instruction extensions can come from three different sources: the core vendor, third parties, and custom ones added by the design team. Configurable processors typically provide a "base" instruction set that includes little more than load/store and basic arithmetic operations. Additional capabilities, such as DSP support, can be added from extension libraries: specifically, saturating add/subtract, several multiply-accumulate (MAC) options and a barrel shifter.
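To make the saturating arithmetic concrete, here is a reference C model of a 32-bit saturating add (the function name is ours; a DSP extension instruction would compute the same result in a single cycle instead of this multi-instruction sequence):

```c
#include <stdint.h>

/* Reference model of a signed 32-bit saturating add: on overflow
 * the result clamps to INT32_MAX or INT32_MIN instead of wrapping.
 * Clamping is what DSP code wants for sample values, and it is the
 * behavior a saturating-add extension instruction provides directly. */
int32_t sat_add32(int32_t a, int32_t b) {
    int64_t sum = (int64_t)a + (int64_t)b;   /* widen to avoid overflow */
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```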
Custom instruction extensions offer designers a particularly powerful way to accelerate application performance while retaining programmability. Consider DES encryption as an example. Rather than building it as a separate coprocessor, specialist bit-permutation and cipher instructions, plus additional registers to hold keys, can be added to the core to accelerate encryption operations. Custom instructions enable high-performance operation at much lower clock rates and therefore cut power consumption. In one design, an ARC user was able to reduce the clock rate from over 100 MHz to just 12 MHz by adding two custom instructions.
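To see why a bit-permutation instruction pays off for DES, here is the software baseline it would replace. The loop of shifts and masks below is the generic form of the operation; the 8-bit width and the tables in the usage note are toy examples, not DES's real 64/56-bit permutation tables.

```c
#include <stdint.h>

/* Generic bit permutation, the kind of operation DES performs
 * repeatedly (initial/final permutation, P-box, key schedule).
 * table[i] gives the source bit index for output bit i.  In
 * software this costs a shift-and-mask loop per word; a custom
 * bit-permute instruction would do the same work in one cycle. */
uint8_t permute8(uint8_t in, const uint8_t table[8]) {
    uint8_t out = 0;
    for (int i = 0; i < 8; i++)
        out |= (uint8_t)(((in >> table[i]) & 1u) << i);
    return out;
}
```

For instance, with the bit-reversal table {7,6,5,4,3,2,1,0}, permute8 maps 0x01 to 0x80; the identity table {0,1,2,3,4,5,6,7} returns its input unchanged.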
To provide a truly configurable instruction set, it is important that the number of clock cycles for an instruction extension be configurable. For example, if the architecture enforces a strict RISC paradigm where every instruction executes in a single cycle, it may be impossible to add powerful, complex multicycle instructions. Obviously, the processor pipeline must be able to support execution of variable cycle-length instructions. It should also let single-cycle operations proceed in parallel with long-latency ones.
Configurable processors let designers work smart rather than simply work fast. The configurable bus architecture and small processor footprint enable a programmable processor to be tightly integrated with custom hardware, reducing the amount of hard-wired logic required. Further acceleration can be achieved by adding custom instructions that speed up the application, while preserving a totally programmable approach.
Copyright © 2003 CMP Media, LLC