Programmable logic/customizable CPU cores adapt hardware to apps
By Chris Robinson, Embedded Applications Engineer, Altera Corp., San Jose, Calif., EE Times
August 30, 2002 (11:57 a.m. EST)
Embedded microcontrollers are the original SoC devices. In the last few years, FPGA and PLD manufacturers have promoted the term "System-on-a-Programmable-Chip" (SOPC), which captures the idea of using programmable logic as a medium in which designers can create custom hardware, including microcontrollers.
The elements required to create viable custom microcontrollers include full-featured 32-bit RISC microprocessors, readily available peripherals, options for building custom peripherals, intellectual property, and comprehensive development tools. These elements are now available and, combined with low-cost programmable logic devices (PLDs), let designers build attractive alternatives to off-the-shelf microcontrollers.
This approach is especially useful in compute-intensive applications where microprocessors and microcontrollers have moved into the realm of the DSP. In some applications, however, only a subset of DSP-like functionality is required. Examples include small-footprint iappliances and some embedded systems, such as a home-networking device that serves as the main connection to the external network. In the past, the designer usually had to choose among three options: coding in assembly language and using a processor with a higher clock frequency, using a DSP as the main processor, or adding a co-processor for the compute-intensive portion of the problem.
With microprocessor/microcontroller systems in programmable logic, other options are available. The designer is now free to take specific software blocks and code these as hardware that can be called as single "custom instructions" from the microprocessor.
From the software engineer's perspective, this is simply a function call in C or assembly language, but instead of a sequential list of instructions being executed, a block of hardware executes the relevant algorithm. Similarly, the architecture of the system can be changed to match the problem at hand, for example by adding multiple arbitrated slave peripherals on a multi-master bus, or by including custom peripherals or DMA as necessary.
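The function-call view of a custom instruction can be sketched in C as follows. This is only an illustration under stated assumptions: bit reversal is chosen here as a convenient example operation (it is not one discussed in the article), and `HAVE_CUSTOM_BITREV` and `custom_bitrev()` are hypothetical names standing in for whatever wrapper a given toolchain provides, not a real Nios API. The calling code uses `bit_reverse32()` either way; only the build configuration decides whether a software loop or a hardware block does the work.

```c
#include <assert.h>
#include <stdint.h>

/* Software reference: one loop iteration per bit (32 iterations). */
static uint32_t bit_reverse32_sw(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);   /* shift result left, append lowest bit of x */
        x >>= 1;
    }
    return r;
}

#ifdef HAVE_CUSTOM_BITREV
/* HYPOTHETICAL: a wrapper around a custom instruction. The name and
 * signature are assumptions made for this sketch. */
uint32_t custom_bitrev(uint32_t x);
#define bit_reverse32(x) custom_bitrev(x)
#else
#define bit_reverse32(x) bit_reverse32_sw(x)
#endif

/* Confirms that the selected implementation matches the software reference. */
static void bit_reverse32_selftest(void)
{
    static const uint32_t samples[] = { 0u, 1u, 0xF0000000u, 0x12345678u };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(bit_reverse32(samples[i]) == bit_reverse32_sw(samples[i]));
}
```

Keeping the software version around as a reference makes it easy to check a hardware block against known-good results after each design iteration.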
Many embedded and small-footprint iappliance applications require only limited use of floating-point instructions. But when floating-point calculations are done in software, they are notoriously slow. Few microcontrollers provide floating-point support, which forces the designer to use either a software library or an external floating-point unit. In many cases, only a subset of the floating-point instructions is necessary for the problem to be solved. Custom instructions allow the designer to choose which operations to implement in hardware and which to leave as software library routines.
To give a concrete example, a subset of floating-point operations was created that included the instructions multiply, multiply with negate, absolute value, and negate. This subset floating-point unit was built using 1019 logic elements in an Altera FPGA with a 16/32-bit embedded Nios microprocessor core. Each of the four instructions was part of a single unit and was called with a separate prefix. The speedup over the software-only version of each of the four floating-point instructions ranged from 15:1 to 150:1: the software-only implementations took from 284 to 2874 clock cycles, while the hardware implementation took fewer than 20.
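To make the four operations concrete, here is a minimal C sketch of what each one computes. It assumes IEEE-754 single-precision values, which the article does not state, and all function names are illustrative. Negate and absolute value only touch the sign bit, while multiply is the operation a soft-float library spends most of its cycles on when there is no hardware support.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern (assumes 32-bit IEEE-754 float). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof f == sizeof u);
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret a 32-bit pattern as a float. */
static float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Negate: flip the sign bit. */
static float fp_negate(float x)
{
    return bits_float(float_bits(x) ^ 0x80000000u);
}

/* Absolute value: clear the sign bit. */
static float fp_abs(float x)
{
    return bits_float(float_bits(x) & 0x7FFFFFFFu);
}

/* Multiply: left to the compiler here; without an FPU this becomes a
 * library call, the kind of routine a custom instruction would replace. */
static float fp_mul(float a, float b)
{
    return a * b;
}

/* Multiply with negate: -(a * b). */
static float fp_mul_neg(float a, float b)
{
    return fp_negate(fp_mul(a, b));
}
```

In a hardware implementation, the sign-bit operations cost almost no logic, so most of the unit's logic elements would plausibly go to the multiplier.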
Custom instructions improve system performance by speeding up computation. Programmable logic-based designs also allow architectural changes to an embedded microcontroller design that improve overall system performance. For example, consider an embedded Web server on a microcontroller that, on request from an external browser, serves up several pages, including images that are 60 Kbytes in size.
There are two ways to measure "throughput" in such an application. The first, "HTTP Server Throughput," is a measure of the end-to-end performance of the system, from the client's request to the final transmission and acknowledgment. The second, "Transmission Throughput," is simply a measure of the time to transmit the HTTP packets over the 10 Mbit/second Ethernet channel.
Optimizations to the system hardware using a programmable logic-based customizable processor include adding a checksum peripheral, which takes the burden of calculating the IP-packet checksum off the microprocessor and performs the calculation in parallel in hardware. Another system optimization is to add a DMA controller that copies packets between the MAC/PHY chip and memory without going through the microprocessor, as is done in the baseline system.
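For reference, the software work that a checksum peripheral takes over looks roughly like the following. This is a sketch of the standard Internet checksum (the 16-bit one's-complement sum used by IP, TCP, and UDP), not the article's actual code; the function name is illustrative, and the article does not say which parts of each packet the peripheral covers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over a packet-sized buffer (up to 64 Kbytes), treating
 * the data as big-endian 16-bit words. */
static uint16_t ip_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL || len == 0);

    /* Add up the buffer one 16-bit word at a time. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* An odd trailing byte is padded with a zero low byte. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    /* The checksum is the one's complement of the folded sum. */
    return (uint16_t)~sum;
}
```

Every byte of every packet passes through this loop, which is why moving it into a peripheral that runs in parallel with the processor frees up a meaningful share of CPU time.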
Performance improvements can be achieved by moving software blocks into hardware or changing the architecture of the microcontroller system. All of our examples were done on the same development board running at 33MHz. Standard tricks of increasing the raw clock-frequency, or moving to faster memory, or adding cache to a system are still available to the system designer. The use of programmable logic microcontrollers extends the set of tools that a designer has available to improve system performance.
In the Web server application, moving the checksum and DMA into hardware improved both HTTP server throughput and transmission throughput almost threefold. HTTP server throughput rose from 2.12 Mbit/sec for the baseline software-only approach to 6.07 Mbit/sec. Transmission throughput rose from 3.14 Mbit/sec for the baseline software-only approach to 9.42 Mbit/sec.
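The arithmetic behind these figures can be checked with a few lines of C. The helper names are illustrative, and the transfer-time estimate is a rough one: it assumes the measured throughput applies directly to the payload and that 60 Kbytes means 61,440 bytes, neither of which the article specifies.

```c
#include <assert.h>

/* Ratio of improved to baseline throughput (both in the same units). */
static double speedup(double baseline, double improved)
{
    assert(baseline > 0.0);
    return improved / baseline;
}

/* Seconds needed to move `bytes` bytes at `mbit_per_sec`,
 * taking 1 Mbit as 10^6 bits. */
static double transfer_seconds(double bytes, double mbit_per_sec)
{
    assert(mbit_per_sec > 0.0);
    return (bytes * 8.0) / (mbit_per_sec * 1e6);
}
```

With the figures above, `speedup(2.12, 6.07)` is about 2.86 and `speedup(3.14, 9.42)` is about 3.0. Under the stated assumptions, one 60-Kbyte image takes roughly 0.23 seconds at 2.12 Mbit/sec and roughly 0.08 seconds at 6.07 Mbit/sec.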
As with all tools, it is the skilled application of these tools that determines the speed with which performance targets can be met. The beauty of using programmable logic for microcontrollers is that experimenting with new designs can be done iteratively in a matter of a few minutes, further blurring the lines between software and hardware development.