Most complex embedded digital systems use one or more ASICs to secure the high level of functionality that is required. Until recently, designers of such systems were forced to consider a new iteration of ASIC, or possibly even a complete chip redesign, whenever they needed to change the functionality of the system. That approach carries significant penalties in terms of development time and cost, which are untenable in today's highly competitive environment, where time-to-market is critical. Worse still, it results in yet another fixed-function system, with no flexibility for future upgrades.
The introduction of high-performance field-programmable gate arrays (FPGAs) and complex programmable logic devices has done much to improve this situation, enabling designers to reconfigure various system elements through adroit use of programming tools. Most devices in those categories, however, are still essentially standard products that are used as "glue logic" around an ASIC core; as such, they contribute significantly to the total component cost, and cannot really be regarded as part of a true embedded solution.
There are very few FPGAs on the market that are embeddable, i.e., where designers can purchase the intellectual-property rights and then incorporate the cell array into their own silicon. Those that are available tend to be low-complexity devices with very small gate counts, which are aimed specifically at bit-level logic processing rather than signal-processing applications.
Using technology originally developed by Hewlett-Packard's Research Laboratories for reconfigurable image processing in computer printers, our engineers developed a methodology for speeding up the incorporation of reconfigurable technology into embedded systems.
The key element is the use of a reconfigurable arithmetic logic unit array (RAA) of 4-bit ALUs, registers and embedded RAM, which are connected via switch boxes to a routing network. The network provides long and short buses to support complex interconnection schemes. The ALU size is large enough to support a useful instruction set, and to facilitate efficient arithmetic operations for the 8- to 24-bit data lengths that are prevalent in most common multimedia applications.
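To illustrate how 4-bit ALUs might cover the wider data lengths mentioned above, the following Python sketch chains 4-bit slices with a ripple carry. The function names (`alu4_add`, `wide_add`) and the carry-chaining scheme are illustrative assumptions; the article does not describe how the array actually links ALUs for wide arithmetic.

```python
def alu4_add(a, b, carry_in=0):
    """Model a single 4-bit ALU performing an add.

    Hypothetical model: returns (4-bit sum, carry out).
    """
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def wide_add(x, y, width_bits):
    """Add two unsigned values using width_bits / 4 chained ALU slices.

    Assumption: slices are linked by a simple ripple carry, lowest
    nibble first. Returns (result truncated to width_bits, carry out).
    """
    if width_bits <= 0 or width_bits % 4 != 0:
        raise ValueError("width_bits must be a positive multiple of 4")
    result, carry = 0, 0
    for slice_index in range(width_bits // 4):
        shift = 4 * slice_index
        nibble, carry = alu4_add(x >> shift, y >> shift, carry)
        result |= nibble << shift
    return result, carry


if __name__ == "__main__":
    # A 16-bit add occupies four 4-bit slices in this model.
    print(wide_add(0x1234, 0x0FFF, 16))
```

In this model an 8-bit operation occupies two slices and a 24-bit operation occupies six, which is one way to read the claim that a 4-bit granularity suits 8- to 24-bit data.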
The ALUs are positioned on the array in the style of a chessboard. This scheme facilitates extremely flexible interconnectivity, with each ALU having input and output buses on all four sides and able to send data to or receive data from any of its eight surrounding ALUs.
The ALUs can be programmed statically or dynamically, via 4-bit instructions. Networks of statically programmed ALUs can be configured into synchronous signal processing pipelines (similar to an ASIC in design style) that can keep hundreds or even thousands of ALUs busy on each cycle. This approach yields massive instruction-level parallelism: a 16-ALU array clocked at 100 MHz can reach a peak performance of up to 400 million simple 16-bit operations per second.
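The peak figure quoted above can be reproduced with simple arithmetic, assuming that each 16-bit operation occupies four 4-bit ALUs and that every ALU completes one operation per clock cycle. Both assumptions, and the function name `peak_ops_per_second`, are ours for illustration and are not stated in the article.

```python
def peak_ops_per_second(num_alus, clock_hz, op_width_bits, alu_width_bits=4):
    """Estimate peak throughput of an ALU array.

    Assumptions (not from the article): an operation of op_width_bits
    ties up ceil(op_width_bits / alu_width_bits) ALUs, and each group
    of ALUs finishes one operation per clock cycle.
    """
    alus_per_op = -(-op_width_bits // alu_width_bits)  # ceiling division
    ops_per_cycle = num_alus // alus_per_op
    return ops_per_cycle * clock_hz


if __name__ == "__main__":
    # 16 ALUs / 4 ALUs per 16-bit op = 4 ops per cycle at 100 MHz.
    print(peak_ops_per_second(16, 100_000_000, 16))  # 400000000
```

Under the same assumptions, the throughput scales linearly with the number of ALUs, which is why larger arrays are attractive for pipelined signal processing.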
By embedding a reconfigurable signal-processing fabric in a system-on-chip, designers gain considerable advantages. The technology's reconfigurable data paths provide an exceptional level of design flexibility, enabling designers to concentrate on application-level issues rather than base-level hardware, and to postpone commitment to final system implementation until after silicon fabrication.
A good illustration of the usefulness of this approach is in baseband processing in a cellular system, where there is never enough computing performance for operations other than basic signal processing. Consequently, designers are invariably forced to employ accelerators for computationally intensive tasks such as the Viterbi and Rake receiver functions. The processing load for third-generation (3G) applications is even higher, since in addition to the normal voice and high-speed data channels there are also the multimedia functions to worry about. The amount of silicon real estate that needs to be dedicated to accelerators rises accordingly.
The current industry view is that despite the cost and power overheads, most next-generation 3G terminals are likely to need two baseband devices. One would handle data recovery and encoding functions, such as channel equalization, modulation/demodulation, convolutional encoding and Viterbi and turbo coder functions. The other would handle multimedia-related functions demanded by the data flow itself, such as the compression, decompression and display of still images and video.
RAAs potentially offer an elegant way out of that dilemma. Their inherent data path reconfigurability means that the arithmetic-processing structure can be changed to suit the task in hand, enabling totally different sets of algorithms to be used at different times.
Currently, reconfiguring a product in real time is an extremely difficult proposition, especially when it comes to computationally intensive applications such as datacoms and telecoms. The sheer amount of data coming through and being subjected to complex algorithms makes it very difficult to take that system offline, even for a second. Reliability is key, and in order to eliminate costly redundancy issues, a different solution needs to be developed.
One such solution that the RAA approach facilitates is that of online reconfiguration. This is where the array is used in a round-robin fashion, with one algorithm (implemented on the array) following another as the system dictates.
This allows a high degree of silicon reuse, since a single RAA implements multiple algorithms, and it is made possible by a very rapid reconfiguration time, on the order of tens of microseconds. Additionally, because each ALU has an instruction input, its function can be adapted at the sample rate without having to either take anything offline or change the interconnect.
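The round-robin scheme can be sketched as a simple timeline model. Everything specific here is hypothetical: the task names and run times are invented for illustration, the 20-microsecond reconfiguration time is just one example value within the "tens of microseconds" range, and the functions `round_robin_timeline` and `reconfiguration_overhead` are not part of any real tool.

```python
from itertools import cycle, islice


def round_robin_timeline(tasks, reconfig_us, total_slots):
    """Build a timeline of (name, start_us, end_us) entries.

    tasks is a list of (name, run_us) pairs. Assumption: the array is
    reconfigured before every slot, costing reconfig_us each time.
    """
    timeline = []
    now_us = 0.0
    for name, run_us in islice(cycle(tasks), total_slots):
        now_us += reconfig_us          # load the next configuration
        timeline.append((name, now_us, now_us + run_us))
        now_us += run_us               # run the algorithm on the array
    return timeline


def reconfiguration_overhead(tasks, reconfig_us):
    """Fraction of one full round spent reconfiguring the array."""
    busy_us = sum(run_us for _, run_us in tasks)
    switching_us = reconfig_us * len(tasks)
    return switching_us / (busy_us + switching_us)


if __name__ == "__main__":
    # Hypothetical workload: names and durations are illustrative only.
    tasks = [("viterbi", 500.0), ("rake", 300.0), ("video_decode", 1200.0)]
    for entry in round_robin_timeline(tasks, 20.0, 4):
        print(entry)
    print(reconfiguration_overhead(tasks, 20.0))
```

With these made-up numbers the array spends about 3 percent of each round reconfiguring, which suggests why a reconfiguration time in the tens of microseconds makes sharing one array among several algorithms plausible.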