Computing dons new suits as required
By Stephan Ohr, EE Times
September 23, 2003 (1:38 p.m. EST)
For advocates of reconfigurable computing, it can seem like a new religion: supercomputer performance from a handheld, battery-powered portable. The world becomes dichotomized between believers and non-believers, and wouldn't it be nicer, at least in terms of third-party development tools, if everyone just believed?
The prophecy goes something like this: Compared with a general-purpose, software-programmable microprocessor, a hardwired, dedicated-function processor will show higher performance, orders of magnitude higher, with much lower power consumption. No clock cycles are wasted loading and interpreting instructions from ROM, and no parallel resources sit idle as they wait for something to do.
But the ultimate application-specific IC, the hardwired device, cannot be used for anything but what it was hardwired to do. Wouldn't it be nice, the Belief System suggests, if the hardwired device could magically transform itself from one hardwired device (one stone) to another, on the fly as required? One moment it is your W-CDMA cellphone, finding you the clearest channel on a busy urban street. Out in the country, it reverts to GSM; entering an office complex, it becomes your IEEE 802.11g media access controller, pulling 54 Mbits/second off your office server. Ideally, it's reconfigurable on the fly. There would be no messy software to load.
Most supporters of this vision, among them QuickSilver Technology (San Jose, Calif.), picoChip (Bath, U.K.) and Morphics Technology Inc. (Campbell, Calif.), are targeting wireless basestations and cellular handsets.
Troubled by the proliferation of multiple radio standards and digital baseband modulation techniques with 3G cellphones, these manufacturers are promoting basestation processors and IPs that can replace an assortment of DSPs and ASICs with integrated devices that can reconfigure themselves in the field. picoChip, for example, hopes to absorb chip rate and symbol rate processing tasks, even some of the call control functions, with a single reconfigurable device.
The software-defined radio is in fact a perfect target for reconfigurable computing, says Mark Cummings, CEO of enVia (Los Gatos, Calif.), who serves as steering committee chair of the Software Defined Radio Forum. DSPs and microprocessors add programmability only to the lower-speed portions of the wireless signal processing chain, Cummings insists. ASICs integrate the high-frequency portions of the chain, but the RF transmitter and antenna portions are neither integrated nor programmable, he says. Consequently, a multistandard radio must use a complicated baseband processor in conjunction with multiple switched RF sections. EnVia, which helped to start Morphics Technology, envisions a future in which all parts of the RF chain are integrated and reconfigurable.
3G signal processing functions, like CDMA2000 system acquisition, rake finger signal rating and set maintenance, are computationally intensive tasks, according to QuickSilver Technology, one of the first to promote an adaptive computing machinery (ACM) model for the way these tasks should be performed. ACM uses transistors more efficiently than conventional ICs, the company claims. The results are higher performance, lower power consumption, smaller die area and lower cost.
QuickSilver's nodal architecture consists of transistor building blocks much smaller than DSP segments like ALUs or multiply-accumulate blocks. They are, in fact, smaller than the logic gates that make up a programmable logic device (PLD) or field-programmable gate array (FPGA). Configured to perform one function, then reconfigured to perform another, these building blocks are packed much tighter than the logic of an FPGA. The tighter packing density results in higher performance and efficiency, the company claims.
A test chip demonstrated last year outdid hardwired ASICs in CDMA system acquisition, a task requiring the examination of 2^15 (32,768) different phase offsets. Making 512 complex correlations on captured data streaming at 8x the chip rate (equivalent to eight parallel correlators running in real time), the ASIC device completed its acquisition in 3.4 seconds. QuickSilver's test chip (a four-node device running at a leisurely 25 MHz) came in at 1 second. Another test chip (a 16-node device running at 100 MHz) did it in 0.06 second.
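To give a feel for why acquisition is so compute-hungry, here is a minimal sketch of an exhaustive phase-offset search. It is not QuickSilver's or any ASIC vendor's method: the 256-chip random sequence, the noise-free real-valued samples and the function names (`make_pn`, `correlate_at`, `acquire`) are all assumptions made for illustration. A real CDMA searcher works on complex samples, in noise, over a far longer code.

```python
import random

def make_pn(length, seed=1):
    """Toy +/-1 pseudo-noise sequence (a stand-in for a real CDMA PN code)."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]

def correlate_at(received, code, offset):
    """Correlate the received samples against the code shifted by `offset`."""
    n = len(code)
    return sum(received[i] * code[(i + offset) % n] for i in range(n))

def acquire(received, code):
    """Test every candidate phase offset and return the one that correlates best."""
    return max(range(len(code)), key=lambda off: correlate_at(received, code, off))

code = make_pn(256)                      # assumed toy code length
true_offset = 100                        # the "unknown" phase the receiver must find
received = [code[(i + true_offset) % len(code)] for i in range(len(code))]
found_offset = acquire(received, code)   # every offset costs a full correlation
```

Each candidate offset costs one full correlation, and the correlations are independent of one another, which is why the work spreads so readily across parallel hardware. Taken at face value, the times quoted above work out to roughly a 3.4x speedup for the four-node chip and about 57x for the 16-node chip relative to the ASIC.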
No one questioned that QuickSilver could obtain such performance. Rather, observers like DSP and FPGA industry analyst Jeff Bier of BDTI (Berkeley, Calif.) were quick to point out that such a fine-grained architecture would be extremely difficult to optimize without an array of well-tested programming tools.
Some manufacturers, like picoChip and Elixent Ltd. (Bristol, U.K.), are attempting to resolve the programmability issue by sacrificing granularity and utilizing a higher-level building block. picoChip's PC101 picoArray is effectively a massively parallel heterogeneous DSP processor array. Each processor is roughly equivalent to ARM9 performance, and there are several hundred processors per chip, says Rodger Sykes, picoChip president and CEO. At 160 MHz, the PC101 has 19 times more processing power than Texas Instruments' C6415 DSP running at 600 MHz, Sykes claims.
PicoChip hopes to resolve the programming issue by providing a library of UMTS functions that can be mapped to picoChip arrays. The long list of functions includes a searcher, rake receiver, rake finger manager, channel estimator, receive frame processing, de-interleavers, Viterbi decoder and NBAP measurement device. picoChip has not commented on the silicon utilization issues that plague custom logic arrays, such as FPGAs and gate arrays.
While Elixent sees a market "sweet spot" in reconfigurable image processing (JPEG and MPEG encoding, for example), it, too, believes that parallel processing is an appropriate model for reconfigurable computing. Kenn Lamb, Elixent CEO and founder, argues that conventional von Neumann processors are sequential processing machines that waste a lot of machine cycles cranking data through a register chain. The arithmetical "heavy lifting" is done in parallel by distributed processors.
Elixent's reconfigurable algorithm processor (RAP) array embodies an even higher level of granularity. The company's D-Fabrix architecture utilizes an array of 4-bit arithmetic logic units (ALUs), register and memory blocks that may be combined to support variable data word widths. Lamb identifies the array as an "extreme VLIW" processor, but without a fixed set of resources, like Texas Instruments' C6000 parallel DSP. Elixent's array (which resembles a chessboard), with 50 percent of the silicon area devoted to ALUs and 50 percent devoted to reconfiguration logic, is nonetheless denser than TI's '6000 device, according to Lamb.
Configuration guide
For any particular application, the data flow path guides the hardware description of the configuration (the HDL). The VLIW word effectively becomes the configuration guide for the array, Lamb says. And configuration can be accomplished with ANSI C-language primitives. But even the best data path architecture (like a Viterbi decoder) has branch-and-control instructions, Lamb points out. "Do you need eight parallel 4-bit ALUs?" he asks rhetorically, "Or a 4-bit ALU with eight loop-back operations . . . What do you fancy doing for the next 100 microseconds?"
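Lamb's question about eight ALUs versus one ALU looped eight times can be made concrete with a small sketch. This is an illustration only, not a model of Elixent's D-Fabrix hardware or tools: the function names `alu4_add` and `wide_add` are hypothetical, and the slices are chained with a simple ripple carry, which is an assumption about how 4-bit units might be combined into wider words.

```python
def alu4_add(a, b, carry_in=0):
    """One 4-bit ALU slice: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4

def wide_add(x, y, slices):
    """Add two (4 * slices)-bit words by chaining 4-bit slices through the carry.

    In hardware this loop could be laid out as `slices` ALUs side by side,
    or as a single ALU reused `slices` times; the arithmetic is identical.
    """
    result, carry = 0, 0
    for i in range(slices):
        nibble, carry = alu4_add(x >> (4 * i), y >> (4 * i), carry)
        result |= nibble << (4 * i)
    return result, carry

# Eight 4-bit slices give a 32-bit add; two slices give an 8-bit add.
total, carry_out = wide_add(0x1234ABCD, 0x0FEDCBA9, slices=8)
small, small_carry = wide_add(0xFF, 0x01, slices=2)
```

The same `wide_add` describes both of Lamb's options. Spreading the slices across eight ALUs spends silicon area to finish sooner, while looping one ALU eight times saves area at the cost of time; which is preferable depends on what else the array needs to do in the same interval.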
The trick is to find and extract the parallelism inherent in C, a good job for a third-party compiler developer. AccelChip Inc. (Schaumburg, Ill.) is partnering with Elixent to provide a configuration map based on MatLab functions. The array is integrated with Toshiba's own MeP configurable processor core and fabricated in 0.13-micron CMOS.
Elixent's research was initially funded in-house at Hewlett-Packard. Initial research focused on FPGAs utilizing a programming mode (in which the configuration file for the logic is loaded) and a run-time mode (in which the processing functions are executed).
A number of high-end echoes of this approach remain: Bob Brodersen, co-director of the Berkeley Wireless Research Center (BWRC) at the University of California, reported that he could obtain near-supercomputer performance with arrays of Xilinx Virtex FPGAs programmed with MatLab primitives. (The caveat was to keep everything in MatLab, Brodersen said.)
Meanwhile, another U.K. company, Nallatech Inc. (Glasgow, Scotland), serves high-performance military image processing applications using reconfigurable board-level computers also powered by Xilinx Virtex-II FPGAs. Nallatech, a spinoff of British Aerospace (BAE Systems), developed the reconfigurable computing platform as a way of avoiding the hardware redesign that seemed to accompany every new military project.
Xilinx isn't the only programmable logic manufacturer to provide hardwired performance. Altera Corp. (San Jose) has long demonstrated the performance of DSP and communications functions hardwired into FPGAs. Through its acquisition of Hammercores, Altera is continually expanding the range of logic functions that it can perform. Power consumption and reconfigurability on the fly remain the problems, however.
"Dynamic reconfigurability issues are a software nightmare," said Bob Garrett, marketing manager for Altera's Nios platform. "We haven't heard anyone asking for that. It'll make engineers cringe and run away." The Stratix FPGA supports 80,000 logic elements for larger processing tasks, Garrett said, and Altera's "HardCopy" design flow enables FPGAs to be rapidly converted to a more compact ASIC device.
"There are two types of reconfigurability," notes David Fritz, vice president of technical marketing at ARC International, "static and dynamic." Examples of a statically reconfigurable device include FPGAs, he explained. Dynamically reconfigurable devices include logic nodes with SRAM configuration hints.
The level of granularity (a state machine, a bit slice processor or a full processor) is a primary concern. The finer the level of granularity, the better the performance, but the more difficult the machine becomes to program. "No one has found the right level," Fritz says. He admires the QuickSilver Technology model in which "waves of data" become transformed to "waves of logic."
"It's an interesting idea," he says, "but how do you program a beast like that?"
Hot spots
His argument leans toward the bit slice processor, which does not need to be as sophisticated as a QuickSilver fine-grained machine. He suggests finding the "hot spot" for any particular application and migrating that processing technique (in Verilog or some other HDL) to a microprocessor routine, an FPGA core or a hardwired device.
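As a rough illustration of hot-spot hunting, the sketch below profiles a toy application with Python's standard `cProfile` and `pstats` modules and reports the function that consumes the most time. Everything in the toy workload (the FIR-style `multiply_accumulate` kernel, the `update_state` bookkeeping, the made-up taps and samples) is an assumption chosen for illustration; it is not ARC's methodology, and a real design flow would profile the actual embedded code.

```python
import cProfile
import pstats

def multiply_accumulate(window, taps):
    """Inner loop of a FIR filter: the kind of kernel worth moving to hardware."""
    acc = 0
    for i in range(len(taps)):
        acc += window[i] * taps[i]
    return acc

def update_state(state, value):
    """Cheap control-style bookkeeping that can stay in software."""
    return state + 1 if value > 0 else state

def application():
    taps = [1, -2, 3, -4] * 32          # 128 made-up filter taps
    samples = list(range(-300, 300))    # made-up input data
    state = 0
    for start in range(len(samples) - len(taps)):
        value = multiply_accumulate(samples[start:start + len(taps)], taps)
        state = update_state(state, value)
    return state

profiler = cProfile.Profile()
profiler.enable()
application()
profiler.disable()

# pstats keys are (file, line, function name); index 2 of each value is the
# time spent inside the function itself, excluding the functions it calls.
timings = pstats.Stats(profiler).stats
hot_spot = max(timings, key=lambda key: timings[key][2])[2]
print("hot spot:", hot_spot)
```

Once the dominant routine is known, it becomes the candidate for migration to an optimized routine, an FPGA core or hardwired logic, while the control code around it stays in software.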
"ARC is not committed to making a reconfigurable device right now," Fritz says, "but we've done research." Current development efforts concentrate on SRAM-supported embedded processing nodes that serve digital consumer applications, such as set-top boxes and DSL gateways, MPEG video and MP3 audio decoding.
Like Altera's Garrett, Fritz wonders whether reconfigurability itself may have limited utility. While a certain amount of exception handling is desirable, you always know what your application domain will require. Wouldn't it make sense to use a microprocessor to take care of the increasingly shorter list of uncertainties?
For that reason, Leopard Logic Inc. (Cupertino, Calif.) positions itself as the middle ground between FPGAs and custom ASICs. FPGAs offer design flexibility and rapid turnaround, but they tend to be bulky, power-hungry and expensive, said Stefan Tamme, marketing vice president for the three-year-old startup. On the other hand, ASICs, even structured ASICs, are encumbered by design complexity, long lead times and high NRE [non-recurring engineering] costs. ASICs are often the best way to implement a new communications interface, but not while the standard is still being tested, Tamme pointed out.
Leopard Logic's solution (to be elaborated upon in early 2004) is to embed reconfigurable logic in an application-specific standard product (ASSP). The reconfigurable array would have a finer granularity than conventional FPGAs, and, thus, save silicon area, power and costs. It would be supported, though, by a similar cohort of programming and RTL development tools. While its performance would approximate that of a full ASIC device (certainly faster), its costs would make it most suitable for limited product runs (between, say, 5,000 and 500,000 pieces), the company said.
A general-purpose microprocessor, when all is said and done, is a complicated piece of machinery. It is analogous to a switch master in a giant railroad yard, one with dozens of tracks and switches and many thousands of crossover combinations.
Each processor instruction tells the processor where to find its data and what it should do with that data when it finds it. In the railroad yard analogy, the instruction opens an array of switches, and closes off others.
In silicon terms, a complex instruction set computing machine (called a CISC processor), like Intel's Pentium, could waste a lot of cycles and real estate on instructions (parts of the switchyard) that are seldom utilized. Advocates of reduced instruction set computing (RISC) suggest, "Let's not overdo the complexity; why not use just what you need?"
But it is entirely possible to tailor the instruction set to even smaller dimensions than those of the ARM processor. This idea approximates the position of the ASIC group at Toshiba America Electronics Corp. (Milpitas, Calif.). It is possible, suggests ASSP business manager Farhad Mafie, to profile the software activity for the DSP operation that you're likely to conduct on a cellular phone or handheld portable.
Once you've decided that, says Mafie, you can build the hardware exactly to spec. What reconfigurability do you need, asks Mafie, if the ability to sort among operations, to pick and choose among available hardware resources, is built right into the controller?
See related chart