TeraGen architecture primes single engine for multiple instruction sets
By David Lammers, EE Times
January 25, 1999 (2:08 p.m. EST)
SUNNYVALE, Calif. - Startup TeraGen Corp. today will disclose plans to upend the traditional microcontroller world with a novel architecture that can execute multiple instruction sets concurrently on a single processing engine.
The approach is a radical break with the past, and its success will depend in part on TeraGen's ability to manage the internal scheduling challenge of translating instructions from multiple instruction-set architectures (ISAs) on the fly and scheduling the translated operations for execution on a single engine.
In a technique reminiscent of that used in the Advanced Micro Devices K6 to break X86 instructions into sequences of RISC-like operations, the TeraGen architecture translates individual instructions from various instruction streams into sequences of VLIW-like operations called primitive operations (POPs). Those instructions, and the ability of the processor engine to schedule multiple instructions nearly simultaneously, are part of what TeraGen cofounder Don Sollers calls "the secret sauce" of the TeraGen approach.
The key advantage of the architecture will be its ability to execute the code from several different processing cores (a RISC CPU, ancillary DSP core and peripheral microcontroller, for example) on one engine. The TeraGen engine can be adapted to an additional instruction set simply by adding a small block of fast ROM to govern the translation of the new instructions into POPs. Thus, a design team could condense several processing cores into a TeraGen core without altering existing code.
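The ROM-per-instruction-set idea can be pictured as a table lookup. The Python sketch below is a toy model only: TeraGen has not disclosed its POP format, so the opcodes (`INC`, `MOV`, `MAC`), the POP names and the `Translator` class are all invented for illustration.

```python
# Toy model: every opcode, POP name and class here is hypothetical and is
# NOT taken from TeraGen's undisclosed design.

# Each "ROM" maps a guest opcode to a template of primitive operations (POPs).
# "{a}" and "{b}" are placeholders for the guest instruction's operands.
MCU_ROM = {
    "INC": [("load", "t0", "{a}"), ("add_imm", "t0", 1), ("store", "{a}", "t0")],
    "MOV": [("load", "t0", "{b}"), ("store", "{a}", "t0")],
}
DSP_ROM = {
    "MAC": [("load", "t0", "{a}"), ("load", "t1", "{b}"),
            ("mul", "t0", "t1"), ("acc", "t0")],
}


class Translator:
    """Holds one translation table per supported instruction set."""

    def __init__(self):
        self.roms = {}

    def add_isa(self, name, rom):
        # Supporting a new instruction set means adding one more table.
        self.roms[name] = rom

    def translate(self, isa, opcode, **operands):
        # Look up the template and substitute the operands into each POP.
        template = self.roms[isa][opcode]
        return [
            tuple(f.format(**operands) if isinstance(f, str) else f for f in pop)
            for pop in template
        ]


translator = Translator()
translator.add_isa("mcu", MCU_ROM)
translator.add_isa("dsp", DSP_ROM)

inc_pops = translator.translate("mcu", "INC", a="r3")
mac_pops = translator.translate("dsp", "MAC", a="x0", b="y0")
```

In this model, existing guest code is untouched: only the table behind `add_isa` changes when a new instruction set is added.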
Further, a ROM could be set up to translate not an ISA but a set of hardware functions into POPs. The TeraGen engine could conceivably be configured to emulate peripheral functions as well as other processors.
In the TeraGen-based system, instruction streams for each of the emulated processors, plus commands and data for emulated peripherals, would all flow into the TeraGen scheduler, where each stream would be translated into POPs. The resulting streams of VLIW-like operations would be scheduled for execution on the TeraGen execution unit.
A key feature of the unit is a large, fast data cache used for register emulation. By allocating cache locations to represent each of the registers for each of the instruction sets it is emulating, the TeraGen engine apparently can blend POPs from different streams of instructions into a single flow. It thus can theoretically find opportunities for parallelism that would escape a conventional superscalar or VLIW architecture.
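A rough way to see why per-stream register emulation lets operations from different streams share one flow is the Python sketch below. It is an illustration under stated assumptions, not TeraGen's mechanism: the dictionary standing in for the cache, the `(stream, op, dst, src)` POP tuples and the round-robin `interleave` function are all hypothetical.

```python
from itertools import zip_longest

# Toy model: the POP format, op names and scheduling policy are invented
# for illustration and do not describe TeraGen's actual hardware.


class VirtualRegisterFile:
    """A dict standing in for the fast cache that emulates registers."""

    def __init__(self):
        self.cache = {}

    def read(self, stream, reg):
        return self.cache.get((stream, reg), 0)

    def write(self, stream, reg, value):
        # Keying on (stream, reg) keeps the "r0" of one instruction set
        # separate from the "r0" of another.
        self.cache[(stream, reg)] = value


def execute(pop, regs):
    stream, op, dst, src = pop
    if op == "set":
        regs.write(stream, dst, src)
    elif op == "add":
        regs.write(stream, dst, regs.read(stream, dst) + regs.read(stream, src))
    elif op == "mul":
        regs.write(stream, dst, regs.read(stream, dst) * regs.read(stream, src))
    else:
        raise ValueError(f"unknown POP: {op}")


def interleave(*streams):
    """Round-robin merge that preserves the order within each stream."""
    for group in zip_longest(*streams, fillvalue=None):
        for pop in group:
            if pop is not None:
                yield pop


mcu = [("mcu", "set", "r0", 2), ("mcu", "set", "r1", 3), ("mcu", "add", "r0", "r1")]
dsp = [("dsp", "set", "r0", 4), ("dsp", "set", "r1", 5), ("dsp", "mul", "r0", "r1")]

regs = VirtualRegisterFile()
for pop in interleave(mcu, dsp):
    execute(pop, regs)

mcu_r0 = regs.read("mcu", "r0")  # 2 + 3
dsp_r0 = regs.read("dsp", "r0")  # 4 * 5
```

Both streams use a register named `r0`, yet the interleaved run gives each stream the same result it would get running alone, because the two names map to different cache locations.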
Sollers said the approach uses "a virtual register file to emulate a register by using a very high-speed cache. This is part of our secret sauce: The cache can be accessed as a register."
In embedded, Sollers said, "the DSP, the MCU and peripheral circuits all need to work in parallel. We are sucking all of that functionality onto a single chip, moving a lot of the hardware-based functions into software. We have the capability to manage and schedule operations within an RTOS. This approach would allow the control logic to run at the same speed as the data path, which is what real-world multiprocessing is all about."
Funneling the instruction streams for several devices into one engine could eliminate the need for separate processors and peripherals and thus reduce die and packaging costs. And performance could be improved, not only by exploiting opportunities for parallelism among the various instruction streams but by scaling the single TeraGen engine more quickly to a faster process.
A major opportunity lies in putting a DSP instruction set in one ROM and a complementary MCU instruction set in another, and letting the processor handle both instruction streams. For example, a DSP codec that does much of the work in a cellular phone could have its instruction translation implemented in one ROM, alongside another ROM storing the instruction translation needed to take care of the work normally done by a separate MCU: protocol engine, keypad and display-control tasks.
Chief executive officer George Alexy, recruited from Cirrus Logic Inc. in mid-1998 to head up TeraGen, said two semiconductor companies have taken licenses, initially for 8-bit applications. Those two partners will reach silicon within the year.
Analyst Will Strauss, principal at Forward Concepts (Tempe, Ariz.), said he believes TeraGen is on to something big. Legal questions aside, Strauss said a TeraGen processor could run an instruction set intended for a DSP developed at Texas Instruments, Lucent or Motorola. All of those designs use a Harvard architecture; TeraGen employs a unique register-file approach.
"I do believe this is a breakthrough," Strauss said, adding that he has discussed the approach with engineers from MultiFlow, which worked on thread architectures for several years.
The ability to reuse code while combining a DSP and an MCU may be unique to TeraGen, Sollers said. The StarCore approach now being developed by Motorola and Lucent is working toward combining a DSP and an MCU on the same die, but Sollers claimed that the StarCore effort "will almost be forced to adopt a new ISA. In our approach, we allow people to use a familiar ISA. From a top-level perspective, what we are doing is allowing people to configure a system-on-chip through software. That is where the flexibility of this approach comes from."
Complex to primitive
Moreover, putting multiple processor cores on one die "is not a particularly cost-effective approach," Sollers said. Rather than dedicate ever-faster transistors to an established instruction set, TeraGen "breaks very complex tasks into primitives very quickly, to achieve an advantage that way. The POPs are long instructions, a native instruction set that is dramatically different from what previous architectures have attempted. How we hierarchically establish our instructions is our inherent advantage."
Initially, TeraGen's staff of about 20 engineers will work with the two licensed partners to create application-specific solutions. Though Alexy declined to detail the initial targets, he described how several chips in a set-top box might hypothetically be rolled into a single TeraGen implementation.
TeraGen expects "the first wave of interest to come from the flexibility possible in doing peripherals," Alexy said. "The major MCU vendors might have a few basic processor cores surrounded by 40 to 70 peripherals. To get the right variation on the peripherals takes time, slowing the response to the customer."
With the TeraGen approach, a customized solution to those peripheral needs could be stored in one or more ROM chips.
Analyst Strauss said TeraGen may quickly run into intellectual-property issues. Key to its appeal is the claim that software engineers will be able to port existing code quickly to the ISAs implemented in ROM, saving the time and expense of rewriting code. Existing code could be made to run faster by speeding the processor engine.
Alexy said TeraGen will approach the IP thicket "on a case-by-case basis."
TeraGen's two initial licensees will use the approach for specific products aimed at specific applications, Alexy said. Next, TeraGen will apply its concepts to higher-performance problems, requiring 16- and 32-bit processor engines, he said.
TeraGen has attracted $9 million in investment capital from Sequoia Capital Partners and InterWest Partners.
Strauss said cofounder Sollers was a principal architect of the DSP architecture being brought to market by ZSP Corp., which uses conventional superscalar techniques to increase signal-processing throughput. Sollers earlier worked on processors at Digital and Sun and was principal architect of the Supersparc II.
Copyright © 2003 CMP Media, LLC