SAN MATEO, Calif. The making of STMicroelectronics' ST210 very long instruction word (VLIW) processor is the story of how two companies with similar goals and different strengths came together to tackle some of the most harrowing problems in system-on-chip design.
As a chip-making powerhouse with a big stake in consumer electronics, STMicro had been looking since the mid-1990s for ways to boost the programmability of embedded chips used in consumer applications. That led the company to think seriously about VLIW, which was becoming a buzzword in the microprocessor realm.
"At the time, there was a lot of interest in VLIW as a technique to increase processing power for a given size by way of increasing the software tool complexity," said Richard Bramley, R&D manager for STMicroelectronics.
Since then, the appeal of VLIW has not diminished. "Now we find that we really do need something like this in our products. One of the biggest handicaps is providing enough programmability in consumer devices," he said.
While STMicro started investigating VLIW, Hewlett-Packard's R&D division, HP Labs, was working on its own VLIW technology in several forms. One of those was the LX project started in 1994 by Josh Fisher and Paolo Faraboschi of HP Labs in Cambridge, Mass. The HP team, however, was uncertain about how it was going to get the technology to market.
"We came in with a technology that we had been developing for some time and the aspects were reasonably far along, but we didn't have anything above the instruction set," said HP fellow Josh Fisher, who coined the term VLIW in the 1980s.
In 1998, after HP and STMicro had worked together in several other areas, they started looking at ways to cooperate on VLIW. By 1999, the relationship was formalized and the two set out to deliver a VLIW platform applicable to system-on-chip devices.
STMicro's job was to take the instruction set and compiler technology from HP Labs and craft a microarchitecture that would be true to the spirit of VLIW. To do that, the company focused its design activities at a central location in Cambridge, Mass., and drew from five other design teams in Europe and the United States.
The task was not trivial. To keep the microarchitecture from bogging down the compiler, STMicro would have to refrain from using hardware-assist features, something other companies espousing VLIW had used to meet performance requirements of certain target applications.
But the STMicro/HP Labs team did not want to use this as a crutch because doing so would hamper the compiler's ability to schedule code and result in more conservative performance estimates, said Faraboschi, HP Labs' LX project manager and principal research scientist.
Working closely with HP Labs, STMicro crafted the microarchitecture so that it would be malleable enough to let a designer add or subtract basic architectural functional units like adders, multipliers, registers and register ports. In this way, a designer can analyze the three key parameters of any microprocessor (performance, power dissipation and die area) before the final architecture is frozen.
That in turn makes it easier to tune the microarchitecture to meet different performance benchmarks for different applications. The compiler is then free to schedule code for the resulting microarchitecture and extract instruction-level parallelism from applications written in C code. This approach is said to set the ST210 apart from other VLIW machines on the market. "We don't see ourselves competing head-to-head with Trimedia or things like that," Bramley said. "One observation that can be made is that ours is a lot simpler and has a lot less application-specific instructions in it. We clock it very fast rather than put in dedicated operations for certain tasks."
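The idea of a compiler scheduling code for whatever mix of functional units the designer settles on can be illustrated with a toy example. The sketch below is not the HP Labs compiler or the ST210 toolchain. It is a minimal greedy list scheduler, and the function name `schedule`, the unit kinds "alu" and "mul", and the single-cycle latency for every operation are all assumptions made for illustration.

```python
def schedule(ops, deps, units):
    """Greedily pack operations into VLIW-style bundles, one bundle per cycle.

    ops:   dict mapping operation name -> kind of functional unit it needs
    deps:  dict mapping operation name -> set of operations it depends on
    units: dict mapping unit kind -> how many such units exist (hypothetical mix)

    Assumes every operation takes exactly one cycle (an illustrative simplification).
    """
    done = set()            # operations completed in earlier cycles
    remaining = list(ops)   # unscheduled operations, in program order
    bundles = []
    while remaining:
        free = dict(units)  # units still available in this cycle
        bundle = []
        for name in remaining:
            kind = ops[name]
            ready = deps.get(name, set()) <= done
            if ready and free.get(kind, 0) > 0:
                free[kind] -= 1
                bundle.append(name)
        if not bundle:
            raise ValueError("cyclic dependency or missing functional unit")
        done.update(bundle)
        remaining = [n for n in remaining if n not in bundle]
        bundles.append(bundle)
    return bundles


# A small dependency graph: c needs a and b; e needs c and d.
ops = {"a": "alu", "b": "alu", "c": "mul", "d": "alu", "e": "alu"}
deps = {"c": {"a", "b"}, "e": {"c", "d"}}

narrow = schedule(ops, deps, {"alu": 1, "mul": 1})  # one adder, one multiplier
wide = schedule(ops, deps, {"alu": 4, "mul": 2})    # a wider hypothetical machine

print(len(narrow), narrow)
print(len(wide), wide)
```

In this toy case the same code takes four cycles on the narrow configuration and three on the wide one. The gain stops there because the chain from a and b through c to e limits how much parallelism extra units can extract, which is the kind of trade-off a designer would weigh against power and die area.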
In this sense, the ST210 is a much purer form of VLIW as envisioned by the creators of the instruction set and compiler technology. "It's easy to get performance by putting in special doodads, but there are costs to that approach," Fisher said. "What this technology offers is the ability to cleanly scale the number of functional units and the structure of the processor. You then have the strength of the compiler to get tremendous performance out of it."
STMicro claims the four-issue ST210, which is being demonstrated as a test chip called the ST200-STB1, has the equivalent performance of a 1-GHz RISC device while maintaining the low-power benefits of a 250-MHz clock frequency.
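STMicro did not say how it arrived at the 1-GHz comparison. One plausible reading, assuming a single-issue RISC baseline and a compiler that fills all four issue slots every cycle, is simple peak-throughput arithmetic. The helper name `peak_ops_per_second` is hypothetical.

```python
def peak_ops_per_second(issue_width, clock_hz):
    """Upper bound on operations per second: issue slots per cycle times cycles per second."""
    return issue_width * clock_hz


# Four-issue VLIW at 250 MHz (figures from the article).
vliw_peak = peak_ops_per_second(4, 250_000_000)

# Assumed baseline: a single-issue RISC core at 1 GHz.
risc_peak = peak_ops_per_second(1, 1_000_000_000)

print(vliw_peak == risc_peak)
```

This is a peak figure only. Sustained throughput depends on how much instruction-level parallelism the compiler actually finds in a given application.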
Programmable in C
This kind of performance is comparable to that of a digital signal processor, with the added advantage that the chip can be programmed in C. "We did an internal comparison between DSP and this technology, and we found that they were probably equivalent, though the ST200 [test chip] had a slight edge in MPEG decode. The difference is that one is programmed in assembler, and one in C," Bramley said.
This is because DSP architectures usually aren't compiler-friendly. "DSPs and compilers traditionally don't like each other very much," Faraboschi said. "The way they express performance is through handcrafting assembler files. Here, we're talking about close to 100 percent performance at the C level."
Bramley said the VLIW engine is not an end in itself, but will become an integral part of system-on-chip devices for applications that now rely heavily on hardwired circuitry. A typical SoC that the company builds can contain as many as 50 discrete blocks of intellectual property, which creates a huge burden on hardware design. "We're not building products around VLIW," Bramley said. "We're adding VLIW to products where we have a leadership