DALLAS - Taking a divide-and-conquer strategy in digital signal processing, Texas Instruments Inc. today announced two DSP cores: one that it claims sets a new high-water mark in raw performance, and another that stakes out fresh ground in low-power processing. The twin cores are aggressive optimizations of TI's existing C6X and C54X parts and are aimed at leapfrogging competition from two separate DSP joint development efforts, which have paired Motorola Inc. with Lucent Technologies, and Analog Devices Inc. with Intel Corp.
TI said its new high-end C64X core, which is aimed at applications such as cellular basestations and central office gear, will ultimately hit clock rates as high as 1.1 GHz and produce up to 9,000 Mips, while its low-power C55X, which is targeted at cellular phones and other mobile systems, will deliver up to 600 Mips while consuming just 0.05 milliwatts per Mips.
"This is a huge leap in what anyone has in DSP silicon or even as a paper tiger," said Mike Hames, a TI vice president and manager of the company's DSP business. "When you are number one, people are always taking digs at you. Our big focus is to divide and conquer."
Sizing up the competition
The new DSPs sport twice as many multiplier units as earlier TI DSP cores, larger caches and modest extensions to the instruction set architecture aimed at speeding imaging and other tasks. With the dual core launch, TI addresses many, though not all, of the issues raised by its closest competitor, the StarCore DSP core co-developed by Motorola and Lucent.
Hames said the C55X will sport significantly lower power consumption and 25 percent less code density than StarCore DSPs. While the StarCore parts will best the C55X in performance, the C55X will have more than adequate punch for the mobile applications it targets, such as third-generation cellular handsets, Hames said.
The C64X DSPs could offer more than twice the performance of the StarCore parts in some applications. However, "StarCore will still have an advantage [over the C64X] in terms of code density," he said.
Code density, the raw amount of software required to process high-level applications into compiled code for the DSP, is one metric raised to prominence by Lucent and Motorola with their StarCore design. Despite this continuing issue, Hames said TI will still have a key software advantage over StarCore because its parts are backward-compatible with the existing C62X and C54X parts, so designers will be able to migrate existing code and applications to the new parts. For their part, the StarCore partners face the challenge of getting developers to write fresh code for a brand new architecture.
Analyst Will Strauss, who tracks the DSP market for Forward Concepts (Tempe, Ariz.), gave TI high marks for the dual core launches. "It will take three years before the competition can catch up with TI," he said.
Defending the DSP turf is critical for TI because president Tom Engibous has carved out the fast-growing DSP space as the company's core competency. TI leads the $4.39 billion worldwide DSP market with a 48 percent stake, according to Strauss. The company saw its DSP revenue grow by 27.6 percent from $1.65 billion in 1998 to $2.1 billion in 1999, he said. The digital wireless market, especially shipments to cellular handset makers Nokia and Ericsson, accounted for much of this growth, said Strauss.
Strauss said the DSP business of Analog Devices Inc. (ADI) grew about 42 percent between 1998 and 1999, or somewhat faster than the general-purpose DSP market, which grew at 25.5 percent over that period. ADI increased its market share from 9.1 to 10.3 percent in that period.
Though they also increased revenue, both Motorola and Lucent Technologies lost market share in that span. "Lucent had a tough time in the modem market and Motorola simply has not been very aggressive outside of its internal consumption," said Strauss.
Despite the market lag, Motorola and Lucent helped set the technology agenda when they launched the StarCore products, which held the high-performance DSP crown prior to TI's latest announcements.
TI's counterattack on the performance front with the C64X is focused in part on increased parallelism, along with faster clocking. The core will support dual 16-bit or quad 8-bit instructions. There is a 2x increase in the number of register files, which not only improves C compiler efficiency but also reduces code size by as much as 25 percent, according to TI. And the device architecture now includes 10 new instructions, aimed at accelerating applications such as machine vision, imaging and audio. Some of the most frequently used instructions are hardwired in the execution unit.
TI believes the C64X outperforms even hardwired ASICs and FPGAs on FFTs, Viterbi detection and Reed-Solomon error correction code. At 1.1 GHz, the device will produce 8,800 Mips, or roughly four times as many as the existing C62X running at 300 MHz. It also produces 4,400 million 16-bit multiply-accumulate operations per second, or MMACs (eight times as many as the C6X), and 8,800 8-bit MMACs (16 times as many as the C6X). Its special-purpose instructions will provide up to an 8x improvement in communications application performance, and a 15x improvement in imaging and video applications, said Henry Wiechman, C6000 strategic marketing manager for TI.
For example, the existing C62X parts can handle four digital subscriber line (DSL) channels while the new C64X running code optimized for the new instructions can handle 32 DSL channels and have headroom left over to handle voice-over-IP functions. The C64X is also supported by "ExpressDSP," the network of DSP APIs, operating system enhancements, and software library subroutines that TI has cultivated in conjunction with third-party suppliers.
The C64X will use the same basic very long instruction word (VLIW) core as the C62X, which can issue up to eight instructions per clock cycle. The C64X will hit the streets this summer in two versions: a native 1.5-V family in speed grades ranging from 600 MHz to 800 MHz, and a 1-V family in speed grades from 300 MHz to 400 MHz. The parts will initially be fabricated in 0.15-micron CMOS using copper and aluminum interconnects.
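The peak Mips figures quoted for these VLIW parts follow from simple arithmetic: instructions issued per cycle multiplied by the clock rate in MHz. The short Python sketch below reproduces that calculation. It assumes peak Mips is exactly issue width times clock; the function name and constants are illustrative and are not taken from TI material.

```python
# Peak-rate arithmetic for an eight-issue VLIW core.
# Assumption: peak Mips = instructions issued per cycle x clock in MHz.
# Sustained throughput on real code will be lower.

ISSUE_WIDTH = 8  # up to eight instructions per clock cycle


def peak_mips(clock_mhz: float, issue_width: int = ISSUE_WIDTH) -> float:
    """Return the theoretical peak Mips at a given clock rate in MHz."""
    return issue_width * clock_mhz


c64x_peak = peak_mips(1100)  # C64X at its eventual 1.1 GHz
c62x_peak = peak_mips(300)   # existing C62X at 300 MHz
speedup = c64x_peak / c62x_peak

print(f"C64X at 1.1 GHz: {c64x_peak:.0f} Mips")
print(f"C62X at 300 MHz: {c62x_peak:.0f} Mips")
print(f"Ratio: {speedup:.2f}x")

# Initial speed grades for the two announced families
for label, mhz in [("1.5-V, 600 MHz", 600), ("1.5-V, 800 MHz", 800),
                   ("1-V, 300 MHz", 300), ("1-V, 400 MHz", 400)]:
    print(f"{label}: {peak_mips(mhz):.0f} peak Mips")
```

On these assumptions the 1.1-GHz part works out to 8,800 Mips against 2,400 Mips for a 300-MHz C62X, a ratio of about 3.7 that TI rounds to four.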
The C55X core also carves out new territory in terms of performance, but with an eye toward conserving power. With 600 Mips and a power consumption on the order of 0.05 mW per Mips, the device is intended to support new handheld portable applications, such as 3G handsets, Internet audio players and digital cameras.
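To put the mW-per-Mips figure in absolute terms, the sketch below multiplies it out for a few operating points. It assumes core power scales linearly with delivered Mips at the quoted 0.05 mW per Mips, which is a simplification: the 140-Mips and 400-Mips points are the product speeds TI has outlined, and applying the same figure to them is an assumption made here, not a TI specification.

```python
# Core power estimate from a mW-per-Mips figure.
# Assumption: power scales linearly with delivered Mips at 0.05 mW/Mips;
# real parts will vary with supply voltage, process and workload.

MW_PER_MIPS = 0.05  # figure quoted by TI for the C55X core


def core_power_mw(mips: float, mw_per_mips: float = MW_PER_MIPS) -> float:
    """Return estimated core power in milliwatts for a given Mips rate."""
    return mips * mw_per_mips


for mips in (140, 400, 600):
    print(f"{mips} Mips -> about {core_power_mw(mips):.1f} mW")
```

At the full 600 Mips this comes to roughly 30 mW for the core alone, before memory, I/O and the rest of the handset are counted.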
Though the C55X is a dual-MAC architecture, its extremely low power consumption is in part a consequence of aggressive power-management shutdown techniques of the kind used in portable and handheld systems. "Advanced power management is layered throughout the device," said Mark Mattson, C5000 product marketing manager. The core sports what Mattson called "increased idle domains" that allow users to turn off one or all of the C55X's functional units in any of 64 different combinations.
Though it is code-compatible with previous-generation C54X processors, the C55X borrows a page from Infineon Technologies' Carmel processors in allowing variable-length instructions. Instruction words for the C55X can vary between 8 and 48 bits. This can reduce clock cycles and promote much higher control-code density, as well as improved cell phone battery life. "We analyzed 54X code for branches, and used the C55X's instruction buffer to handle alignments between instructions that could be parallelized," said TI Fellow and DSP architect Ray Simar. Thus, more work can be accomplished each cycle.
The first standard products using the C55X core will be announced this spring. One family will use a native 1.5-V core and offer speeds of about 200 MHz to deliver about 400-Mips performance. A 0.9-V family will deliver about 140 Mips at speeds of 70 MHz. Custom versions of the C55X cores are already being designed into prototype cellular handsets, Hames said.
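A count of 64 combinations is what six independently switchable idle domains would give, since 2^6 = 64. The number and names of the domains are not given here, so the six names in the sketch below are purely hypothetical, as is the bitmask encoding; it only illustrates how such on/off combinations can be enumerated and packed into a control word.

```python
from itertools import product

# Hypothetical domain names, chosen only for illustration.
# Assumption: six independent on/off domains, which yields 2**6 = 64 states.
IDLE_DOMAINS = ["cpu", "dma", "cache", "peripherals", "clock_gen", "ext_memory"]


def idle_combinations(domains):
    """Return every on/off assignment as a dict of domain name -> idled?"""
    return [dict(zip(domains, bits))
            for bits in product((False, True), repeat=len(domains))]


def mask_for(idled, domains=IDLE_DOMAINS):
    """Pack a collection of idled domain names into an integer bitmask."""
    mask = 0
    for bit, name in enumerate(domains):
        if name in idled:
            mask |= 1 << bit
    return mask


combos = idle_combinations(IDLE_DOMAINS)
print(len(combos), "combinations")

# Example: idle the CPU and cache while leaving the rest running.
print(bin(mask_for({"cpu", "cache"})))
```

A single small register of this kind would let software drop into any of the combinations with one write, though whether the C55X exposes its idle domains this way is an assumption.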
The DSP industry and TI's competitors have been squirming in anticipation of TI's announcement, alternately complimenting the company on its vision and questioning its ability to execute on its promises. Analog Devices' 16-bit DSP product manager Jerry McGuire questioned the importance of the Mips rating TI uses to tout the performance of its VLIW processors like the C6X, when TI's competitors like StarCore and ADI were embracing MMACs as a more accurate DSP performance indicator. Even before TI made today's announcement, McGuire expressed cynicism over TI's ability to change current market trends. "We heard before that they were going to change the world, but we haven't seen it yet," he said.
But McGuire said he was grateful for the attention TI has drawn to the DSP industry. "That's really good for us," he said. Analog Devices' DSP revenues for fiscal 1999 grew 71 percent, or nearly three times the market's growth rate, McGuire said. "We're growing faster than the market because of our early investment in emerging applications and the value we add, which frequently includes analog and mixed-signal products as well as application software," he said.
Rush to parallelism
Kevin Kloker, StarCore's architecture director and Technology Center deputy director, said there is "a land rush to parallelism" in the DSP world. "That is the one direction to push if you're going to get performance," he said. The future of DSP and of computing will be dominated by several forms of parallelism, including instruction-level parallelism, data-level parallelism and task-level parallelism, Kloker said. Both the StarCore SC140 and TI's VLIW cores embody parallel resources and high levels of instruction-level parallelism. The StarCore SC140 core can obtain up to 1,200 MMACs and 3,000 RISC Mips with the 300-MHz clock, he said.
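Because the competing cores are quoted at very different clock rates, one way to compare their parallel resources is to divide each peak figure by the clock. The sketch below does that with the numbers cited for the SC140 and the C64X. It treats the quoted rates as exact peak values, and the dictionary layout and function name are illustrative; note also that StarCore's "RISC Mips" and TI's Mips are not defined the same way, so only the MAC rows are directly comparable.

```python
# Normalize quoted peak rates to per-clock-cycle figures.
# Assumption: the quoted MMACs and Mips are exact peak values at the
# stated clock; "RISC Mips" (StarCore) and Mips (TI) are different units.

def per_cycle(rate_millions_per_s: float, clock_mhz: float) -> float:
    """Operations completed per clock cycle at peak."""
    return rate_millions_per_s / clock_mhz


cores = {
    "StarCore SC140 @ 300 MHz": {"clock_mhz": 300, "mmacs": 1200, "mips": 3000},
    "TI C64X @ 1.1 GHz": {"clock_mhz": 1100, "mmacs": 4400, "mips": 8800},
}

for name, c in cores.items():
    macs = per_cycle(c["mmacs"], c["clock_mhz"])
    ops = per_cycle(c["mips"], c["clock_mhz"])
    print(f"{name}: {macs:.0f} 16-bit MACs/cycle, {ops:.0f} ops/cycle")
```

On the quoted numbers, both cores complete four 16-bit MACs per cycle at peak, so the headline gap in MMACs comes from clock rate rather than from wider MAC hardware.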
But parallelism is not a panacea for all computer bottlenecks, Kloker said. A parallel execution unit can be extremely costly to other parts of a system, such as memory bandwidth, he said. "If you are in a loop [as with IIR filters], you don't care about latency," he said. "But if you are executing control code, you want to branch rapidly and then latency matters." A well-balanced architecture is needed to keep bottlenecks from moving around, he said. Something like "real-time" performance, in which there are no unnecessary variations in timing, may be the ideal. "While the future of DSP is the parallel direction, efficiency is more important than peak numbers, especially in a complex software environment where you need to perform multiple asynchronous tasks," Kloker said.
Details of TI's announcement can be found on a Web broadcast available online.
With additional reporting by Rick Merritt.