Update: Cadence Completes Acquisition of Tensilica (Apr 24, 2013)
Industry’s Highest Performance Core Will Replace RTL in SOC Designs
SANTA CLARA, Calif. – May 18, 2004 – Tensilica, Inc. today unveiled its next-generation Xtensa® LX configurable processor, the highest performance processor core on the market, featuring both higher computational throughput and dramatically higher I/O (input/output) bandwidth. This record-breaking performance, combined with Tensilica’s patented automated design and development environment, makes Xtensa LX the only processor fast and flexible enough to replace register transfer level (RTL) design methodologies in system-on-chip (SOC) designs, leading to reduced development time and risk along with dramatic increases in ROI (return on investment) for semiconductor and systems companies. Xtensa LX is also ideally suited as a traditional control processor in embedded applications. Tensilica expects that most of its customers will use multiple Xtensa LX cores in each SOC design, each tailored to speed a different part of the customer’s application.
“With chip development costs now surging past $10 million, SOC development teams need to reduce project development time, risk and cost,” said Chris Rowen, president and CEO of Tensilica. “With the Xtensa LX processor, designers can configure optimized processors specifically tuned to their application in a fraction of the time that it takes to design and verify RTL, with comparable computational and I/O performance. The inherent programmability of the processor gives designers the flexibility to fix bugs and add features purely in software at any point – late in the design cycle or long after first shipment. This is impossible with hard-coded RTL.”
The Xtensa LX processor core features significant innovations in four key areas:
- Lower power, a key requirement for all SOC designs;
- Improved I/O throughput, so the processor can move data in and out at terabit/second speeds;
- Improved compute performance, so the processor can process complex algorithms much faster; and
- Better interfaces for on-chip memories, so the processor isn’t slowed down by memory access speeds.
Tensilica supports these technical innovations with a patented development environment that automatically and simultaneously generates an optimized hardware implementation, a corresponding tailored software tool chain, and a complete set of EDA models and scripts. Configuration and extension choices made by the designer to address requirements for a given application are immediately and automatically reflected in the entire software tool chain. With alternative approaches, this is typically a manual, error-prone task that requires extensive verification.
Lower Power Consumption
Tensilica has automated the insertion of fine-grain clock gating for every functional element of the Xtensa LX processor, including functions conceived and created by the designer. Clock gating is a very effective power-reduction technique that shuts off the clock to parts of the logic that are not in use on a particular clock cycle, so those parts do not switch and waste power. Because automatic clock-gating insertion is available only for restricted RTL coding styles, standard RTL design often requires manual, error-prone post-layout tuning of clock circuits.
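To illustrate the idea, the following minimal Python sketch counts the clock edges that reach each functional unit with and without gating. It is a toy model under stated assumptions: the unit names and the activity trace are hypothetical, and delivered clock edges stand in for switching power.

```python
# Toy model of fine-grain clock gating: count the clock edges delivered to each
# functional unit over a short activity trace. Unit names and the trace are
# hypothetical; real switching power also depends on capacitance and data activity.

def clock_events(activity_trace, units, gated):
    """Return {unit: clock edges delivered} for the given per-cycle activity.

    activity_trace: list of sets, one per cycle, naming the units doing useful work.
    gated: if True, a unit's clock is enabled only on cycles where it is active.
    """
    counts = {unit: 0 for unit in units}
    for active in activity_trace:
        for unit in units:
            if not gated or unit in active:
                counts[unit] += 1
    return counts


units = ["alu", "mac", "lsu"]
trace = [{"alu"}, {"alu", "lsu"}, set(), {"mac"}]

ungated = clock_events(trace, units, gated=False)
gated = clock_events(trace, units, gated=True)
print(sum(ungated.values()), sum(gated.values()))  # 12 4
```

In this four-cycle trace, gating cuts the delivered clock edges from 12 to 4, because idle units simply stop being clocked.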
The Xtensa LX processor’s new architecture dramatically lowers power consumption in large configurations with many designer-defined functions. But even without designer modification, the Xtensa LX processor is designed to use power very efficiently. The minimum configuration of the Xtensa LX processor dissipates a miserly 0.05 mW/MHz in a representative 130 nm process technology. By comparison, the smallest member of the ARM synthesizable processor family, the ARM7TDMI-S, burns 0.11 mW/MHz in 130 nm technology – more than twice the power consumption of the Xtensa LX.
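Because both figures are quoted per MHz, absolute power scales linearly with clock rate. The short Python sketch below makes that conversion; the 200 MHz operating point is a hypothetical example, not a Tensilica figure.

```python
# Convert the quoted mW/MHz figures into milliwatts at a given clock rate.
# The 200 MHz operating point is a hypothetical example, not a Tensilica figure.

XTENSA_LX_MIN_MW_PER_MHZ = 0.05  # minimum Xtensa LX configuration, 130 nm
ARM7TDMI_S_MW_PER_MHZ = 0.11     # ARM7TDMI-S, 130 nm


def core_power_mw(mw_per_mhz, clock_mhz):
    """Estimate core power in mW as (mW/MHz) x (MHz)."""
    return mw_per_mhz * clock_mhz


clock_mhz = 200.0
lx = core_power_mw(XTENSA_LX_MIN_MW_PER_MHZ, clock_mhz)
arm = core_power_mw(ARM7TDMI_S_MW_PER_MHZ, clock_mhz)
print(f"Xtensa LX: {lx:.1f} mW, ARM7TDMI-S: {arm:.1f} mW, ratio {arm / lx:.1f}x")
```

At 200 MHz this gives 10 mW versus 22 mW, a ratio of 2.2 that holds at any clock rate.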
I/O Throughput Improved By Three Orders of Magnitude
Two major innovations improve I/O throughput in Xtensa LX processors: an option for a second load/store unit and designer-defined ports and queues.
Designers using the Xtensa LX processor can choose one or two 128-bit-wide load/store units. Most standard embedded processors have only a single narrow (32- or 64-bit) load/store unit. However, many applications benefit from two load/store units for data-intensive inner loops – a standard feature of many high-end DSP processors. The Xtensa LX processor’s optional second load/store unit provides greater sustained general-purpose I/O bandwidth and an XY-style memory access for DSP applications. Additionally, at 128 bits, each unit is much wider than a standard load/store unit and moves much more data per access.
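The benefit of width and of a second unit can be sketched with a simplified load-issue model for a dot product whose two operand arrays sit in separate X and Y memories. This Python sketch rests on illustrative assumptions, not on Xtensa LX timing data: loads are the bottleneck, each unit issues one full-width aligned load per cycle, and samples are 16 bits.

```python
import math

# Simplified load-issue model for a dot product over two arrays held in separate
# X and Y memories. Assumptions (illustrative only): loads are the bottleneck,
# each load/store unit issues one full-width aligned load per cycle, and
# samples are 16-bit.

def load_cycles(num_samples, sample_bits=16, lsu_width_bits=128, num_lsus=1):
    """Return the cycles needed to load both operand arrays."""
    samples_per_load = lsu_width_bits // sample_bits
    loads_per_array = math.ceil(num_samples / samples_per_load)
    total_loads = 2 * loads_per_array  # one stream from X memory, one from Y memory
    return math.ceil(total_loads / num_lsus)


for width, lsus in [(32, 1), (128, 1), (128, 2)]:
    cycles = load_cycles(1024, lsu_width_bits=width, num_lsus=lsus)
    print(f"{width}-bit x {lsus} unit(s): {cycles} cycles")
```

For 1,024 samples the model gives 1,024, 256 and 128 load cycles respectively: widening the unit packs more samples into each access, and the second unit lets the X and Y streams load in parallel.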
The true breakthrough in I/O is the capability to add designer-defined ports and queues, which allow the Xtensa LX processor to communicate as fast and as flexibly as RTL blocks. Ports are wires that directly connect two Xtensa LX processors or an Xtensa LX processor to external RTL. Port connections can be arbitrarily wide, allowing wide data types to be transferred easily without the need for multiple load/store operations. As many as one million signals (1024 ports, each 1024 bits wide) can be used. That is an outrageous number, far exceeding the performance demands of real systems today (it provides 350 terabits/sec of direct data flow per processor in a 130 nm CMOS process), but it clearly demonstrates that old notions of I/O bottlenecks inherent in processor-based solutions are now obsolete.
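The aggregate port figure is simply total port width multiplied by clock rate. This Python sketch assumes one transfer per port per cycle at 350 MHz, the worst-case 130 nm frequency this release quotes for the base core; the exact clock behind the 350 terabits/sec figure is not stated.

```python
# Aggregate raw port bandwidth: total port width (bits) x clock rate.
# Assumes one transfer per port per cycle at 350 MHz (an assumption; the
# clock rate behind the quoted figure is not given).

def port_bandwidth_tbps(num_ports, bits_per_port, clock_hz):
    """Return aggregate port bandwidth in terabits per second."""
    total_signals = num_ports * bits_per_port
    return total_signals * clock_hz / 1e12


signals = 1024 * 1024  # 1,048,576 signals, i.e. "as many as one million"
print(signals)
print(f"{port_bandwidth_tbps(1024, 1024, 350e6):.0f} Tbit/s")
```

Under this assumption the result is about 367 Tbit/s, in the same range as the roughly 350 terabits/sec quoted above.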
While ports are ideal for quickly conveying control and status information, queues provide a high-speed mechanism for transferring streaming data. From the programmer’s viewpoint, input and output queues operate like traditional processor registers, with the notable exception that their data is available directly, without the need to load it before or store it after computation. Queues can sustain data rates as high as one transfer every clock cycle, or over 350 Gbits/sec for each queue added to an Xtensa LX processor. Custom instructions can perform multiple queue operations per cycle, for example combining inputs from two input queues with local data and sending the computed values to two output queues. The high bandwidth and low control overhead of queues allow the Xtensa LX processor to be used in applications with extreme data rates.
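The dual-queue pattern described above can be modeled in software with FIFOs. This Python sketch is not TIE syntax and not Tensilica’s implementation: the function `run_dual_queue_op` is a hypothetical stand-in for a custom instruction, and the per-queue bandwidth helper assumes a 1024-bit queue clocked at 350 MHz.

```python
from collections import deque

# Toy software model of designer-defined queues. This is not TIE syntax; the
# "instruction" below is a hypothetical stand-in that, on each modeled cycle,
# pops one word from each of two input queues, combines them with a local
# coefficient, and pushes results to two output queues.

def run_dual_queue_op(in_a, in_b, out_sum, out_diff, coeff):
    """Process words until either input queue is empty; return cycles used."""
    cycles = 0
    while in_a and in_b:
        a = in_a.popleft()
        b = in_b.popleft()
        out_sum.append(coeff * (a + b))
        out_diff.append(coeff * (a - b))
        cycles += 1  # assume the whole operation issues in a single cycle
    return cycles


def queue_bandwidth_gbps(width_bits, clock_hz):
    """Peak bandwidth of one queue transferring width_bits every cycle."""
    return width_bits * clock_hz / 1e9


in_a, in_b = deque([1, 2, 3]), deque([4, 5, 6])
out_sum, out_diff = deque(), deque()
cycles = run_dual_queue_op(in_a, in_b, out_sum, out_diff, coeff=2)
print(cycles, list(out_sum), list(out_diff))  # 3 [10, 14, 18] [-6, -6, -6]
print(f"{queue_bandwidth_gbps(1024, 350e6):.1f} Gbit/s per queue (assumed 1024 bits, 350 MHz)")
```

No explicit load or store appears in the loop: each operand arrives from a queue and each result leaves through one. Under the stated width and clock assumptions, a single queue carries about 358 Gbit/s, consistent with the "over 350 Gbits/sec" figure.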
Ports and queues specified by the designer are automatically added to the Xtensa LX processor and are fully modeled by Tensilica’s Xtensa Processor Generator. The full behavior of the port or queue, just like any other modification made to the Xtensa LX processor, is automatically reflected in the custom software development tools, instruction set simulator, bus functional model and EDA scripts – within about an hour. And because it’s automated using Tensilica’s patented technology, it’s pre-verified and correct by construction – no need to re-verify the processor.
Improved Compute Performance
Tensilica improved compute performance in the Xtensa LX processor through its innovative FLIX (Flexible Length Instruction Xtensions) architecture. The FLIX architecture is a highly efficient implementation of the Xtensa instruction set architecture (ISA) that gives designers more options for cost/performance tradeoffs. The FLIX technology provides the flexibility to freely and modelessly intermix instructions of various lengths (16-, 24-, or 32-/64-bit). By packing multiple operations into a wide 32- or 64-bit instruction word, FLIX technology allows designers to accelerate a broader class of “hot spots” in embedded applications. FLIX eliminates the performance and code-size drawbacks that can occur when using a one-size-fits-all instruction length. Compared to rigid, high-performance processor designs that either encode only one RISC operation per instruction or use ultra-wide 64b/128b/256b VLIW (very long instruction word) formats, FLIX delivers high-performance concurrent execution exactly and only when needed, yet preserves the industry leading code density advantages of the Xtensa processor’s native 16b/24b base architecture instruction formats.
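The code-density argument can be illustrated with a toy size comparison between mixed-length encoding and fixed-width formats. Every number in this Python sketch is hypothetical: the instruction mix, the three operations per 64-bit bundle, and the four-slot 128-bit VLIW word are assumptions chosen for illustration, not FLIX encoding details.

```python
import math
from dataclasses import dataclass

# Toy code-size comparison for mixed-length encoding versus fixed-width formats.
# All figures are hypothetical: the instruction mix, the three operations per
# 64-bit bundle, and the four-slot 128-bit VLIW word are illustrative assumptions.

@dataclass(frozen=True)
class Insn:
    ops: int   # operations encoded in this instruction
    bits: int  # encoded length in bits


def mixed_size_bytes(program):
    """Code size when each instruction uses its own length (16/24/64 bits)."""
    return sum(insn.bits for insn in program) // 8


def fixed_size_bytes(program, word_bits, ops_per_word):
    """Code size when every word is word_bits long; unused slots are NOP-padded."""
    words = sum(math.ceil(insn.ops / ops_per_word) for insn in program)
    return words * word_bits // 8


# Control code in compact 16/24-bit forms, plus a hot loop of four 64-bit bundles.
program = [Insn(1, 16)] * 10 + [Insn(1, 24)] * 10 + [Insn(3, 64)] * 4

print(mixed_size_bytes(program))         # 82
print(fixed_size_bytes(program, 32, 1))  # 128 (one op per 32-bit word)
print(fixed_size_bytes(program, 128, 4)) # 384 (4-slot 128-bit VLIW)
```

In this made-up mix, the mixed-length program is the smallest: control code stays in compact 16- and 24-bit forms, and only the hot loop pays for wide bundles. The fixed-width VLIW pads every control instruction to 128 bits.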
Better Interfaces to On-Chip Memories
To address the growing speed disparity between standard cell logic and memories (memory access speeds have not scaled as well as logic in the migration from 180 nm to 130 nm and now 90 nm), the Xtensa LX processor features a configurable pipeline. Designers can select two additional clock cycles for memory access if required by the application. While Tensilica’s traditional 5-stage pipeline is very efficient for many applications, designers employing very large local memories or low-power memories with slower access speeds will find advantages in moving to a longer pipeline, resulting in a higher system clock frequency.
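The trade-off can be sketched with a simple timing model in which the clock period must cover the slowest pipeline stage. The delays in this Python sketch are hypothetical, and how the added stages map onto memory-access cycles is an assumption. The sketch only compares an access confined to one cycle with one spread evenly over two.

```python
# Toy timing model: the clock period must cover the slowest pipeline stage.
# Delays are hypothetical; the model assumes a memory access can be spread
# evenly over mem_cycles pipeline stages.

def max_clock_mhz(logic_stage_ns, memory_access_ns, mem_cycles):
    """Return the highest clock frequency (MHz) this pipeline can sustain."""
    period_ns = max(logic_stage_ns, memory_access_ns / mem_cycles)
    return 1000.0 / period_ns


# Logic fits in 2.5 ns per stage, but a large local memory needs 4.0 ns to access.
print(max_clock_mhz(2.5, 4.0, mem_cycles=1))  # 250.0 (memory-limited)
print(max_clock_mhz(2.5, 4.0, mem_cycles=2))  # 400.0 (logic-limited)
```

With a slow memory confined to one cycle, the memory sets the clock. Giving it a second cycle lets the clock rise until ordinary logic becomes the limit.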
Leading Benchmark Scores
In addition to being the ideal alternative methodology for hardware block design, the Xtensa LX processor excels at traditional CPU and DSP tasks in embedded SOCs as demonstrated by industry leading benchmark results on the EEMBC (Embedded Microprocessor Benchmark Consortium) Consumer benchmark suite and the BDTI Benchmarks by Berkeley Design Technology, Inc. (BDTI).
The EEMBC Consumer benchmark “out of the box” score was 171.6 @ 330 MHz (0.51997 per MHz), nearly a 9X performance advantage over the ARM1020E. See separate press release issued today titled, “Tensilica’s Xtensa LX Processor Beats All Other 32- and 64-bit Processor Cores on EEMBC Consumer ‘Out of the Box’ Scores.”
The Xtensa LX BDTIsimMark2000 score of 6150 for a 370 MHz configuration is 70% faster than the score for the next-fastest licensable core benchmarked by BDTI, the CEVA-X1620.* See separate press release issued today titled, “Tensilica’s New Xtensa LX Processor Earns Top BDTIsimMark2000™ Score.”
The base Xtensa LX processor consumes approximately 27,500 gates when synthesized for minimum power and area, and achieves 350 MHz (worst-case conditions) in TSMC’s 130 nm LV process technology when optimized for speed. In 90 nm technology, the 7-stage version of the Xtensa LX can achieve over 500 MHz.
Pricing and Availability
Tensilica’s pricing structure is based on a licensing fee per processor instance plus royalties based on the volume of processors manufactured. Each licensed processor instance can be targeted to any silicon foundry technology. Licensing fees for a single processor configuration start at $550,000 for the Xtensa LX processor, including the Vectra LX DSP engine. The Xtensa Software Developers Toolkit (which includes the Xtensa Xplorer development environment, the Xtensa C/C++ compiler and the Xtensa Instruction Set Simulator) and the TIE Compiler are priced separately. Customers can begin to take advantage of the new features of the Xtensa LX processor early this summer.
Xtensa LX is an addition to the Tensilica processor family, which includes the proven Xtensa V configurable processor. Customers will be able to continue to license the Xtensa V processor. The Xtensa V processor and the Xtensa LX processor both implement the common core Xtensa instruction set.
Tensilica was founded in July 1997 to address the growing need for optimized, application-specific microprocessors for high-volume embedded applications. With the Xtensa and Xtensa LX configurable and extensible microprocessor cores, Tensilica is the only company that has automated and patented the time-consuming process of generating a customized microprocessor core along with a complete software-development tool environment, producing new configurations in a matter of hours. These customized processors rival hand-coded RTL in performance and add a needed level of programmability. For more information, visit www.tensilica.com.
* The BDTIsimMark2000™ provides a summary measure of DSP speed. For more information and scores, see www.BDTI.com. Scores © 2004 BDTI. The Xtensa LX score includes use of 12 custom TIE instructions that expand the area of the core by 16%. Licensees may require greater or lesser degrees of customization. The scores for all other cores assume that no coprocessors or other customizations were used. The scores for the Xtensa LX and all other cores are for worst-case operating conditions in a commercially available 130 nm process. Contact info@BDTI.com for more information.
# # #
- Tensilica and Xtensa are registered trademarks belonging to Tensilica Inc.
- BDTI Benchmarks and BDTIsimMark2000 are trademarks of Berkeley Design Technology, Inc.
- Tensilica’s announced licensees include Agilent, ALPS, AMCC (JNI Corporation), Astute Networks, Avision, Bay Microsystems, Berkeley Wireless Research Center, Broadcom, Cisco Systems, Conexant Systems, Cypress, Crimson Microsystems, ETRI, FUJIFILM Microdevices, Fujitsu Ltd., Hudson Soft, Hughes Network Systems, Ikanos Communications, LG Electronics, Marvell, MediaWorks, NEC Laboratories America, NEC Corporation, Nippon Telegraph and Telephone (NTT), Olympus Optical Co. Ltd., S2io, Solid State Systems, Sony, STMicroelectronics, TranSwitch Corporation, and Victor Company of Japan (JVC).