Tensilica[tm] Unveils Feature-Rich Third Generation Xtensa[tm] Configurable Processor Technology

New Add-on Options Bring Configurable DSP Technology to System Designers

San Jose, Cal., June 14, 2000 ... At the Embedded Processor Forum, Tensilica Inc., the Santa Clara-based provider of application-specific processor technology, announced major design improvements and new options to the company's unique Xtensa[tm] architecture and intellectual property suite. Xtensa III, the third generation of the company's breakthrough technology, includes more complete configurability in hardware and software, more powerful features within the Xtensa architecture, and seamless integration of new DSP, control and media processor capabilities into the system-on-chip (SOC) environment.

Included in the announcement were three powerful preconfigured coprocessor options including the Vectra[tm] DSP, Floating-Point Unit and a 32-bit multiplier. Rounding out the announcement was more complete automation for designer-defined coprocessors using an upgraded Tensilica Instruction Extension ("TIE") Compiler, and the automated configurability of system development environments and leading third party RTOSes.

Tensilica president and CEO, Chris Rowen, said, "The advent of Xtensa III signals the 'coming of age' of configurable processors. We are introducing a suite of ever more powerful features into the Xtensa environment, with capabilities that allow designers to span virtually all of the critical protocol, signal and image processing tasks in addition to the traditional control paths that 32-bit processors have been assigned in the embedded space." "The basis for the new coprocessors," Rowen continued, "is the powerful capability provided by the enhanced TIE compiler that allows the designer to build his own system-specific solutions."

Of the new preconfigured options introduced by the company, the most significant is a powerful new DSP coprocessor dubbed "Vectra". In addition, the firm announced the availability of a floating point unit (FPU) and a 32 x 32-bit multiply option. All of these options except the Vectra DSP coprocessor are included in the basic Xtensa license fee.

New Options available with Xtensa III

Vectra DSP Co-processor

The Vectra DSP Option is optimized to handle high-performance digital signal processing applications using fixed-point arithmetic. As such, this option is ideal for communications, audio and imaging applications employing a highly efficient and easy to program vector architecture. Vectra provides high data throughput, low power dissipation and the best DSP performance per watt and per area of any of today's core processors for SOC embedded applications. It can be quickly configured for 8-, 16- and 24-bit fixed-point applications.

With Vectra, designers now have - for the first time - a single core architecture that can be rapidly configured to satisfy all of the requirements of embedded processing: control, protocol, signal and image processing. Like all Xtensa options, Vectra is fully-supported by software development tools that include vectorizing compilers, assemblers, simulators, RTOSes and optimized libraries for popular DSP functions.

Vectra's principal features are:

Worst-case 0.18-micron performance: 200 MHz
Outstanding Power Efficiency: 0.8 mW/MHz at 0.18-micron, 1.8 V; 40 µW/MMAC at 0.15-micron, 0.9 V
Minimum Die Footprint: 2.9 - 3.5 sq. mm including the Xtensa base processor in 0.18-micron
Powerful high bandwidth, low power vector register combination:

16 extended precision vector registers (64 40-bit accumulators), 16 16-bit scalar registers, 4 128-bit alignment registers
128-bit paths between vector memory and resident RAM

Complete array of Tensilica and third-party software tools including compilers, debuggers, cycle-accurate simulators and CPLD-based emulation kits
Vectorizing compiler that enables full performance from scalar C/C++
Optimized FFT, FIR and Viterbi libraries
Full support for popular RTOS environments including WindRiver Tornado and ATI Nucleus Plus
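
The value of a 40-bit accumulator for 16-bit fixed-point data can be sketched in a few lines. The following is a minimal Python model, not Tensilica code: the function names are hypothetical, and the saturating behavior on overflow is an assumption made for illustration. It shows that a 40-bit accumulator leaves 8 bits of headroom above a 32-bit product, so hundreds of worst-case 16 x 16 products can be summed before the accumulator limit is reached.

```python
# Hypothetical model of 16-bit fixed-point multiply-accumulate into a
# 40-bit accumulator. Saturation on overflow is an assumption.
ACC_BITS = 40
ACC_MAX = (1 << (ACC_BITS - 1)) - 1
ACC_MIN = -(1 << (ACC_BITS - 1))


def mac16(acc, a, b):
    """Add the product of two signed 16-bit values to the accumulator."""
    if not (-32768 <= a <= 32767 and -32768 <= b <= 32767):
        raise ValueError("operands must fit in signed 16 bits")
    acc += a * b
    # Clamp to the 40-bit signed range instead of wrapping around.
    return max(ACC_MIN, min(ACC_MAX, acc))


def dot16(xs, ys):
    """Dot product of two sequences of signed 16-bit values."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac16(acc, x, y)
    return acc


# Largest-magnitude product is (-32768) * (-32768) = 2**30.
WORST_PRODUCT = 32768 * 32768
# Number of worst-case products that fit before the accumulator clamps.
SAFE_TERMS = ACC_MAX // WORST_PRODUCT
```

In this model SAFE_TERMS evaluates to 511, so a filter with a few hundred taps can accumulate at full precision without intermediate scaling.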

Tensilica chose a Single Instruction Multiple Data (SIMD) approach for the Vectra DSP coprocessor. This architecture boosts bandwidth and reduces power requirements by moving all critical data closer to where it is needed, the CPU. The vector register memory supports over 32 bytes per cycle of source operand bandwidth and 16 bytes per cycle for local memory to vector file transfers. Ample local registers mean higher code efficiency (Vectra supports radix-4 FFT with 50% fewer loads and stores and 15% fewer ops than radix-2 FFT). Overall, Vectra delivers 2 to 7 times the performance of the typical "dual MAC" DSP.
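
The 50% figure for loads and stores follows from the structure of the two FFT algorithms. As a rough sketch (the function names are hypothetical, and the model assumes an in-place FFT in which every point is loaded and stored exactly once per stage), radix-4 needs half as many stages as radix-2 and therefore half the memory traffic:

```python
def count_stages(n, radix):
    """Number of butterfly stages in an n-point FFT of the given radix."""
    stages = 0
    while n > 1:
        if n % radix:
            raise ValueError("n must be a power of the radix")
        n //= radix
        stages += 1
    return stages


def fft_loads(n, radix):
    """Point loads (equal to point stores) for an in-place FFT.

    Assumes each butterfly loads and stores `radix` points and that
    nothing is kept in registers between stages.
    """
    butterflies_per_stage = n // radix
    return count_stages(n, radix) * butterflies_per_stage * radix


radix2_loads = fft_loads(256, 2)   # 8 stages over 256 points
radix4_loads = fft_loads(256, 4)   # 4 stages over 256 points
saving = 1 - radix4_loads / radix2_loads
```

The sketch covers memory traffic only. It does not attempt to reproduce the 15% reduction in arithmetic operations, which depends on how twiddle-factor multiplications are counted.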

Vectra Performance Summary:

4 multiply-adds or 8 adds per cycle
4 FIR taps per cycle
Viterbi butterfly in 2 cycles - GSM Viterbi state metric update in 5,180 cycles
High-pass vocoder filter achieves 8 cycles/point
256-point complex FFT in 2,563 cycles, 1024-point complex FFT in 11,873 cycles
Vector register memory sustains > 6.4 GB per second source operand bandwidth (over 32 bytes per cycle at 200 MHz)
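
The per-cycle figures in this summary can be converted to per-second rates using the 200 MHz worst-case clock quoted earlier. The following is a back-of-the-envelope sketch: the constant and function names are hypothetical, and the FIR estimate is an idealized count that ignores loop overhead and memory stalls.

```python
CLOCK_HZ = 200_000_000           # worst-case 0.18-micron clock from the feature list
MACS_PER_CYCLE = 4               # multiply-adds per cycle
OPERAND_BYTES_PER_CYCLE = 32     # source operand bandwidth per cycle


def peak_macs_per_second():
    return CLOCK_HZ * MACS_PER_CYCLE


def operand_bandwidth_bytes_per_second():
    return CLOCK_HZ * OPERAND_BYTES_PER_CYCLE


def fir_cycles(num_taps, num_samples, taps_per_cycle=4):
    """Idealized cycle count for a FIR filter at a fixed tap rate."""
    cycles_per_sample = -(-num_taps // taps_per_cycle)  # ceiling division
    return num_samples * cycles_per_sample


def cycles_to_microseconds(cycles):
    return cycles / CLOCK_HZ * 1e6
```

Under these assumptions the peak rate is 800 million multiply-adds per second, the operand bandwidth is 6.4 billion bytes per second, and the quoted 2,563-cycle 256-point FFT takes a little under 13 microseconds.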

The first implementation of the Vectra DSP Option will be available at the end of 3Q00, with the fully scalable and configurable version becoming available in 1Q01. The license fee for use of this option within a core instantiation is $150,000.

Floating Point Unit

The Xtensa III release includes a 32-bit single precision floating point coprocessor option optimized for printing, graphics and audio applications. The principal objective guiding the design of this coprocessor was to provide the programming ease of floating point at the cost of fixed point processing. It adds the logic and architectural components needed for IEEE 754 single-precision floating-point operations.

Major Features:

16 dedicated floating point registers
Full set of load/stores, offset and indexed address update modes
Fully pipelined arithmetic operations in hardware:

add, sub, mul, madd, msub: 4-cycle latency
loads and converts: 2-cycle latency
moves, compares: 1-cycle latency

Full compiler support for the C/C++ float type

Performance (0.18-micron, 1.8v):

Adds 20-25K gates to the base processor, for a total core area of 1.2 - 1.5 sq. mm.
Sustains 2 FLOPs/cycle = 400 MFLOPs in 0.18-micron.
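
The relationship between latency, pipelining and the sustained rate can be illustrated with a small calculation. This is a sketch under stated assumptions: the function names are hypothetical, the 200 MHz clock is inferred from the 400 MFLOPs figure, and counting each multiply-add as two floating-point operations is an assumed explanation of how 2 FLOPs per cycle is reached.

```python
def pipelined_cycles(num_ops, latency):
    """Cycles until the last result is available when num_ops independent
    operations are issued back to back on a fully pipelined unit."""
    if num_ops <= 0:
        return 0
    # The first result appears after `latency` cycles, then one per cycle.
    return latency + (num_ops - 1)


def unpipelined_cycles(num_ops, latency):
    """Same workload if each operation had to finish before the next began."""
    return num_ops * latency


def peak_mflops(clock_mhz, flops_per_cycle):
    return clock_mhz * flops_per_cycle
```

For 100 independent multiply-adds with a 4-cycle latency, the pipelined model needs 103 cycles against 400 without pipelining, which is why the 4-cycle latency does not limit sustained throughput.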

32 x 32-bit Multiply Option

Xtensa III adds a new MUL 32 option to the library of MAC 16 (16x16 Multiply with 40-bit accumulator) and MUL 16 options previously released with Xtensa I. This option provides two instructions that perform 32x32 multiplication, producing a 64-bit result. While requiring more area than the MUL 16 option, at 0.18-micron technology, the added cost becomes trivial in view of the significantly greater precision provided.
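
The arithmetic behind a 32 x 32 multiply with a 64-bit result can be sketched as follows. The release does not say how the two instructions divide the work, so the split into a high and a low 32-bit word is an assumption, as are the function names and the choice of unsigned operands. The second function shows the four 16 x 16 partial products that software would otherwise combine when only a 16-bit multiplier is available.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul32_full(a, b):
    """Unsigned 32 x 32 multiply returning (high word, low word)."""
    product = (a & MASK32) * (b & MASK32)
    return product >> 32, product & MASK32


def mul32_from_16bit_parts(a, b):
    """Same result built from four 16 x 16 partial products."""
    a_hi, a_lo = (a >> 16) & MASK16, a & MASK16
    b_hi, b_lo = (b >> 16) & MASK16, b & MASK16
    product = (
        ((a_hi * b_hi) << 32)
        + ((a_hi * b_lo + a_lo * b_hi) << 16)
        + (a_lo * b_lo)
    )
    return product >> 32, product & MASK32
```

Both functions return the same pair of words for any inputs; the second simply makes visible the extra multiplies, shifts and adds that a single wide multiplier replaces.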

Enhancements to Xtensa's Basic Processor and Tools

Besides the powerful new optional functional units available in Xtensa III, numerous enhancements have been made to the core processor and tool set. Some of the more important are:

Core Enhancements

Instruction and data RAM and ROM now can co-exist with I&D caches. The RAM and ROM options provide internal memories that are part of the processor's address space, and accessed with the same timing as cache. There are two RAM options: Instruction RAM and Data RAM, and there are two ROM options: instruction ROM and Data ROM.

TIE Enhancements

Up to four-cycle pipelined instruction capability. New TIE features include the ability to generate new instructions that are relaxed to fit into up to 4 clock cycles for the E stage of the pipeline. By using the "schedule" directive, the TIE extension is automatically pipelined, and the pipeline control and decoding logic is automatically generated. Extended instructions that execute in multiple clocks remain fully pipelined so that an instruction is issued on every clock. Stall cycles are automatically generated if a dependent operation is issued. The processor remains an in-order single-issue machine with instructions completing in order, one at a time.
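
The stall behavior described above can be approximated with a simple in-order, single-issue scheduling model. This is a simplified sketch, not a description of the actual Xtensa pipeline: the function name and register names are hypothetical, and "latency" is taken to mean the number of cycles from issue until a dependent instruction may issue.

```python
def schedule(instructions):
    """Return the issue cycle of each instruction.

    `instructions` is a list of (dest, sources, latency) tuples. At most
    one instruction issues per cycle, in program order, and an
    instruction waits until all of its source registers are ready.
    """
    ready_at = {}        # register name -> cycle its value becomes usable
    issue_cycles = []
    next_issue = 0
    for dest, sources, latency in instructions:
        operands_ready = max((ready_at.get(s, 0) for s in sources), default=0)
        cycle = max(next_issue, operands_ready)   # stall if operands are late
        issue_cycles.append(cycle)
        ready_at[dest] = cycle + latency
        next_issue = cycle + 1
    return issue_cycles


# Three independent 3-cycle extended instructions issue on consecutive cycles.
independent = [("v0", ["a0"], 3), ("v1", ["a1"], 3), ("v2", ["a2"], 3)]

# The last instruction depends on the first one and must wait for its result.
dependent = [("v0", ["a0"], 3), ("v1", ["a1"], 3), ("a3", ["v0"], 1)]
```

In this model the independent sequence issues on cycles 0, 1 and 2, while the dependent sequence issues on cycles 0, 1 and 3, with one stall cycle inserted before the consumer.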

Designer-defined register files: A new TIE definition enables multiple designer-defined special register files. A register file can be of any width (bits per register), restricted only by the number of bits allocated in the instruction. The TIE Compiler automatically generates load and store instructions using the special register files. Because the compiler handles register allocation for designer-defined registers, there is no need for software written in assembly language.

Wide load/store operations with address update: TIE language allows for 32/64/128 bit wide load/store operations for efficient memory bandwidth utilization.
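
The effect of wider accesses on the number of memory operations is easy to quantify. The sketch below uses hypothetical function names, and the post-increment form of the address update is an assumption, since the release does not say whether the update happens before or after the access.

```python
def load_count(buffer_bytes, width_bits):
    """Number of loads needed to read a buffer at a given access width."""
    width_bytes = width_bits // 8
    return -(-buffer_bytes // width_bytes)  # ceiling division


def load_with_update(memory, address, width_bits):
    """Read width_bits from memory and return (data, updated address).

    Post-increment is assumed: the address advances past the data read.
    """
    width_bytes = width_bits // 8
    data = bytes(memory[address:address + width_bytes])
    return data, address + width_bytes
```

Reading a 1,024-byte buffer takes 256 loads at 32 bits, 128 at 64 bits and 64 at 128 bits, and folding the address update into the load removes a separate add from each loop iteration.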

Register file ID: Designers can specify up to 8 coprocessor IDs for the set of states associated with each coprocessor. By associating a coprocessor ID with each register file, "lazy" save and restore operations become possible and are utilized for easy and fast context switching.
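
The idea behind "lazy" save and restore can be sketched in software. The class below is a hypothetical model rather than an RTOS implementation: a context switch copies nothing, and a coprocessor's register file is saved and restored only when a different task actually touches it. The class and method names are illustrative assumptions; only the limit of 8 coprocessor IDs comes from the text.

```python
class LazyCoprocessorContext:
    MAX_COPROCESSORS = 8

    def __init__(self):
        self.live_state = [0] * self.MAX_COPROCESSORS  # register file contents
        self.owner = [None] * self.MAX_COPROCESSORS    # task whose state is live
        self.saved = {}                                # (task, cp_id) -> saved state
        self.current = None
        self.saves = 0                                 # register file saves performed

    def switch_to(self, task):
        # Nothing is copied here; register files are handled on first use.
        self.current = task

    def use(self, cp_id, new_value=None):
        """Access coprocessor cp_id from the current task."""
        if self.owner[cp_id] != self.current:
            previous = self.owner[cp_id]
            if previous is not None:
                # Save the previous owner's state only now that it is needed.
                self.saved[(previous, cp_id)] = self.live_state[cp_id]
                self.saves += 1
            self.live_state[cp_id] = self.saved.pop((self.current, cp_id), 0)
            self.owner[cp_id] = self.current
        if new_value is not None:
            self.live_state[cp_id] = new_value
        return self.live_state[cp_id]
```

If task A uses coprocessor 0 and task B never does, switching between A and B any number of times costs no coprocessor saves at all, which is the source of the fast context switching.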

Software Enhancements

Enhanced compiler support: Xtensa's software development environment is fully integrated with the processor configuration system, supporting ANSI C and C++ code with configuration-specific language extensions. The compiler now allows the user to add configurable types to support easy programmability and automatic register allocation of user-defined coprocessors and register files. Aggressive optimization includes constant propagation, common subexpression elimination, loop invariant code motion, loop unrolling, global data flow analysis, instruction scheduling, local and global register allocation and jump optimization.
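
Two of the optimizations named above, common subexpression elimination and loop invariant code motion, can be illustrated by hand. The functions below are a hypothetical before-and-after pair written for this illustration; they are not output from the Xtensa compiler.

```python
def scale_naive(samples, gain, shift):
    """Recomputes the same loop-invariant expression twice per sample."""
    out = []
    for s in samples:
        out.append(s * (gain << shift) + (gain << shift) // 2)
    return out


def scale_optimized(samples, gain, shift):
    """The form an optimizer would aim for: invariants computed once."""
    factor = gain << shift   # common subexpression, hoisted out of the loop
    half = factor // 2       # also loop invariant
    return [s * factor + half for s in samples]
```

Both functions return identical results; the second performs the shift and the division once instead of on every iteration.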

Bernie Rosenthal, Tensilica's Vice President of Marketing and Business Development, said "With Xtensa III, we are significantly adding to the 'firepower' provided by our configurable architecture, the result of which will be the creation of a broad variety of SOCs for communication and consumer applications. We are particularly enthusiastic about the Vector Integer DSP option. When Vectra becomes fully configurable, incredible new flexibility in DSP operations will be available to designers, and this will continue to accelerate the transition of configurable processor cores from the province of early adopters into mainstream design practice."

Price and Availability

Tensilica offers customers two delivery options. The standard option provides a firm macro in Verilog or VHDL RTL, and supporting EDA tool scripts, test suite, placement guidelines and the customized software tool chain. The ruggedized option provides a hard macro in the form of a Verilog/VHDL netlist, GDSII using the target semiconductor vendor's cell library, a test suite and the software tool chain.

The company's pricing structure is based upon a licensing fee per instantiated design plus royalties based upon units manufactured. Licensing fees for an individually configured processor implementation and complete software tool environment start at $350,000. With the Vectra DSP Option, the single instance fee is $500,000.

About Tensilica

Tensilica was founded in July 1997 to address the fast-growing market for application-specific microprocessor cores and software development tools for high volume, embedded systems. Using the company's proprietary Xtensa[tm] Processor Generator, system-on-a-chip (SOC) designers can develop a processor subsystem hardware design and a complete software development tool environment tailored to their specific requirements in hours.

Tensilica's solutions provide a proven, easy-to-use methodology that enables designers to achieve optimum application performance in minimum design time. The Company has over 90 engineers engaged in research, development, and customer support from its offices in Santa Clara, California, Waltham, Massachusetts, Princeton, N.J., Houston, Texas, Reading, U.K. and Yokohama, Japan.

Tensilica is headquartered in Santa Clara, California (95054) at 3255-6 Scott Boulevard, and can be reached at (408) 986-8000 or via www.tensilica.com on the World Wide Web.