Siroyan implements clustered DSP architecture

By Peter Clarke, EE Times
October 15, 2001 (1:30 p.m. EST)
URL: http://www.eetimes.com/story/OEG20011015S0035

LONDON — After two years of burning venture capital and doing detailed design work, startup Siroyan Ltd. (Reading, England) is shifting gears as it brings its configurable, clustered very-long-instruction-word (VLIW) DSP architecture to market. The company claims its approach offers finer-grained and much greater scalability than existing DSP-plus-RISC solutions.

Ultimately, that will lead to higher and more efficient performance as contemporary architectures, weighed down by legacy considerations, run into the sand, the company claims.

Siroyan is also considering doing some work with legacy code by linking up with Transitive Technologies Ltd. (San Diego) and that company's Dynamite binary translation technology.

Nigel Topham, chief architect at Siroyan, is due to present the company's OneDSP architecture on Thursday (Oct. 18) at the Microprocessor Forum in San Jose, Calif. Topham will also detail the first implementation of OneDSP, a configurable fixed-point core previously known internally as Rubicon but now known simply as the SRA-328.

The company has taped out its first test chip, the two-cluster SRU322, in UMC's 0.15-micron CMOS process technology and expects it to achieve a 200-MHz clock frequency. Early in 2002, the SRA-328 should be able to ship as licensable intellectual property capable of peak performance of up to 12.8 billion operations per second or 3.2 billion multiply-accumulate operations per second. Siroyan's road map includes power-optimized and floating-point derivatives of OneDSP in 2003.

Fixed goal

The architecture has evolved in the two years since Siroyan's founding, but the ultimate goal has remained the same: to address designs currently based on a mix of microprocessors and DSPs. That includes just about everything from wireless LANs and 3G infrastructure to handheld devices, automotive entertainment and networking equipment, said Topham, who was director of the Institute for Computing Systems Architecture (ICSA) at Scotland's Edinburgh University before joining Siroyan.

OneDSP integrates DSP and RISC technology using VLIW techniques operating on 32-bit data and using 32- or 64-bit data paths. The architecture supports up to 32 clusters each with two execution units. The SRA-328's naming convention indicates that the configurable core operates on 32-bit data and includes up to eight clusters.
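As an illustration only, the naming convention can be read as a data width followed by a maximum cluster count. The short Python sketch below assumes that reading, which also fits the two-cluster SRU322 test chip. The `CoreConfig` class and `parse_core_name` helper are hypothetical names and are not part of any Siroyan tool.

```python
import re
from dataclasses import dataclass

# From the article: each cluster has two execution units.
UNITS_PER_CLUSTER = 2


@dataclass(frozen=True)
class CoreConfig:
    data_bits: int      # width of the data the core operates on
    max_clusters: int   # upper bound on configurable clusters

    @property
    def max_issue_width(self) -> int:
        """Instructions issued per cycle when every cluster is present and busy."""
        return self.max_clusters * UNITS_PER_CLUSTER


def parse_core_name(name: str) -> CoreConfig:
    """Split a name such as 'SRA-328' into data width and cluster count.

    Assumed format (not confirmed by the source): three letters, an optional
    hyphen, the data width '32', then the cluster count.
    """
    match = re.fullmatch(r"[A-Z]{3}-?(32)(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized core name: {name!r}")
    return CoreConfig(data_bits=int(match.group(1)), max_clusters=int(match.group(2)))
```

Under these assumptions, `parse_core_name("SRA-328")` describes a 32-bit core with up to eight clusters and therefore up to 16 instruction slots per cycle.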

"We're going after the convergence market. That means offering embedded DSP, exploiting the instruction-level parallelism in DSP algorithms while offering a RISC controller with high code density," Topham said.

The architecture also supports a basic 16-bit instruction format for 32-bit RISC operations.

The basic VLIW cluster is a two-issue pipeline that accepts two 32-bit instruction words to drive two function units. One unit is the simpler data path, labeled the address generation unit (AGU); the second data path includes a fast multiplier and is known as the execution unit (EXU).

Configurable clusters

Keeping multiple clusters of dual data paths busy is the key to achieving computational efficiency. The appropriate number of clusters will vary with the application. But the number (from one to eight in the 328) is configurable — as are the cache sizes and additional application-specific instructions — via the development tool suite's graphical interface.

Among the application-specific options are instructions to support Galois field mathematics, which are used in forward-error-correction algorithms and in encryption. The provision of one or two instructions can dramatically accelerate those applications that can make use of them.
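To see why a dedicated instruction helps, consider what a Galois field multiply costs in software. The Python sketch below multiplies two elements of GF(2^8) with the textbook shift-and-XOR method. The article does not say which fields or polynomials Siroyan's instructions support, so the reduction polynomial here (0x11B, the one used by AES) is an assumption chosen only because its results are easy to check.

```python
def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply two GF(2^8) elements using shift-and-XOR arithmetic.

    `poly` is the field's reduction polynomial; 0x11B is the AES choice and
    is an assumption here, not something taken from the article.
    """
    result = 0
    for _ in range(8):
        if b & 1:           # add (XOR) the current multiple of a
            result ^= a
        b >>= 1
        a <<= 1             # multiply a by x
        if a & 0x100:       # reduce when the degree reaches 8
            a ^= poly
    return result


# Worked example from the AES specification: {57} * {83} = {c1}
print(hex(gf256_mul(0x57, 0x83)))  # prints 0xc1
```

Each multiply takes eight conditional steps in this form, and an error-correcting decoder performs many of them per symbol, which is the kind of inner loop a single-cycle instruction would shorten.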

"While achieving scalability, it is also important to present a single programming model," said Topham. "We also need to address real-time requirements, so features include a predictable response and a fast response to interrupts, and we implement 'precise' exceptions.

"It's the compiler's job to schedule the code around the available clusters. The architecture supports up to 32 clusters, although there's no real hard limit. But 32 clusters would occupy a large die area today."

Typically, code is intended to run on a primary RISC pipeline, implemented in cluster zero, while sections that the compiler has parallelized run as VLIW routines across multiple clusters. The AGU takes care of loads and stores as well as the simple integer and shift operations used to build addresses. The EXU takes care of the main arithmetic operations.
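The division of labor between the two units can be pictured with a toy scheduler. The Python sketch below pairs at most one AGU operation with one EXU operation per two-issue bundle. The mnemonics are invented for illustration, and the packer ignores data dependencies, which a real compiler could not do.

```python
from enum import Enum


class Unit(Enum):
    AGU = "address generation unit"
    EXU = "execution unit"


# Hypothetical mnemonics; the article names only the classes of operation.
AGU_OPS = {"load", "store", "addr_add", "addr_shift"}
EXU_OPS = {"mul", "mac", "add", "sub"}


def unit_for(op: str) -> Unit:
    """Return the function unit that would execute `op` in this toy model."""
    if op in AGU_OPS:
        return Unit.AGU
    if op in EXU_OPS:
        return Unit.EXU
    raise ValueError(f"unknown operation: {op!r}")


def pack_bundles(ops):
    """Greedily pack operations, in order, into two-issue bundles.

    A bundle holds at most one operation per unit. Data dependencies are
    ignored, so this shows slot usage only, not a legal schedule.
    """
    bundles = []
    current = {}
    for op in ops:
        unit = unit_for(op)
        if unit in current:      # slot already taken: start a new bundle
            bundles.append(current)
            current = {}
        current[unit] = op
    if current:
        bundles.append(current)
    return bundles
```

In this model a multiply-accumulate loop that alternates `load` and `mac` fills both slots of every bundle, while a run of back-to-back loads leaves the EXU slot empty.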

Each cluster has the same basic pipeline, but cluster zero can also support an alternative instruction set: that of the RISC processor that runs the main thread of the program.

Each processor cluster has a set of private registers, split into 32 address and 32 data registers, together with five accumulators. There are also 32 single-bit predicate registers, which control how the cluster executes each instruction.

Predicated execution and register rotation are used to support efficient unrolling of DSP loops to multiple clusters without requiring the addition of epilogue and prologue code. That complicates assembly code writing, but Topham predicts that compiler-driven deployment from high-level languages is likely to become more common.
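The general technique can be sketched in a few lines of Python. The simulation below runs a three-stage loop (load, multiply, store) as a single repeated kernel: one predicate per stage switches that stage on or off, and both predicates and registers rotate by one position on every pass, so the pipeline fills and drains without separate prologue or epilogue code. It illustrates the idea as found in textbook software pipelining and is not a model of Siroyan's actual register file or instruction encoding; all names are hypothetical.

```python
def pipelined_scale(src, k):
    """Scale each element of `src` by `k` using a kernel-only pipelined loop."""
    stages = 3                        # load, multiply, store
    pred = [False] * stages           # one single-bit predicate per stage
    regs = [None] * stages            # small rotating register file
    out = []
    loads_issued = 0

    # The same kernel runs len(src) + stages - 1 times; there is no
    # separate fill (prologue) or drain (epilogue) code.
    for _ in range(len(src) + stages - 1):
        # Rotate: each value and its predicate move one stage onward.
        pred = [loads_issued < len(src)] + pred[:-1]
        regs = [None] + regs[:-1]

        # Every stage is issued on every pass; a false predicate squashes it.
        if pred[0]:                   # stage 0: load
            regs[0] = src[loads_issued]
            loads_issued += 1
        if pred[1]:                   # stage 1: multiply
            regs[1] = regs[1] * k
        if pred[2]:                   # stage 2: store
            out.append(regs[2])
    return out


print(pipelined_scale([1, 2, 3, 4], 3))  # prints [3, 6, 9, 12]
```

On the first two passes only the early stages have true predicates, and on the last two only the late stages do, which is the work a hand-written prologue and epilogue would otherwise perform.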

In the RISC domain, Topham claims the Siroyan architecture has a compact format, better than those of the ARM Thumb and the MIPS-16. There is a single branch delay with a very low overhead for jumping to VLIW code and returning to RISC code. Siroyan also uses a form of code compression for storing its VLIW code.

As a result, users can jump between control code in a single cluster, maybe switching off other clusters to save power, and DSP code that can use multiple clusters through the inherent parallelism of DSP algorithms. Topham said that powering down would be an implementation feature left up to licensees of the SRA-328.

Siroyan executives say their VLIW "clustering" approach provides inherently better scalability than communicating processors. Most of the established DSP vendors are looking to put multiple cores on a die to boost performance. But Topham claimed that approach may or may not scale well, depending on the application, and that it presents a more complex programming model.

Additional reporting by Chris Edwards, editor of Electronics Times, EE Times' sister publication in the United Kingdom.
