Simplifying DSP Hardware Development within a MATLAB-based Design Flow
Historically, exploiting FPGA or ASIC implementation of DSP algorithms has been the domain of companies with highly skilled designers and large budgets. Now, a new generation of tools is bringing hardware-based DSP implementation within the reach of a much wider community through a new paradigm in DSP design: algorithmic synthesis. Eric Cigan, AccelChip, Inc. and Aaik van der Poel, Synopsys, cover the design flow issues in detail in this two-part article.
Part 1: From Algorithm to Architecture Specification
The increasing performance requirements of the latest defense, communications and consumer products demand a shift from software to hardware implementation, especially for compute-intensive Digital Signal Processing (DSP) algorithms. Taking advantage of the improved performance that hardware implementation offers requires more automation and closer collaboration between system-level specification and implementation.
The ability of algorithm developers, system engineers and hardware designers to collaborate effectively on a project depends partly on the availability of a true top-down DSP design methodology. Algorithmic synthesis tools can support this approach, but must provide an effective link with the tool environments that are commonly used to specify and explore at the algorithmic level. A primary requirement is the ability to automatically evaluate potential implementation options during early design stages and to make design trade-offs rapidly. This ability provides the foundation for a low-risk methodology that produces the most cost-effective designs while still meeting design specifications.
DSP Implementation Alternatives
Designers have available to them a variety of implementation options for DSP algorithms, each of which is suited to different classes of problems.
 Application-Specific Integrated Circuit (ASIC) – Customized chips of digital logic designed from standard cells.
 Application-Specific Standard Product (ASSP) – Dedicated chip sets, e.g., MPEG-4 decoders.
 Field Programmable Gate Array (FPGA) – Arrays of gates that can be reprogrammed by downloading a bitstream – primarily from Xilinx and Altera.
 General-Purpose Processor (GPP) – Software-programmable devices, such as those from Intel, Freescale, ARM and MIPS.
 DSP – Software-programmable devices optimized for numerically intensive tasks – primarily from TI, Analog Devices and Freescale.
The implementation technology that is right for a particular project depends on many factors. In general, the hardware implementations (ASIC, ASSP and FPGA) enjoy performance advantages over software implementations (GPP and DSP), while software implementations have provided much easier and less expensive development environments. Performance-critical functions can also be migrated from software to silicon by using a hybrid approach: GPPs and DSPs can be augmented with “hardware accelerators” implemented in FPGAs or ASICs. Figure 1 shows how TI achieved a 16-fold performance boost through use of a Viterbi coprocessor for its C6000 family of DSPs [1].
Figure 1: Example of hardware accelerator: Viterbi coprocessor in TI TMS320C6000 platform (source: TI)
Conventional DSP hardware design flows
Figure 2 shows a general form of a typical DSP hardware design flow. DSP design has traditionally been divided into two types of activities – systems/algorithm development and hardware/software implementation. These tasks have been accomplished by two very disparate groups of engineers that often have little connection or interaction.
The flow originates with algorithm developers and system engineers. Algorithm developers create, analyze and refine the required DSP algorithms using mathematical analysis tools at the behavioral level, often without consideration for the underlying system architecture or hardware/software implementation details. The system designer is concerned with defining the functionality and architecture of the design to adhere to the product specification and interface standards. Systems designers and algorithm developers interact efficiently with each other because they work in a common design environment based on a high-level programming language. According to market research firm Forward Concepts as well as reports in FPGA and Programmable Logic Journal, the majority of DSP system designers and algorithm developers use the MATLAB® language from The MathWorks®. [2]
Figure 2: Conventional DSP hardware design flow
In contrast, hardware designers take the specifications created by the systems engineers and algorithm developers and are tasked with creating a physical implementation of the DSP design. If the target of the DSP algorithm is an FPGA, structured ASIC, ASIC or SOC, the first task is to create a register transfer level (RTL) model in a hardware description language (HDL) such as Verilog or VHDL. The hardware designer must have a sufficient understanding of communications theory and signal processing to be able to interpret the written specification provided by the systems engineer. The process of creating an RTL model and a simulation testbench usually takes many months because of the need to verify that the manually created RTL file exactly matches the MATLAB model.
Once the RTL model and simulation environment are created, the hardware designer interacts with the systems engineers and algorithm developers to analyze the performance, area and functionality of the hardware realization of the DSP system. It is quite common for the original algorithms and system architecture to be modified because the systems engineers had no visibility into the physical design domain during algorithm development. The iteration process continues – refine the algorithms and system architecture, update the written specification, modify the RTL models and testbenches, and resimulate – until the DSP system requirements are met by the hardware realization. The design flow then continues with a standard FPGA and/or ASIC top-down design flow using logic synthesis, and ultimately physical design tools to place and route the netlist in a given FPGA or ASIC device.
Efficient DSP Design Creation with MATLAB
In the DSP domain, MATLAB® is the design language of choice, providing an efficient system-level verification environment, a variety of design tools, and advanced graphical tools for 2D, 3D and animated visualization. Built-in abstractions liberate the designer from the strict modeling style guides that are required by general-purpose languages, allowing large design objects to be represented with a high degree of efficiency.
To demonstrate MATLAB’s efficiency in modeling and analyzing a DSP algorithm, it is useful to consider a detailed example.
MATLAB Model for 3-Dimensional Vector Rotation
Consider an algorithm that implements a rotation of a three-dimensional vector by rotations α, β, and γ around the x, y and z axes, respectively, as shown in Figure 3. [3]
Figure 3: 3-Dimensional Vector Rotation
This vector rotation could be computed as a series of successive coordinate transformations in the form of matrix multiplications. The matrix equation takes the following form:
R_{rot} = T_{γ} · T_{β} · T_{α} · R
where R is the original vector in a three-dimensional coordinate system, T_{α}, T_{β} and T_{γ} are 3-by-3 matrices, and R_{rot} is the position of the original vector after applying the three rotations.
Algorithm developers like to use the MATLAB language to develop this sort of algorithm for several reasons. First, variables in MATLAB can be of many types including scalars, vectors and matrices, so the algorithm developer doesn’t have to declare variables or create iterative loops to implement operations on these types. Second, the MATLAB language has a wealth of built-in mathematical functions that let the designer focus on the top-level algorithm rather than reimplementing – in this case – trigonometric operations such as sine or cosine. Third, MATLAB allows hierarchical design so that complex functions can be structured in a form that is natural to the developer and easy for other members of a design team to understand. Finally, the MATLAB environment has a wide array of visualization tools that span 2-dimensional and 3-dimensional plotting, as well as animated graphics.
Figure 4: MATLAB algorithm implementing 3D vector rotation
For the coordinate rotation example, MATLAB allows this algorithm to be written concisely as shown in Figure 4. In this MATLAB M-file, ang is a 3-vector of the rotation angles {α, β, γ}, R is a 3-vector representing the Cartesian coordinates of the vector R, and R_rot is a 3-vector representing the Cartesian coordinates of the rotated vector R_{rot}. What’s significant is that a single line of MATLAB code, such as line 7, expresses the product of three matrices and a vector. In other high-level languages such as C, this would need to be broken up into the different matrix products with the use of nested loops to index through the matrices and compute the inner products. This compact format makes the design easier for the algorithm developer because it preserves the structure and intent of the algorithm.
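The computation described above can be sketched outside MATLAB as well. The following Python version is only an illustration, not the authors' M-file from Figure 4: the function names (`rot_x`, `rot_y`, `rot_z`, `matvec`, `rotate`) are hypothetical, and the sign convention (a right-handed, counterclockwise rotation of the vector) is an assumption, since the article does not state one. The explicit nested loops in `matvec` are the kind of indexing that a language without built-in matrix types requires.

```python
import math

def rot_x(a):
    """Rotation by angle a (radians) about the x axis (assumed sign convention)."""
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(b):
    """Rotation by angle b (radians) about the y axis."""
    c, s = math.cos(b), math.sin(b)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(g):
    """Rotation by angle g (radians) about the z axis."""
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matvec(m, v):
    """3x3 matrix times 3-vector, written with explicit nested loops."""
    out = [0.0, 0.0, 0.0]
    for i in range(3):
        for j in range(3):
            out[i] += m[i][j] * v[j]
    return out

def rotate(ang, r):
    """Rotate r about x, then y, then z; ang = (alpha, beta, gamma) in radians."""
    alpha, beta, gamma = ang
    return matvec(rot_z(gamma), matvec(rot_y(beta), matvec(rot_x(alpha), r)))
```

Under the assumed convention, `rotate((0.0, 0.0, math.pi / 2), [1.0, 0.0, 0.0])` turns the x unit vector into the y unit vector, up to floating-point rounding.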
This algorithm can be readily embedded within a script file that creates stimulus and displays results; Figure 5(a) shows an example of such a script file, with the corresponding results shown in Figure 5(b).
Figure 5: 3D Vector Rotation design. (a) Script M-file for design in MATLAB. (b) Simulation results using this script file.
Converting Floating-Point to Fixed-Point
The mathematically intensive nature of DSP algorithms raises issues of precision and computational accuracy. DSP algorithm developers often begin work in floating-point arithmetic because it provides the most generality in evaluating the benefits of candidate algorithms. However, when it comes to implementing an algorithm in hardware, there are trade-offs between fixed-point and floating-point hardware. DSP applications such as handheld devices or aerospace systems require low-power and cost-effective circuitry; because fixed-point hardware tends to be simpler and smaller, it requires less power and costs less to produce than floating-point circuitry.
The difficulty with fixed-point implementations is that they are more complex to design. Floating-point numbers pair a significand with an exponent that scales it, which lets them represent a broad dynamic range while maintaining significant precision. Fixed-point numbers lack the exponent, so the number of bits must be chosen carefully to limit the impact of overflows and underflows while minimizing the bit width of numbers.
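The bit-width trade-off can be made concrete with a small sketch. The Python function below is a hypothetical helper, not part of any tool named in this article. It assumes a signed two's-complement format with `word_bits` total bits and `frac_bits` fractional bits, round-to-nearest quantization, and saturation on overflow; real designs may choose truncation or wrap-around instead.

```python
def quantize(x, word_bits, frac_bits):
    """Quantize x to signed fixed point, saturating on overflow.

    Returns (value, overflowed), where value is the real number that the
    fixed-point code actually represents.
    """
    scale = 1 << frac_bits                # weight of one integer step is 1/scale
    lo = -(1 << (word_bits - 1))          # most negative integer code
    hi = (1 << (word_bits - 1)) - 1       # most positive integer code
    code = round(x * scale)               # round to nearest (Python resolves ties to even)
    overflowed = code < lo or code > hi
    code = max(lo, min(hi, code))         # saturate instead of wrapping around
    return code / scale, overflowed
```

With 8 bits, 6 of them fractional, `quantize(0.7071, 8, 6)` returns `(0.703125, False)`, `quantize(2.5, 8, 6)` saturates to `(1.984375, True)`, and `quantize(0.001, 8, 6)` underflows to `(0.0, False)`. Widening the word to 16 bits with 14 fractional bits reduces the error on 0.7071 to less than half of the smaller step size, at the cost of wider arithmetic units.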
Algorithmic Synthesis – A Brief Review
Algorithms start life as descriptions that are very functional in nature. These functional descriptions often lack specific digital implementation details like bit precision, timing and microarchitecture. In fact, hardware designers refer to them as descriptions at the unclocked, or algorithmic, level. The predominant hardware implementation technique, RTL design, requires the introduction of timing and forces a choice of microarchitecture. This choice is often based on design parts that are already available and completed (a strategy of risk avoidance) or on purely empirical knowledge (a strategy of experience), and might not be optimal.
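The difference between the two levels can be illustrated with a dot product. The sketch below is a Python model rather than HDL, and both function names are hypothetical. The first function says only what is computed; the second commits to one possible microarchitecture, assumed here to be a single multiplier feeding an accumulator register that is updated once per clock cycle.

```python
def dot_unclocked(a, b):
    """Algorithmic-level description: states what is computed, not when."""
    return sum(x * y for x, y in zip(a, b))

def dot_clocked(a, b):
    """Cycle-by-cycle model of one microarchitecture choice:
    one multiplier and one accumulator register, one product per clock."""
    acc = 0        # accumulator register, cleared at reset
    cycles = 0     # number of clock cycles consumed
    for x, y in zip(a, b):
        acc = acc + x * y   # register update on each clock edge
        cycles += 1
    return acc, cycles
```

Both functions return the same result for the same inputs, but only the second exposes a latency: `dot_clocked([1, 2, 3], [4, 5, 6])` returns `(32, 3)`, that is, three cycles for three products. A design with three multipliers would trade more area for a shorter latency.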
As illustrated in Figure 6, algorithmic synthesis bridges this void by automating the process of creating the microarchitecture based on constraints like overall latency, throughput, cost (silicon area) and performance (clock speed). It also opens the door to exploring multiple architectures quickly before committing to a particular implementation.
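A toy model gives a feel for this kind of exploration. The Python sketch below is not how any synthesis tool works internally. It assumes a deliberately simplified cost model in which every multiplier completes one product per clock, latency is the only constraint, and area is counted in multipliers alone. The function names `explore` and `cheapest` are hypothetical.

```python
import math

def explore(num_products, multiplier_counts, max_latency_cycles):
    """List candidate architectures for a computation needing num_products multiplies.

    Simplified model (an assumption): latency is ceil(num_products / multipliers)
    clock cycles, and area is measured in multipliers only.
    """
    options = []
    for m in multiplier_counts:
        latency = math.ceil(num_products / m)
        options.append({
            "multipliers": m,
            "latency": latency,
            "meets_constraint": latency <= max_latency_cycles,
        })
    return options

def cheapest(options):
    """Return the option with the fewest multipliers that meets the constraint, or None."""
    ok = [o for o in options if o["meets_constraint"]]
    return min(ok, key=lambda o: o["multipliers"]) if ok else None
```

A 3-by-3 matrix times a 3-vector needs nine multiplications. Under this model, `explore(9, [1, 3, 9], 4)` yields latencies of 9, 3 and 1 cycles, and `cheapest` selects the three-multiplier option as the smallest design that finishes within four cycles.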
Figure 6: Algorithmic synthesis allows for quick exploration of multiple implementation options from a single source
Summary and Next Steps
In the first part of this article, we have seen that what used to be a field of expertise founded on "exclusive knowledge" is slowly but surely being drawn into mainstream design because of the demanding signal processing functions in today's imaging, consumer and mil/aero applications.
The time-consuming nature of the traditional design flow, with a manual gap between algorithm development and implementation, is no longer acceptable. In this first part we have touched on why algorithm developers prefer to use the MATLAB language and analysis environment, and why it is worlds apart from what hardware designers like to see as a starting point.
In the second part we will take a peek under the hood of what algorithmic synthesis actually does when it automatically converts fixed-point MATLAB functional descriptions to register transfer level hardware descriptions, thus closing the "exclusive knowledge" gap with an automated, repeatable and risk-reducing flow step. We will also highlight how the risk of misinterpretation can be reduced further when moving from prototyping in FPGAs to production in ASICs.
References
[1] Leon Adams, Texas Instruments, “Semiconductor options for real-time signal processing,” EDN, November 25, 2004, pp. 87-94.
[2] Kevin Morris, “Destination DSP: Methodologies for Signal Processing Success,” FPGA and Programmable Logic Journal, Volume 5, Number 9, November 30, 2004.
[3] Eric W. Weisstein. “Rotation Matrix.” From MathWorld – A Wolfram Web Resource. http://mathworld.wolfram.com/RotationMatrix.html
Eric Cigan
Product Marketing Manager, AccelChip Inc.
As product marketing manager for AccelChip Inc., Eric Cigan is responsible for product planning and promotion for the AccelChip product family. He has more than fifteen years' experience in the EDA industry.
He was most recently at Mentor Graphics, Inc., where he managed product marketing and business development in Mentor's hardware/software co-verification business. Prior to this, Cigan held positions as aerospace/automotive segment manager for Analogy (now part of Synopsys) and product marketing manager, account manager and research engineer at Integrated Systems, Inc. (now part of Wind River).
Cigan began his career in control system design at the Lockheed Missiles & Space Company and the Charles Stark Draper Laboratory. Cigan holds S.B. and S.M. degrees in Mechanical Engineering from the Massachusetts Institute of Technology.
Ir. Aaik van der Poel
Group Marketing Manager, Synthesis, Synopsys
Aaik van der Poel currently oversees various aspects of the Synopsys (structured) ASIC, Behavioral, C-based, and FPGA synthesis product lines, and has over 20 years of EDA marketing, sales and support experience.
He joined Synopsys in 2000 and was responsible for the CoCentric SystemC synthesis product line introduction. Prior to joining Synopsys he held senior marketing and application positions at Mentor Graphics and Tektronix Europe and was a chip designer at ICN design house in the Netherlands. Van der Poel holds an M.S. in Electrical Engineering from the University of Twente in the Netherlands and a patent on isochronous (large) system design.
TRADEMARK NOTICE
AccelChip and AccelWare are registered trademarks of AccelChip Inc. Design Compiler, DesignWare, Formality, Synopsys, VCS and the Synopsys logo are registered trademarks of Synopsys, Inc., and Galaxy is a trademark of Synopsys, Inc. MATLAB is a trademark of The MathWorks, Inc. All other trademarks or registered trademarks mentioned in this document are the intellectual property of their respective owners.
©2005 Synopsys, Inc. Synopsys and the Synopsys logo are registered trademarks of Synopsys, Inc. All other company and product names mentioned herein may be trademarks or registered trademarks of their respective owners and should be treated as such.
