A practical approach to system implementation using MATLAB and Virtex-4 FPGAs.
by Tom Feist, Director, DSP Tools Marketing, Xilinx, Inc.
Spatially multiplexed multiple-input multiple-output (MIMO) transmitters and receivers promise significant performance gains for wireless communications systems over their existing single-input single-output (SISO) counterparts. Next-generation wireless standards, such as 802.11n, will support data transmission rates as high as 600 Mbps and wireless local area network transmission rates in excess of 1 Gbps.
The design of these systems, however, forces a compromise in cost and power that can have significant consequences for handheld devices running on batteries. The challenge facing design teams is to determine the optimal balance between these design requirements for their particular application.
At the heart of this technology is the concept of multipath, which refers to the reflection of radio frequency (RF) signals in a physical environment. Whereas multipath degrades the performance of existing 802.11 devices, spatially multiplexed orthogonal frequency division multiplexing (OFDM) MIMO – a key element of the 802.11n standard – takes advantage of these reflections to “tune” transmissions, minimize errors, and improve overall performance. But at these bandwidths, scattering, diffraction, and absorption by objects in the transmission path are an important consideration. Designing a MIMO system requires that these effects are profiled as accurately as possible in the form of a channel model.
There are three primary sources of channel models: software-based mathematical models, often available from the standards committees; hardware-based MIMO channel emulators, either designed in-house or provided by companies such as Azimuth; and, best of all, the real-world environment in which the MIMO system is intended to operate. Verifying a MIMO system in the real world requires the ability to rapidly prototype the transmitter and receiver on a MIMO-oriented FPGA hardware platform, such as the VHS-ADC-V4 card from Lyrtech.
The MIMO Performance Advantage
The benefit of spatially multiplexed MIMO technology is the ability to increase transmission speed with the number of antennas. The data rate of today’s existing SISO systems is determined by the formula:
R = Es * Bw
where R is the data rate (bits/second), Es is the spectral efficiency (bits/second/Hertz), and Bw is the communications bandwidth (Hz). For instance, for the 802.11a standard the peak data rate works out as follows:
Bw = 20 MHz
Es = 2.7 bps/Hz
R = 54 Mbps
MIMO introduces an additional variable into this equation, “Ns”: the number of independent data streams that are transmitted simultaneously in the same bandwidth but over different spatial paths. The spectral efficiency is now measured per stream, Ess, and the data rate of the MIMO system becomes:
R = Ess * Bw * Ns
Let’s compare the previous 802.11a example with what is obtainable with the current 802.11n proposal, operating at a 20 MHz bandwidth and using four antennas:
Bw = 20 MHz
Ess = 3.6 bps/Hz
Ns = 4
R = 288 Mbps
The use of MIMO technology has delivered a 5.3x data rate improvement for the 802.11n proposed standard.
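The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name data_rate_bps and its parameter names are illustrative choices, not part of any standard or tool.

```python
def data_rate_bps(spectral_efficiency, bandwidth_hz, streams=1):
    """R = Ess * Bw * Ns. With streams=1 this reduces to the SISO form R = Es * Bw."""
    return spectral_efficiency * bandwidth_hz * streams


# Figures quoted in the text: 802.11a (SISO) and the 802.11n proposal (four streams).
siso = data_rate_bps(2.7, 20e6)
mimo = data_rate_bps(3.6, 20e6, streams=4)

print(f"802.11a SISO:        {siso / 1e6:.0f} Mbps")
print(f"802.11n, 4 streams:  {mimo / 1e6:.0f} Mbps")
print(f"Improvement:         {mimo / siso:.1f}x")
```

Note that the improvement comes from two factors: the four spatial streams and the higher per-stream spectral efficiency (3.6 versus 2.7 bps/Hz).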
MIMO System Hardware Complexity
The performance gains of a spatially multiplexed MIMO system come at the expense of hardware complexity. A transmit/receive system that uses multiple antennas not only transmits data between the corresponding antennas but also between adjacent antennas. As you can see in Figure 1, data is received in the form of a “MIMO channel matrix.”
Figure 1 – MIMO channel
Linear algebra techniques such as singular value decomposition (SVD) or matrix inversion are required to decouple the channel matrix in the spatial domain and recover the transmitted data. Backwards compatibility requirements with the 802.11g standard limit the number of antennas for the 802.11n standard to either two or four, which in turn limits the channel matrix size to either 2 x 2 or 4 x 4.
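To make the decoupling step concrete, here is a minimal sketch of the matrix-inversion approach for the 2 x 2 case. It assumes a noise-free channel whose matrix is already known at the receiver. The channel coefficients and transmitted symbols are made-up illustrative values, and the helper names invert_2x2 and mat_vec are hypothetical.

```python
def invert_2x2(h):
    """Invert a 2 x 2 complex matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("channel matrix is singular or nearly singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


# Illustrative channel: diagonal terms are the direct antenna-to-antenna
# paths, off-diagonal terms are the cross-coupling between antennas.
h = [[0.9 + 0.2j, 0.3 - 0.4j],
     [-0.2 + 0.5j, 1.1 - 0.1j]]

sent = [1 + 1j, -1 + 1j]                      # one symbol per transmit antenna
received = mat_vec(h, sent)                   # each receive antenna sees a mix of both
recovered = mat_vec(invert_2x2(h), received)  # inversion separates the streams again

print("received: ", received)
print("recovered:", recovered)
```

With noise present, a plain inverse can amplify errors when the channel matrix is poorly conditioned, which is one reason the choice of algorithm involves the numerical-stability tradeoff discussed in the text.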
Developing a MIMO system prototype in hardware that performs at the actual system data rates requires the use of an FPGA-based hardware platform. The Xilinx® Virtex™-4 family of FPGAs provides far greater performance than a DSP processor for this class of applications by providing as many as 512 hardware multipliers capable of parallel operation. In designing this prototyping system, however, you are faced with two considerable challenges: the first is to design something as complex as an SVD or matrix inverse in hardware, and the second is to tune the implementation for optimal performance.
Implementing Matrix Operations on FPGAs
The specific SVD or matrix inversion algorithm selected for implementation will be a tradeoff between numerical stability and hardware efficiency. You will need to develop a high-level MATLAB model to determine the most efficient algorithm for a particular application. In the case of the SVD, this may involve choices between adaptive estimation techniques, vector rotations, or other simplifications that result from channel matrices with special properties such as symmetry.
Once an algorithm has been finalized, you will need to tune the hardware performance to overall system requirements. Maximizing the performance of a MIMO system in hardware requires partially parallelizing the multiplication operations in the areas of the design that have the greatest impact on overall performance. The Givens rotation algorithm shown in Figure 2 provides a nice example of the performance gains possible through parallel multiplication operations. Givens rotations are commonly used to solve the symmetric eigenvalue problem and are a key building block of the QRD matrix inverse.
Figure 2 – Givens rotation algorithm
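As a rough illustration of a multiplier-based Givens rotation, the sketch below reduces a real-valued 4 x 4 matrix to upper-triangular form, which is the triangularization step of a QR decomposition. It is a textbook formulation written for this article, not the code behind Figure 2; the function names and the example matrix are assumptions, and a real channel matrix would be complex-valued.

```python
import math


def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r


def qr_upper_triangular(matrix):
    """Zero the entries below the diagonal, one Givens rotation at a time."""
    r = [row[:] for row in matrix]
    n = len(r)
    for col in range(n):
        for row in range(n - 1, col, -1):
            c, s = givens(r[row - 1][col], r[row][col])
            # The four multiplications for each k are independent of every
            # other k. This is the loop a parallel hardware implementation
            # would spread across several multipliers.
            for k in range(n):
                upper, lower = r[row - 1][k], r[row][k]
                r[row - 1][k] = c * upper + s * lower
                r[row][k] = -s * upper + c * lower
    return r


example = [[4.0, 1.0, -2.0, 2.0],
           [1.0, 2.0, 0.0, 1.0],
           [-2.0, 0.0, 3.0, -2.0],
           [2.0, 1.0, -2.0, -1.0]]

for row in qr_upper_triangular(example):
    print(["%8.4f" % value for value in row])
```

Each rotation touches only two rows, so rotations on disjoint row pairs can also run concurrently, which is part of why these algorithms map well to parallel hardware.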
You can implement this algorithm using either multipliers or a CORDIC approximation method. The Xilinx AccelDSP™ Synthesis tool’s design exploration features were used to increase performance by inserting parallelism into the architecture without code rewrites. As shown in Table 1, this allowed performance gains of as much as 10x over the parallel CORDIC implementation. Algorithms based on Givens rotations have received greater attention recently because they lend themselves nicely to a parallel implementation.
Table 1 – The range of results obtained by synthesizing a 4 x 4 matrix using the AccelDSP Synthesis tool and targeting a Virtex-4 device.
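For comparison with the multiplier-based approach, the following sketch shows the idea behind a CORDIC approximation in vectoring mode: the vector is driven onto the x-axis through a fixed sequence of micro-rotations by atan(2^-i), yielding its magnitude and angle. It uses floating-point arithmetic for readability and assumes x > 0. In hardware the scaling by 2^-i would be a bit shift and the arctangent values would come from a small lookup table. The function name and the iteration count are illustrative assumptions.

```python
import math


def cordic_vectoring(x, y, iterations=16):
    """Approximate (magnitude, angle) of the vector (x, y), assuming x > 0."""
    angle = 0.0
    gain = 1.0
    for i in range(iterations):
        step = 2.0 ** -i              # a right shift by i bits in hardware
        if y >= 0:
            # Rotate clockwise by atan(step) to push y toward zero.
            x, y = x + y * step, y - x * step
            angle += math.atan(step)
        else:
            # Rotate counterclockwise by atan(step).
            x, y = x - y * step, y + x * step
            angle -= math.atan(step)
        # Every micro-rotation stretches the vector by sqrt(1 + step^2).
        gain *= math.sqrt(1.0 + step * step)
    return x / gain, angle


magnitude, angle = cordic_vectoring(3.0, 4.0)
print("magnitude:", magnitude, "exact:", math.hypot(3.0, 4.0))
print("angle:    ", angle, "exact:", math.atan2(4.0, 3.0))
```

The tradeoff is visible in the loop: CORDIC avoids multipliers but needs one iteration per bit of precision, whereas the multiplier-based rotation finishes in a handful of operations that can run in parallel.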
For large systems, the added hardware that results from increased parallelism must not exceed the resources of the target FPGA. The number of architectural possibilities you must evaluate can be considerable. The process of determining optimal hardware architecture is well suited for a high-level algorithmic synthesis tool such as AccelDSP.
A MATLAB-Based FPGA Design Flow
MATLAB from The MathWorks provides a truly unique environment for the design and implementation of spatially multiplexed MIMO systems. The inherent language support for loops, complex numbers, vector and matrix operations, and mathematical functions provides a highly efficient modeling environment for the linear algebra algorithms required for MIMO.
Figure 3 illustrates the benefits of the AccelDSP Synthesis tool, including the flexibility to define and implement custom architectures for spatially multiplexed MIMO systems on FPGAs using floating-point MATLAB.
Figure 3 – AccelDSP Synthesis design flow
Automated floating- to fixed-point conversion is provided to assist in solving the complex quantization issues resulting from the iterative nature of linear algebra functions such as an SVD. Once you have determined an acceptable fixed-point model, you can rapidly explore performance-versus-hardware tradeoffs using algorithmic synthesis, quickly increasing the number of dedicated hardware multipliers to improve performance and take full advantage of the flexibility of the Virtex-4 architecture. The generated RTL from AccelDSP Synthesis is automatically verified against the golden-source MATLAB to ensure bit-true functional correctness.
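The quantization issue can be illustrated with a hand-rolled sketch that compares a floating-point dot product against fixed-point versions with different fraction lengths. This is not how AccelDSP performs its conversion. The word lengths, the rounding and saturation behavior, the test vectors, and the names quantize and fixed_dot are all assumptions chosen for illustration.

```python
def quantize(value, word_bits=16, frac_bits=12):
    """Round to a signed fixed-point grid and saturate at the word limits."""
    scale = 1 << frac_bits
    lowest = -(1 << (word_bits - 1))
    highest = (1 << (word_bits - 1)) - 1
    code = max(lowest, min(highest, round(value * scale)))
    return code / scale


def fixed_dot(a, b, word_bits, frac_bits):
    """Dot product with inputs, products, and the accumulator all quantized."""
    acc = 0.0
    for x, y in zip(a, b):
        xq = quantize(x, word_bits, frac_bits)
        yq = quantize(y, word_bits, frac_bits)
        product = quantize(xq * yq, word_bits, frac_bits)
        acc = quantize(acc + product, word_bits, frac_bits)
    return acc


a = [0.25, -0.5, 0.75, 0.1]
b = [0.3, 0.6, -0.2, 0.9]
reference = sum(x * y for x, y in zip(a, b))  # floating-point result

for frac_bits in (6, 12):
    result = fixed_dot(a, b, 16, frac_bits)
    print(f"{frac_bits:2d} fraction bits: {result:.6f} "
          f"(error {abs(result - reference):.6f})")
```

In an iterative algorithm such as an SVD, errors of this kind feed back into later iterations, so the acceptable fraction length has to be judged against the floating-point model rather than chosen in isolation.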
Conclusion
Prototyping a spatially multiplexed MIMO system for use in real-world verification is dramatically simplified through the adoption of a MATLAB-based design flow for the channel-matrix DSP hardware development. Development and verification cycle times are reduced by using the MATLAB algorithm as the golden source for FPGA development and eliminating re-writes into other languages or design environments. Additionally, the high-level nature of MATLAB allows the AccelDSP Synthesis tool to quickly explore hardware alternatives for an algorithm, including the use of DSP blocks, RAMs, and pipelining.
The AccelDSP Synthesis tool and Lyrtech prototyping environment both have interfaces to the Xilinx System Generator for DSP design environment to provide an automated MATLAB-to-prototyping design flow. For more information about the AccelChip solution, visit www.accelchip.com.