If you could squeeze two or three times as many cellular telephone conversations into the same amount of bandwidth, how much would that be worth? To most wireless companies, the answer is millions of dollars. The art and science of "beam forming" allows normal cellular towers to aim their radio waves at the right user instead of radiating in all directions. The result is more efficient use of bandwidth and more happy customers. In this article, FPGA design experts explain how beam forming works and how to implement it in standard FPGA chips.

There are two constants in the cell-phone business: demand for higher data rates and demand for greater user capacity. Both depend on a single factor known as spectrum efficiency, the information rate carried per unit of spectrum used (usually expressed in bits per second per hertz). Improving that efficiency generally involves tradeoffs between quality of service, power, and coverage.   
  
Figure 1:  Nonsmart-antennas system   
  Traditional omni-directional antennas, as shown in Figure 1A, act as transducers (that is, they convert electromagnetic energy into electrical energy) and are not an effective way to combat inter-cell and intra-cell interference. One cost-effective solution to this interference challenge is to split the wireless cell into multiple sectors using sectorized antennas. As Figure 1B illustrates, sectorized antennas transmit and receive in a limited portion of the cell, typically one-third of the circular area, thereby reducing the overall interference in the system.  
  Efficiency can increase still further either by using spatial diversity or by focusing a narrow beam on a single user. The second approach is known as beam forming, and it requires an array of antennas that together perform "smart" transmission and reception of signals by means of advanced signal-processing algorithms. In this article we'll describe a combination of FPGAs, digital signal processing IP, and embedded processors that implements beam-forming applications. We'll also detail the methods used to implement such applications and the benefits of improved processing speed, system flexibility, and reduced risk that this approach can deliver.   
  Table 1:  Evolution of cellular systems     
  	  		| Cellular system | Featured technology | 
  	  		| 1G | Analog modulation | 
  	  		|  | Omni-directional antennas at base station | 
  	  		|  | Cell clustering (frequency reuse) | 
  	  		| 2G | Reduced frequency reuse distance | 
  	  		|  | Cell splitting and sectorization | 
  	  		|  | Digital modulation, error correction coding | 
  	  		| 3G | Space division multiple access | 
  	  		|  | Switched beam forming | 
  	  		|  | Adaptive nulling and directional beams | 
  
    Smart antennas
  Table 1 provides an overview of the evolution of technology used in different generations of cellular systems. Although beam forming has only recently received serious consideration for commercial cellular systems, the concept of using multiple antennas and innovative signal processing to serve cells more intelligently has existed for many years. In fact, smart antennas date back to the 1930s, although the most significant developments occurred during World War II.1 Relatively costly smart-antenna systems of varying sophistication have been used in defense systems for years, but cost prevented their use in commercial systems until fairly recently. The advent of powerful, low-cost field programmable gate arrays (FPGAs), digital signal processors (DSPs), and innovative signal-processing algorithms has made intelligent antennas practical for cellular systems. Smart antennas can enhance the efficiency of existing systems or be made an integral part of more advanced 3G and 4G mobile systems.   
  Compared with traditional omni-directional and sectorized antennas, smart-antenna systems can provide:  
  - Greater coverage area for each cell site  
- Better rejection of co-channel interference  
- Reduced multipath interference via increased directionality  
- Reduced delay spread as fewer scatterers are allowed into the beam  
- Increased frequency reuse with fewer base stations  
- Higher range in rural areas  
- Improved building penetration  
- Location information for emergency situations  
- Increased data rates and overall system capacity  
- Reduction in dropped calls  
Table 2 summarizes the pros and cons of different smart-antenna technologies including beam-forming and multiple-input and multiple-output (MIMO) technologies. This article focuses on beam-forming smart-antenna technologies as they provide significant benefits at medium implementation cost and complexity. Description of MIMO technology is beyond the scope of this article.

  Table 2:  Comparison of smart-antenna technologies  
  
  	  		| Multiple antenna scheme | Diversity | Switched beam forming | Adaptive beam forming | MIMO | 
  	  		| Pros | Simple to implement; low cost | Simple to implement; low cost | High capacity with reduced interference; best suited for line-of-sight environments | High data rates; best suited for rich scattering environments | 
  	  		| Cons | Limited benefits | Limited configuration flexibility; there must be at least one strong direct component for DOA estimation | Medium complexity; high cost; there must be at least one strong direct component for DOA estimation | High complexity; high cost; evolving technology | 
  
    How is it done?
  A linearly arranged and equally spaced array of antennas forms the basic structure of a beam former. In order to form a beam, each user's information signal is multiplied by a set of complex weights (where the number of weights equals the number of antennas) and then transmitted from the array. The important point in this transmission is that the signals emitted from different antennas in the array differ in phase and amplitude. Each complex weight sets the amplitude and phase applied at its antenna, while the distance between antenna elements determines the additional phase difference with which the signals combine in any given direction.    
  Because the spacing between the antenna elements is fixed, changing the direction of the beam involves changing the weight set. The rest of this article describes two such schemes, known as switched and adaptive beam forming. Direction of arrival (DOA) estimation with algorithms such as MUSIC, ESPRIT, and Capon is beyond the scope of this article.  
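  The effect of a weight set can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes an eight-element array with half-wavelength spacing, angles measured from broadside, and hypothetical function names (`steering_weights`, `array_gain`). The weights cancel the element-to-element phase difference in the chosen direction, so the contributions add coherently there and partially cancel elsewhere.

```python
import cmath
import math


def steering_weights(num_antennas, spacing_wavelengths, steer_deg):
    """Complex weights that point a uniform linear array toward steer_deg."""
    # Phase difference between adjacent elements for a wave in direction steer_deg.
    step = 2.0 * math.pi * spacing_wavelengths * math.sin(math.radians(steer_deg))
    # Apply the opposite phase at each element and normalize the peak gain to 1.
    return [cmath.exp(-1j * n * step) / num_antennas for n in range(num_antennas)]


def array_gain(weights, spacing_wavelengths, look_deg):
    """Magnitude of the weighted array response in direction look_deg."""
    step = 2.0 * math.pi * spacing_wavelengths * math.sin(math.radians(look_deg))
    response = sum(w * cmath.exp(1j * n * step) for n, w in enumerate(weights))
    return abs(response)


# Assumed example values: 8 antennas, half-wavelength spacing, beam at 20 degrees.
weights = steering_weights(num_antennas=8, spacing_wavelengths=0.5, steer_deg=20.0)
for angle in (-40.0, 0.0, 20.0, 60.0):
    print(f"{angle:6.1f} deg -> gain {array_gain(weights, 0.5, angle):.3f}")
```

  Pointing the beam somewhere else only requires calling `steering_weights` with a different angle; the array geometry itself never changes.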
  Switched and adaptive beam forming
  If the complex weights used are selected from a library of weights that form beams in specific, predetermined directions, the process is called switched beam forming. In this process, a hand-off between beams is required as users move tangentially to the antenna array. If the weights are computed and adaptively updated in real time, the process is known as adaptive beam forming. The adaptive process permits narrower beams and reduced output in other directions, significantly improving the signal-to-interference-plus-noise ratio (SINR). With this technology, each user's signal is transmitted and received by the base station only in the direction of that particular user. This drastically reduces the overall interference in the system. A smart-antenna system, as shown in Figure 2, includes an array of antennas that together direct different transmission/reception beams toward each cellular user in the system.    
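  The switched scheme and its hand-off behavior can be sketched as follows. This is an illustration under assumed values, not a description of any deployed system: the library holds four fixed beams at -45, -15, 15, and 45 degrees for an eight-element, half-wavelength-spaced array, the received snapshot is an idealized noise-free plane wave, and all names are hypothetical.

```python
import cmath
import math

NUM_ANTENNAS = 8              # assumed array size
SPACING_WAVELENGTHS = 0.5     # assumed element spacing
BEAM_ANGLES_DEG = [-45.0, -15.0, 15.0, 45.0]  # assumed fixed beam directions


def phase_step(angle_deg):
    """Phase difference between adjacent elements for a given direction."""
    return 2.0 * math.pi * SPACING_WAVELENGTHS * math.sin(math.radians(angle_deg))


def build_library():
    """Precompute one weight set per fixed beam direction."""
    library = []
    for beam_deg in BEAM_ANGLES_DEG:
        step = phase_step(beam_deg)
        library.append([cmath.exp(-1j * n * step) / NUM_ANTENNAS
                        for n in range(NUM_ANTENNAS)])
    return library


def received_snapshot(user_deg):
    """Idealized, noise-free signal across the array from a user at user_deg."""
    step = phase_step(user_deg)
    return [cmath.exp(1j * n * step) for n in range(NUM_ANTENNAS)]


def select_beam(library, snapshot):
    """Return the index of the library beam with the highest output power."""
    powers = [abs(sum(w * x for w, x in zip(weights, snapshot))) ** 2
              for weights in library]
    return max(range(len(library)), key=powers.__getitem__)


library = build_library()
for user_deg in (25.0, 35.0):
    index = select_beam(library, received_snapshot(user_deg))
    print(f"user at {user_deg} deg -> beam at {BEAM_ANGLES_DEG[index]} deg")
```

  As the user moves from 25 to 35 degrees, the strongest beam changes from the one at 15 degrees to the one at 45 degrees, which is the hand-off between beams mentioned above. An adaptive beam former would instead recompute the weights so that the beam follows the user.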
  
Figure 2:  A beam-forming smart-antennas system  
  The rest of the article describes the real-time implementation of adaptive digital beam forming with FPGAs. We won't discuss switched beam forming further as it's relatively easy to implement and has only limited benefits when compared with the adaptive version.  
  Implementing adaptive beam forming
  Adaptive beam forming can be combined with the well-known Rake receiver architectures that are widely used in CDMA-based 3G systems to provide processing gains in both the temporal and spatial domains. This section describes the implementation of a Rake beam-former structure, also known as a two-dimensional Rake, which performs joint space-time processing. As illustrated in Figure 3, the signal from each receiving antenna is first down-converted to baseband, processed by the matched filter-multipath estimator, and accordingly assigned to different Rake fingers.    
  
Figure 3:  Basic block diagram of adaptive beam forming with FPGA  
  The beam-forming unit on each Rake finger then calculates the corresponding beam-former weights and channel estimate using the pilot symbols that have been transmitted through the dedicated physical control channel (DPCCH). The QR-decomposition-(QRD)-based recursive least squares (RLS) algorithm is usually used as the weight-update algorithm for its fast convergence and good numerical properties. The updated beam-former weights are then used for multiplication with the data that has been transmitted through the dedicated physical data channel (DPDCH). Maximal ratio combining (MRC) of the signals from all fingers is then performed to yield the final soft estimate of the DPDCH data.   
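  The final combining step can be sketched briefly. Maximal ratio combining weights each finger's output by the complex conjugate of that finger's channel estimate before summing. The sketch below assumes noise-free finger outputs and made-up channel values, and the normalization by total channel power is an illustrative choice that returns the estimate to the scale of the transmitted symbol.

```python
def mrc_combine(finger_outputs, channel_estimates):
    """Maximal ratio combining of per-finger outputs into one soft estimate."""
    # Weight each finger by the conjugate of its channel estimate, then sum.
    weighted_sum = sum(h.conjugate() * r
                       for r, h in zip(finger_outputs, channel_estimates))
    # Normalize by the total channel power (illustrative scaling choice).
    total_power = sum(abs(h) ** 2 for h in channel_estimates)
    return weighted_sum / total_power


# Hypothetical example: one symbol seen through three multipath fingers.
symbol = 1 - 1j
channels = [0.9 + 0.2j, 0.3 - 0.4j, -0.1 + 0.5j]
fingers = [h * symbol for h in channels]  # noise-free finger outputs

estimate = mrc_combine(fingers, channels)
print(estimate)
```

  Strong fingers contribute more to the estimate than weak ones, which is what makes the combining "maximal ratio" when noise is present.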
  Applying complex weights to the signals from different antennas involves complex multiplications that map well onto the embedded DSP blocks available in many FPGAs. The example in Figure 4 shows DSP blocks with a number of multipliers, followed by adder/subtractor/accumulators, with registers for pipelining. Such a structure lends itself to the complex multiplication and routing required in beam-forming designs.   
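  To make the mapping concrete, the sketch below breaks one complex multiply-accumulate into the real operations a DSP block would perform: four multiplies, one subtract, one add, and an accumulation. It is a software model with hypothetical names and integer sample values, not a description of any particular vendor's DSP block.

```python
def complex_mac(acc, weight, sample):
    """One complex multiply-accumulate built from real operations.

    acc, weight, and sample are (real, imag) integer pairs.
    """
    w_re, w_im = weight
    x_re, x_im = sample
    # Four real multiplies feed one subtractor and one adder.
    prod_re = w_re * x_re - w_im * x_im
    prod_im = w_re * x_im + w_im * x_re
    # The accumulator stage sums the products across antennas.
    return acc[0] + prod_re, acc[1] + prod_im


def apply_weights(weights, samples):
    """Weighted sum of the antenna samples, one complex MAC per antenna."""
    acc = (0, 0)
    for weight, sample in zip(weights, samples):
        acc = complex_mac(acc, weight, sample)
    return acc


# Hypothetical fixed-point weights and samples for a four-antenna array.
weights = [(3, -2), (1, 4), (-5, 0), (2, 2)]
samples = [(1, 1), (0, -3), (2, 5), (-4, 1)]
print(apply_weights(weights, samples))
```

  In hardware the four multiplies can run in parallel and the stages can be pipelined, whereas this model simply executes them in sequence.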
  
Figure 4:  Example DSP block architecture  
  Adaptive algorithms
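  Adaptive signal processing algorithms such as least mean squares (LMS), normalized LMS (NLMS), and recursive least squares (RLS) have historically been used in a number of wireless applications such as equalization, beam forming, and adaptive filtering. These all involve solving an overdetermined set of linear equations, as shown below, where m > N:   

  $$\begin{aligned}
  x_{11}c_1 + x_{12}c_2 + \cdots + x_{1N}c_N &= y_1 + e_1 \\
  x_{21}c_1 + x_{22}c_2 + \cdots + x_{2N}c_N &= y_2 + e_2 \\
  &\;\;\vdots \\
  x_{m1}c_1 + x_{m2}c_2 + \cdots + x_{mN}c_N &= y_m + e_m
  \end{aligned}$$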
  
  Among the different algorithms, the recursive least squares algorithm is generally preferred for its fast convergence. The least squares approach attempts to find the set of coefficients that minimizes the sum of squares of the errors, in other words:  

  $$\min_{c_1, \ldots, c_N} \; \sum_{i=1}^{m} |e_i|^2$$
  
  Representing the above set of equations in the matrix form, we have:  
  Xc = y + e (1)  
  where X is an m × N matrix (with m > N) of noisy observations, y is a known training sequence, and c is the coefficient vector to be computed such that the error vector e is minimized.  
  Direct computation of the coefficient vector c involves matrix inversion, which is generally undesirable for hardware implementation due to numerical instability issues. Least squares schemes based on matrix decomposition, such as the Cholesky, LU, SVD, and QR decompositions, avoid explicit matrix inversion and are hence more robust and well suited for hardware implementation. Such schemes are being increasingly considered for high-sample-rate applications such as digital predistortion, beam forming, and MIMO signal processing. FPGAs are the preferred hardware for such applications because of their ability to deliver enormous signal-processing bandwidth.   
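  The inversion in question can be made explicit with standard least squares algebra. This is textbook material rather than anything specific to this article; the superscript H denotes the conjugate transpose, and Q is an assumed m × N matrix with orthonormal columns from a QR factorization of X. Minimizing the squared error in equation 1 gives the normal equations, whose direct solution is:

  $$c = (X^H X)^{-1} X^H y$$

  Substituting X = QR, with R upper triangular and invertible, removes the explicit inverse:

  $$X^H X = R^H Q^H Q R = R^H R \;\;\Longrightarrow\;\; R^H R\,c = R^H Q^H y \;\;\Longrightarrow\;\; R\,c = Q^H y$$

  Because R is triangular, the last system can be solved one coefficient at a time, with no matrix inversion and without forming the product of X with its own conjugate transpose, which is the step that tends to amplify numerical errors.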
  FPGAs provide the right implementation platform for such computationally demanding applications with their inherent parallel-processing benefits (as opposed to serial processing in DSPs) along with the presence of embedded multipliers that provide throughputs an order of magnitude greater than those of the current generation of DSPs. The presence of embedded soft processor cores within FPGAs gives designers the flexibility and portability of high-level software design while maintaining the performance benefits of parallel hardware operations in FPGAs.   
  QRD-RLS algorithm
  As described in Pattan's book,1 the least squares algorithm attempts to solve for the coefficient vector c from X and y. To realize this, the QR-decomposition algorithm is first used to transform the matrix X into an upper triangular matrix R (N x N matrix) and the vector y into another vector u such that Rc=u. The coefficients vector c is then computed using a procedure called back substitution, which involves solving these equations:   
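  $$c_N = \frac{u_N}{R_{NN}} \qquad (2)$$

  $$c_i = \frac{1}{R_{ii}} \left( u_i - \sum_{j=i+1}^{N} R_{ij}\, c_j \right), \quad i = N-1, \ldots, 1 \qquad (3)$$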
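  Back substitution is short enough to sketch in full. The function below is a minimal software illustration with hypothetical names and example values; it solves Rc = u from the last coefficient upward and assumes that R is upper triangular with nonzero diagonal entries.

```python
def back_substitute(R, u):
    """Solve R c = u for c, where R is upper triangular (list of rows)."""
    n = len(u)
    c = [0.0] * n
    # Start from the last row, which involves only one unknown.
    for i in range(n - 1, -1, -1):
        acc = u[i]
        # Remove the contribution of the coefficients already solved.
        for j in range(i + 1, n):
            acc -= R[i][j] * c[j]
        c[i] = acc / R[i][i]
    return c


# Hypothetical 3 x 3 example.
R = [[2.0, 1.0, -1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
u = [-0.5, -5.0, 2.0]
print(back_substitute(R, u))
```

  The same code works unchanged for complex-valued R and u, since Python's arithmetic operators accept complex numbers.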
  The QRD-RLS algorithm flow is depicted in Figure 5.   
  
Figure 5:   QR-decomposition-based least squares  
  
Figure 6:   Triangular systolic array example for CORDIC-based QRD-RLS  
  The QR-decomposition of the input matrix X can be performed, as illustrated in Figure 6, using the well-known systolic array architecture. The rows of matrix X are fed as inputs to the array from the top along with the corresponding element of the vector y. The R and u values held in each of the cells once all the inputs have passed through the array are the outputs of the QR-decomposition. These values are subsequently used to derive the coefficients using back substitution.   
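  The row-by-row behavior of the array can be modeled in software with Givens rotations. The sketch below is a sequential, real-valued model under stated assumptions, not the systolic hardware itself: names are hypothetical, the data is made up, and the `forgetting` parameter (the usual RLS forgetting factor, not discussed in this article) is left at 1.0 so that the result is the ordinary least squares solution.

```python
import math


def qrd_rls_update(R, u, x_row, y, forgetting=1.0):
    """Rotate one observation (x_row, y) into the triangular state R, u."""
    n = len(R)
    scale = math.sqrt(forgetting)
    x = list(x_row)
    for i in range(n):
        # Age the stored row (no effect when forgetting == 1.0).
        for j in range(i, n):
            R[i][j] *= scale
        u[i] *= scale
        # Boundary cell: find the rotation that zeroes x[i] against R[i][i].
        r = math.hypot(R[i][i], x[i])
        if r == 0.0:
            continue
        cos_t, sin_t = R[i][i] / r, x[i] / r
        # Internal cells: apply the same rotation along the rest of the row.
        for j in range(i, n):
            R[i][j], x[j] = (cos_t * R[i][j] + sin_t * x[j],
                             cos_t * x[j] - sin_t * R[i][j])
        u[i], y = cos_t * u[i] + sin_t * y, cos_t * y - sin_t * u[i]


def back_substitute(R, u):
    """Solve the upper triangular system R c = u."""
    n = len(u)
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = u[i] - sum(R[i][j] * c[j] for j in range(i + 1, n))
        c[i] = acc / R[i][i]
    return c


# Hypothetical data: four observations, two coefficients, no noise.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 9.0]]
c_true = [2.0, -1.0]
y_vec = [sum(x * c for x, c in zip(row, c_true)) for row in X]

R = [[0.0, 0.0], [0.0, 0.0]]
u = [0.0, 0.0]
for row, y in zip(X, y_vec):
    qrd_rls_update(R, u, row, y)

c = back_substitute(R, u)
print(c)
```

  Each pass of the outer loop plays the role of one row of cells in the array: the first operation in the row computes the rotation, and the remaining operations apply it. In the systolic array these steps run concurrently on successive input rows rather than one after another.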
  Each of the cells in the array can be implemented as a coordinate rotation digital computer (CORDIC) block. CORDIC describes a method of performing a number of functions, including trigonometric, hyperbolic, and logarithmic functions.2 The algorithm is iterative and uses only add, subtract, and shift operations, making it attractive for hardware implementations. The number of iterations depends on the input and output precision, with more iterations being needed for more bits.  
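  The two CORDIC modes used in the array can be sketched as follows. This is a floating-point model with hypothetical names and an assumed 16 iterations; multiplying by a power of two stands in for the hardware shift, and the constant gain introduced by the iterations is divided out at the end. The vectorize function assumes a starting point with positive x, and the rotate function assumes an angle within the algorithm's convergence range of roughly ±1.74 radians.

```python
import math

ITERATIONS = 16  # assumed; more iterations give more bits of precision
ATAN_TABLE = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
CORDIC_GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(ITERATIONS))


def cordic_vectorize(x, y):
    """Vectorize mode: drive y to zero; return (magnitude, angle)."""
    angle = 0.0
    for i in range(ITERATIONS):
        shift = 2.0 ** -i  # a right shift by i bits in hardware
        if y > 0.0:
            x, y, angle = x + y * shift, y - x * shift, angle + ATAN_TABLE[i]
        else:
            x, y, angle = x - y * shift, y + x * shift, angle - ATAN_TABLE[i]
    return x / CORDIC_GAIN, angle


def cordic_rotate(x, y, angle):
    """Rotate mode: rotate (x, y) counterclockwise by angle radians."""
    for i in range(ITERATIONS):
        shift = 2.0 ** -i
        if angle >= 0.0:
            x, y, angle = x - y * shift, y + x * shift, angle - ATAN_TABLE[i]
        else:
            x, y, angle = x + y * shift, y - x * shift, angle + ATAN_TABLE[i]
    return x / CORDIC_GAIN, y / CORDIC_GAIN


magnitude, phase = cordic_vectorize(3.0, 4.0)
print(magnitude, phase)
print(cordic_rotate(1.0, 0.0, math.pi / 6))
```

  In the array, vectorize mode computes the rotation angle that zeroes an incoming element, and rotate mode applies that angle to the remaining elements of the row.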
  For real inputs, only one CORDIC block is required per cell. Many applications involve complex inputs and outputs to the algorithm, for which three CORDIC blocks are required per cell. In such cases, a single CORDIC block can be efficiently timeshared to perform the complex operations.  
  Direct mapping of the CORDIC blocks onto the systolic array, as shown in Figure 6, consumes a substantial amount of an FPGA's logic but yields enormous throughput that's probably overkill for many applications. The resources required to implement the array can be reduced by trading throughput for resource consumption via mixed and discrete mapping schemes.  
  In a mixed mapping scheme, the bottom rows in the systolic array are moved to the end of the top rows so that every row has the same number of cells. Then, a single CORDIC block can perform the operations of all the cells in a row, with the total number of CORDIC blocks required being equal to the total number of rows. This is called mixed mapping because each CORDIC block has to operate in both vectorize and rotate modes.3  
  In a discrete mapping scheme, at least two CORDIC blocks are required. One block is used purely for vectorize operations while the other is used for rotate operations.4 Because each block operates in only one mode, the hardware can be optimized for that mode, which enables tradeoffs between speed and resource consumption on the FPGA. More information on the different mapping schemes can be found in Rader, Lightbody et al., and Walke et al.3,4,5  
  Weights and measures
  The beam-former weights vector c is related to the R and u outputs of the triangular array as Rc=u. R being an upper triangular matrix, c can be solved using a procedure called back substitution. As outlined in Haykin and Zhong Mingqian et al., the back-substitution procedure operates on the outputs of the QR-decomposition and involves mostly multiply and divide operations that can be efficiently executed in FPGAs with embedded soft processors.6,7   
  Some FPGA-resident processors can be configured with a 16x16 -> 32-bit integer hardware multiplier. The software can then complete the multiply operation in a single clock cycle. Since hardware dividers generally are not available, the divide operation can be implemented as a custom logic block that may or may not become part of the FPGA-resident microprocessor. Between the multiply and divide accelerators, back substitution becomes easy and efficient.  
  FPGA advantages
  Smart-antenna technology requires a lot of processing bandwidth, in the neighborhood of several billion multiply-and-accumulate (MAC) operations per second. Such computationally demanding applications can quickly exhaust the processing capabilities of many DSPs. Some FPGA chips with embedded DSP blocks, on the other hand, provide throughput in excess of 50 GMAC/sec, offering a high-performance alternative for beam-forming applications.   
  There are a number of beam-forming architectures and adaptive algorithms that provide good performance under different scenarios, such as transmit/receive adaptive beam forming and transmit/receive switched beam forming. FPGAs with embedded processors are flexible by nature, providing options for various adaptive signal-processing algorithms.  
  The standards for next-generation networks are continually evolving and this creates an element of risk for beam-forming ASIC implementations. Transmit beam forming, for example, utilizes the feedback from the mobile terminals. The number of bits provided for feedback in the mobile standards can determine the beam-forming algorithm that is used at the base station. Moreover, future base stations are likely to support transmit diversity, including space/time coding and multiple-input, multiple-output (MIMO) technology. Since FPGAs are remotely upgradeable, they reduce the risk of depending on evolving industry standards while providing an option for gradual deployment of additional transmit diversity schemes.  
  Deepak Boppana is advanced technical marketing engineer for wireless applications at Altera. He has over four years of experience in embedded system design and wireless communications. Deepak has an MS in electrical engineering from Villanova University and can be contacted at dboppana@altera.com.  
  Asif Batada is strategic marketing manager for Altera. He has worked as a base station hardware designer for Nortel Networks and more recently as a field applications engineer for Cadence Design Systems supporting their RF and analog tools. Asif holds a BS in electrical engineering from Queen's University at Kingston, Canada and an MBA from Santa Clara University. He can be reached at abatada@altera.com.  
  End notes
  
1. Pattan, Bruno. Robust Modulation Methods and Smart Antennas in Wireless Communications, Prentice Hall, First Edition, 2000.  
2. Volder, J. "The CORDIC trigonometric computing technique," IRE Trans. Electron. Comput., Vol. EC-8, pp. 330-334, 1959.  
3. Rader, C.M. "VLSI systolic arrays for adaptive nulling," IEEE Sig. Proc. Mag., Vol. 13, No. 4, pp. 29-49, 1996.  
4. Lightbody, G., R.L. Walke, R. Woods, and J. McCanny, "Novel mapping of a linear QR architecture," Proc. ICASSP, Vol. IV, pp. 1933-6, 1999.  
5. Walke, R.L. and R.W.M. Smith, "Architectures for adaptive weight calculation on ASIC and FPGA," 33rd Asilomar Conference on Signals, Systems and Computers, 1999.  
6. Haykin, Simon. Adaptive Filter Theory, Prentice Hall, Fourth Edition.  
7. Zhong Mingqian, Tim, A.S. Madhukumar, and Francois Chin, "QRD-RLS adaptive equalizer and its CORDIC-based implementation for CDMA systems," International Journal on Wireless & Optical Communications, Vol. 1, No. 1, pp. 25-39, 2003.  
      Copyright 2005 © CMP Media LLC