Optimizing Up/Down Conversion with FPGA Techniques

By Asher Hazanchuk and Sheac Yee Lim, Altera
Dec 23, 2003 (11:00 AM)
URL: http://www.commsdesign.com/showArticle.jhtml?articleID=17100057

Digital upconverters (DUCs) and digital downconverters (DDCs) are important components of every modern wireless base station design. While many DUC and DDC designs are available, there is a clear call in the sector to increase the number of DUCs and DDCs in a system while keeping overall system cost low. Fortunately, soft multipliers now allow designers to house tens of up/downconverters in a single field programmable gate array (FPGA) device.

Soft multipliers use an FPGA's memory blocks to supplement its dedicated multipliers, enabling a highly cost-effective converter system. In this article, we'll show how existing DDC and DUC architectures are crafted. We'll then show how the soft multiplier technique allows base station designers to reduce size and cost by implementing DDCs and/or DUCs in a programmable chip.

DUCs and DDCs: What's Their Job?
DUCs are typically used in digital transmitters to filter, upsample, and modulate signals from baseband to the carrier frequency. A DDC, on the other hand, resides in the digital receiver to demodulate, filter, and downsample the signal to baseband so that further processing on the received signal can be done at lower sampling frequencies.

A DUC consists of a series of cascaded interpolation finite impulse response (FIR) filters, a mixer, and a direct digital synthesizer (DDS) or numerically controlled oscillator (NCO). Figure 1 shows the block diagram of the DUC and the frequency response of the signal after various stages in the DUC.

Figure 1: Block diagram of a digital upconverter.

In Figure 1, the interpolation FIR filters are used to shape and increase the sample rate of the transmit signal. The output signal from these filters is then mixed with the carrier signal prior to transmission. The carrier signal is usually created using a DDS or NCO that generates the required sine and cosine waves for I and Q data streams. The mixing of the carrier signals with I and Q data streams is done using two multipliers.

A DDC performs the mirror-image operation of a DUC on the receiver side. The signal that enters the DDC is first mixed to remove the carrier and bring the received signal down to baseband. This is done by multiplying the incoming signal with sine and cosine waveforms created using a DDS or NCO at the same frequency as the carrier. The new signal, centered on the baseband frequency, is then passed through several cascaded decimating FIR filters to shape the signal and reduce its sampling rate.
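The mixing step can be sketched in a few lines of Python. This is only an illustration, not the authors' design: the function names `nco` and `mix_down`, the 32-bit phase accumulator, and the use of floating-point `math.cos`/`math.sin` in place of a sine lookup table are all assumptions made for the example.

```python
import math

def nco(freq_hz, sample_rate_hz, num_samples, acc_bits=32):
    """Phase-accumulator NCO: returns a list of (cos, sin) pairs.

    acc_bits is an assumed accumulator width; a hardware NCO would
    normally read the sine and cosine values from a lookup table.
    """
    step = round(freq_hz / sample_rate_hz * (1 << acc_bits))  # tuning word
    mask = (1 << acc_bits) - 1
    phase = 0
    pairs = []
    for _ in range(num_samples):
        angle = 2.0 * math.pi * phase / (1 << acc_bits)
        pairs.append((math.cos(angle), math.sin(angle)))
        phase = (phase + step) & mask  # accumulator wraps like a register
    return pairs

def mix_down(samples, carrier_hz, sample_rate_hz):
    """Multiply a real input by cos and -sin to form the I and Q streams."""
    lo = nco(carrier_hz, sample_rate_hz, len(samples))
    i_stream = [x * c for x, (c, _) in zip(samples, lo)]
    q_stream = [-x * s for x, (_, s) in zip(samples, lo)]
    return i_stream, q_stream
```

The two list comprehensions in `mix_down` correspond to the two mixer multipliers; the decimating filters that follow would remove the image left at twice the carrier frequency.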

Typically, the signal converted by the DDC gets transmitted and received at very high sampling rates. However, the receiver generally does not require such high signal resolution to perform the necessary signal processing. Therefore, it is important to decimate (reduce the number of samples) the incoming signal so that the rest of the signal processing can be done at lower, more reasonable sampling rates. Figure 2 shows the block diagram of the DDC and the frequency response of the signal after the various stages in the DDC.

Figure 2: Block diagram of the DDC and corresponding frequency response.

The interpolation and decimation filtering of a DUC and DDC is typically done in multiple stages using multiple filters. For example, suppose a signal needs to be decimated from a sampling rate of 107.52 MSamples/s to 3.84 MSamples/s. This gives a total decimation factor of 28.

Instead of implementing a large decimating filter that decimates by 28, the decimation process could be broken down into two cascading filters, decimating by factors of 7 and 4 respectively. The first decimate-by-7 filter takes the original sampling frequency of 107.52 MSamples/s and brings it down to 15.36 MSamples/s. The second decimate-by-4 filter takes 15.36 MSamples/s and provides the desired sampling rate of 3.84 MSamples/s. It is also possible to break down the decimation factor into three separate cascading filters.
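The rate arithmetic for the two-stage split is easy to check. The short Python sketch below uses exact fractions so that the intermediate rates come out without rounding noise; the helper name `stage_rates` is hypothetical.

```python
from fractions import Fraction

def stage_rates(input_rate, factors):
    """Return the sample rate at the input and after each decimation stage.

    input_rate is given as a decimal string (in MSamples/s) so that
    Fraction can represent it exactly.
    """
    rates = [Fraction(input_rate)]
    for d in factors:
        rates.append(rates[-1] / d)
    return rates

rates = stage_rates("107.52", [7, 4])
print([float(r) for r in rates])  # [107.52, 15.36, 3.84]
```

The same helper can be used to compare other factorizations of 28, such as a three-stage split.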

The biggest benefit of breaking down the filtering function into two or three separate filters is to reduce the resources required to implement the entire filtering function. If a single filter were implemented, it would have to decimate by 28 while maintaining the desired filter passband characteristics. This filter would consume significant resources.

By breaking it down, on the other hand, each filter is significantly smaller and easier to design. Also, each filter stage can run at a slower sampling rate than the stage before it, which allows the filter resources to be time-multiplexed between I and Q data streams or among multiple data channels.

Alternatively, some designers may choose to implement a cascaded integrator comb (CIC) filter followed by FIR filter stages to perform the rate change. This typically occurs when designing narrowband DUCs or DDCs that require large rate-change factors (typically interpolation or decimation by a factor larger than 30). For wideband DUC or DDC applications (rate-change factor smaller than 30), the factor is small enough that multiple, smaller rate-change FIR filters can be used without consuming too many multipliers.

CIC filters are useful for systems requiring a large rate-change factor because the filter structure is simple and requires no multiplier resources, which makes the design more resource efficient. However, when using a CIC filter, the designer has to be aware of the passband droop in the filter's frequency response. This droop can be compensated by placing one or two FIR filters after the CIC filter output.⁴
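To make the "no multipliers" point concrete, here is a minimal Python sketch of a CIC decimator built only from additions and subtractions. It assumes a differential delay of 1 and three stages by default; the function name `cic_decimate` and these parameter choices are illustrative assumptions, and the droop-compensating FIR stage is not included.

```python
def cic_decimate(samples, rate, stages=3):
    """N-stage CIC decimator (differential delay 1) on integer samples.

    Integrators run at the input rate and combs run at the output rate.
    Python integers do not overflow; hardware instead relies on
    wrap-around arithmetic with sufficiently wide registers.
    """
    integrators = [0] * stages
    comb_delays = [0] * stages  # previous input seen by each comb
    out = []
    for n, x in enumerate(samples):
        acc = x
        for i in range(stages):          # cascaded running sums
            integrators[i] += acc
            acc = integrators[i]
        if (n + 1) % rate == 0:          # keep one sample in 'rate'
            for i in range(stages):      # cascaded first differences
                prev = comb_delays[i]
                comb_delays[i] = acc
                acc -= prev
            out.append(acc)
    return out
```

For a constant input, the output settles to the input multiplied by rate**stages, which is the DC gain that a later stage would normally scale away.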

Polyphase FIR Filter
Interpolating or decimating FIR filters are efficiently implemented using polyphase FIR filters. Polyphase filters are commonly used because they simplify the overall system design and reduce the number of computations per cycle required from the hardware. A polyphase implementation "splits" the original filter into D polyphase filters with impulse responses defined by the following equation:

h_k(n) = h(k + nD)

where:
k = 0, 1, ..., D-1
n = 0, 1, ..., P-1
P = L/D = length of polyphase filters
L = length of the filter (selected as multiple of D for simplicity)
D = decimation factor

This equation states that the first polyphase filter, h₀(n), has coefficients h(0), h(D), h(2D), ..., h((P-1)D). The second polyphase filter, h₁(n), has coefficients h(1), h(1+D), h(1+2D), ..., h(1+(P-1)D), and so on. Consider the polyphase representation of a 16-tap filter with a decimation factor of 4. The output is given by:

y(n) = h(0)x(n) + h(1)x(n-1) + ... + h(15)x(n-15)

From this equation and Figure 3, it follows that the output y(n) is discarded for n ≠ 0, 4, 8, 12; hence, the only values of y(n) that need to be computed are y(0), y(4), y(8), and y(12).

Figure 3: Time and frequency representation of decimation for D = 4.

Table 1 shows that the overall decimation filter operation can be represented by four parallel polyphase filters. The output sample is the sum of the results from four polyphase filters: y(n) = y(n)₀ + y(n)₁ + y(n)₂ + y(n)₃.

Table 1: Decimation Filter Split
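The coefficient split defined by h_k(n) = h(k + nD) can be sketched in Python as follows. The function name `polyphase_split` is hypothetical, and the "coefficients" in the example are simply the tap indices 0 to 15 so that the grouping is easy to read; they are not real filter values.

```python
def polyphase_split(h, d):
    """Split filter h into d polyphase branches: branch k holds h[k + n*d]."""
    if len(h) % d:
        raise ValueError("filter length must be a multiple of d")
    return [h[k::d] for k in range(d)]

# Tap indices stand in for coefficient values.
taps = list(range(16))
for k, branch in enumerate(polyphase_split(taps, 4)):
    print(f"h_{k}: {branch}")
```

For the 16-tap, D = 4 case this yields four branches of four taps each, with branch 0 holding taps 0, 4, 8, 12 and branch 1 holding taps 1, 5, 9, 13.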

Figure 4 shows the polyphase representation of the decimation filter. A demultiplexer at the input ensures that the input is applied only to one polyphase filter at a time.

Figure 4: Polyphase filter representation of a D=4 decimation filter.

The polyphase representation of the decimation filter reduces the computational requirement. For the example in Figure 4, polyphase implementation reduces the required number of multiplications and additions by a factor of 4.¹
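The saving can be checked numerically. The Python sketch below compares a reference that computes every filter output and then discards three out of four against a polyphase version that computes only the retained outputs. Both function names are assumptions, and samples before the start of the record are treated as zero.

```python
def _sample(x, i):
    """Return x[i], treating samples outside the record as zero."""
    return x[i] if 0 <= i < len(x) else 0

def fir_then_decimate(x, h, d):
    """Reference: compute every output y(n), then keep every d-th one."""
    full = [sum(h[k] * _sample(x, n - k) for k in range(len(h)))
            for n in range(len(x))]
    return full[::d]

def polyphase_decimate(x, h, d):
    """Compute only the retained outputs as a sum of d branch filters."""
    branches = [h[k::d] for k in range(d)]   # h_k(n) = h(k + n*d)
    num_out = (len(x) + d - 1) // d
    out = []
    for m in range(num_out):
        acc = 0
        for k, branch in enumerate(branches):
            # Branch k sees the input stream x(m*d - k), sampled at the low rate.
            for n, coeff in enumerate(branch):
                acc += coeff * _sample(x, (m - n) * d - k)
        out.append(acc)
    return out
```

With a 16-tap filter and d = 4, the reference performs 16 multiplications per input sample, while the polyphase form performs 16 per output sample, or 4 per input sample.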

Symmetry
Another way to reduce the complexity of the FIR filter is to exploit coefficient symmetry. For an n-tap filter, symmetry means that C₀ = C_{n-1}, C₁ = C_{n-2}, and so on. In this case, the number of multipliers can be approximately halved. The key is to add the two data values that share a coefficient before performing the multiplication. Figure 5 shows the structure of a seven-tap symmetrical FIR filter.

Figure 5: Diagram of a seven-tap symmetrical FIR filter.
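The pre-adder idea can be sketched in Python as shown below. The function names are hypothetical, the coefficient values in any test are arbitrary, and samples before the start of the record are assumed to be zero.

```python
def _sample(x, i):
    """Return x[i], treating samples outside the record as zero."""
    return x[i] if 0 <= i < len(x) else 0

def direct_fir(x, coeffs):
    """Reference FIR: one multiplication per tap."""
    return [sum(coeffs[k] * _sample(x, n - k) for k in range(len(coeffs)))
            for n in range(len(x))]

def symmetric_fir(x, coeffs):
    """FIR with pre-adders; assumes coeffs[k] == coeffs[len(coeffs)-1-k]."""
    n_taps = len(coeffs)
    half = n_taps // 2
    out = []
    for n in range(len(x)):
        acc = 0
        for k in range(half):
            # Add the two samples that share a coefficient, then multiply once.
            pair = _sample(x, n - k) + _sample(x, n - (n_taps - 1 - k))
            acc += coeffs[k] * pair
        if n_taps % 2:
            # An odd-length filter has an unpaired center tap.
            acc += coeffs[half] * _sample(x, n - half)
        out.append(acc)
    return out
```

For a seven-tap filter, this form uses four multiplications per output (three paired taps plus the center tap) instead of seven.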

Challenges and Complexity
Because the DDC and DUC consist of similar blocks, the rest of this article focuses on the DDC. The DDC block diagram reveals that almost the entire DDC consists of blocks requiring multipliers. The NCO can be implemented with a multiplier-based architecture. The demodulation of I and Q data streams requires multipliers to perform the mixing, and the FIR filters require many multipliers for the various filter stages.

Most of the newer FPGA architectures today have embedded multipliers or DSP blocks along with other resources like logic elements and memories. The availability of high-speed, optimized DSP blocks enables designers to use FPGAs for various DSP-related functions, including implementing the multiplication functions required by a DDC.

From the basic architecture of a DDC, one can see that a DDC requires a significant number of multipliers for just a single channel of data. Coupled with the fact that wireless base stations typically handle multiple channels of data at a time, this could pose a potential resource issue.

The challenge here is trying to implement a DDC system that can support the required number of data channels at the desired data rates without using multiple FPGAs, even though it seems like a single FPGA would not have sufficient DSP blocks. The key lies with distributing the multiplication functions across other available FPGA resources like memory blocks. This technique is also known as soft multipliers.

The Soft Multiplier Approach
The soft multiplier technique moves the distributed arithmetic implementation of a sum of multiplications from logic elements into memory blocks to achieve a higher level of silicon size optimization.⁶ Together with other optimization techniques that are part of the soft multiplier architecture, some memory blocks can perform more than five multiplications and additions simultaneously, while others can perform more than 14.

In the case study presented below, all the filtering-related multiplication functions are implemented using soft multipliers. The sum of multiplication architecture is the most optimized soft multiplier architecture for FIR filters due to the multiply-add functionality found in FIR filters.

The result of the sum of multiplication architecture is the shifted summation of the results produced by multiplying a set of input data with a set of coefficients. In this mode, each input sample shifts into the address port of the memory block one bit per clock cycle, starting with the least significant bit (LSB). On the first clock cycle, the LSBs of all inputs form the address value to the memory block(s). On the next clock cycle, the second LSB of each input forms the next address value, and so on.

For an n-bit input data width, it takes n clock cycles to shift into the memory block address bus all of the data bits required to compute the final sum of multiplication result. The memory block output produces the multiplication result for a specific bit position at each clock cycle.

In the case of a FIR filter, the shifting of the inputs is handled by the tap-delay line of the filter. For an n-bit input, each tap-delay element of the FIR filter would be n bits long so that each bit can be shifted into the memory block serially. Figure 6 shows the sum of multiplication soft multiplier implementation of a 16-bit input, 16-bit coefficient FIR filter.

Figure 6: Sum of multiplication soft multiplier implementation.

The output accumulator shift-accumulates the partial products obtained from the memory block once per clock cycle, according to their weights. Each shift-accumulation of a partial product generates an extra carry bit. At the end of the 16th partial product accumulation, the multiplier generates a 35-bit full resolution output. The resolution of the input data influences the output bit width and the latency of the multiplier.
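The bit-serial behavior described above can be modeled in software. The Python sketch below is a behavioral illustration of distributed arithmetic under stated assumptions, not a description of the actual memory-block configuration: the function names, the four-coefficient example size, and the two's-complement handling of the sign bit are choices made for this example.

```python
def build_lut(coeffs):
    """Model the memory block: entry 'addr' stores the sum of every
    coefficient whose bit is set in 'addr'."""
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << len(coeffs))]

def sum_of_products(inputs, lut, width):
    """Bit-serial sum of products for two's-complement inputs.

    On each modeled clock cycle, one bit from every input forms the
    memory address, LSB first, and the looked-up partial sum is
    shift-accumulated according to its bit weight.
    """
    acc = 0
    for bit in range(width):
        addr = 0
        for i, x in enumerate(inputs):
            addr |= ((x >> bit) & 1) << i
        partial = lut[addr] << bit
        # The MSB of a two's-complement word has weight -2**(width-1).
        acc += -partial if bit == width - 1 else partial
    return acc

coeffs = [3, -5, 7, 2]
lut = build_lut(coeffs)            # 2**4 = 16 stored sums
samples = [100, -200, 32767, -32768]
print(sum_of_products(samples, lut, width=16))
```

The loop runs once per input bit, which mirrors the n clock cycles needed for an n-bit input, and no multiplication appears anywhere in `sum_of_products`. The table grows as 2 to the power of the number of coefficients, so the memory depth bounds how many taps a single block can serve.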

Decimation Filter Architecture
The soft multiplier technique enables the utilization of unused FPGA memory block resources to significantly increase the multiplier resources available in the FPGA. This technique is especially helpful for applications that are multiplier intensive.

The soft multiplier technique can more than triple the number of multipliers available in an FPGA. DDCs and DUCs are extremely multiplier-intensive applications. The mixers, NCO, and decimation/interpolation FIR filters are all implemented with multipliers.

In today's base station designs, the DDC and DUC need to support multi-channel environments and therefore the required per-channel number of multipliers is multiplied by the number of channels. Since hundreds of multipliers are required for DDCs or DUCs in a typical wireless base-station, it makes sense to use soft multipliers to boost the number of available multipliers.

A soft multiplier architecture that is size optimized for a DDC or DUC system should use symmetric and polyphase filter features to achieve a higher level of size optimization. Figure 7 describes a soft multiplier decimation filter that uses the symmetric and polyphase filter features (described in Figures 5 and 4, respectively) to optimize the soft multiplier structure.

Figure 7: Symmetric, polyphase, and soft multiplier-based decimation filter architecture.

A symmetric, polyphase implementation of a FIR filter using the sum of multiplication soft multiplier can be performed by restructuring the order of the coefficients stored at each address location within the memory block and rearranging the input sample sequences, as described in Figure 7.

DDC and DUC designers are attracted to FPGAs because they offer system flexibility beyond the capabilities of ASSPs and computation power beyond the capabilities of DSP processors. This high computation power is critical to achieving cost-effective solutions by condensing a high number of DDC and DUC channels into a single FPGA device.

The Key to Success
The key to a cost effective DDC or DUC FPGA solution is distributing the multiplication load among different FPGA resources, in particular the use of FPGA memories as soft multipliers that can more than triple the number of available FPGA multipliers. Multipliers can be implemented in FPGAs with different device resources, such as DSP blocks (if available), memories (implemented as soft multipliers), and logic elements.⁶

An example of a DDC system that distributes the multiplication load between different FPGA resources is described in Figure 8. The mixer multipliers are implemented using DSP block multipliers, the first stage decimation filter is implemented using memory blocks as a symmetric, polyphase 35-tap decimation filter, and the second stage decimation filter is implemented using memory blocks as a symmetric, polyphase 93-tap decimation filter. The NCO is implemented using logic elements. The first stage decimation FIR filter decimates by 7 and the second stage decimation FIR filter decimates by 4 to give a total decimation factor of 28, reducing the sample rate from 107.52 MSamples/s down to 3.84 MSamples/s.

Figure 8: Diagram of a soft multiplier-based DDC system.

The benefits of soft multipliers are demonstrated with the DDC example in Figure 8. If only dedicated DSP block multipliers are used, then it is possible to fit 8 DDC channels into one FPGA device. If soft multipliers are used for the first and second stage decimation filters, then it is possible to fit 44 DDC channels into one FPGA.

Wrap Up
The use of symmetric and polyphase filters reduces the number of multipliers required for DUC and DDC implementations used in wireless base stations. The use of soft multipliers enables efficient use of FPGA resources to increase the number of channels each FPGA device can handle, thus allowing wireless designers to reduce space and cost in their base station architectures.

References

Altera Corp. Stratix Handbook, Chapter 7, "Implementing High Performance DSP Functions in Stratix, Stratix GX Devices".
Altera Corp. Stratix Handbook, Chapter 9, "Implementing Multipliers in Stratix, Stratix GX & Cyclone Devices".
Altera Corp. "FIR Compiler User Guide".
Ray Andraka, "High performance Digital Down-Converter for FPGA", Xilinx Xcell Journal Issue Number 38, 4Q 2000.
Hunt Engineering. "Digital Down-Converter using FPGA", http://www.hunteng.co.uk/
T. Hollis, "Digital Down-Conversion at the heart of today's communications systems", Global DSP Magazine, Vol. 2 Issue 10, October 2003.
Asher Hazanchuk, "Soft Multipliers for DSP Applications", GSPx Conference, April 2003.

About the Authors
Asher Hazanchuk is a senior manager of DSP applications and architectures at Altera. Asher has a master's degree in Computer Science and a bachelor's degree in Electrical Engineering. He can be reached at ahazanch@altera.com.

Sheac Yee Lim is a senior applications engineer in the product applications group at Altera. She has a master's degree in Electrical Engineering and can be reached at sylim@altera.com.
