

WCDMA RAKE Receiver Comes to Life in DSP
The computational requirements of UMTS wideband CDMA (WCDMA) are substantially higher than those of second-generation GSM and CDMA systems. Because of this complexity, many designers are turning to ASICs to handle computationally intensive processing tasks. However, some tasks, such as chip-rate and symbol-rate processing, may best be handled in a digital signal processor. In this article, we'll show how a single digital signal processor (DSP) housing four arithmetic logic units (ALUs) can handle chip-rate and symbol-rate processing in the downlink of a WCDMA handset design. Along the way, we'll present baseband processing algorithms related to the RAKE receiver and outline methods to speed up despreading, descrambling, channel estimation, and other downlink RAKE receiver functions for practical implementation.
Understanding the Receiver
For the first generation of UMTS handsets, the RAKE receiver will be the receiver of choice. In a RAKE receiver, one RAKE finger is assigned to each multipath component, which maximizes the amount of received signal energy that is collected. The paths are then combined into a composite signal that is expected to be substantially better suited to demodulation than any single path. To combine the paths meaningfully, the RAKE receiver needs knowledge of channel parameters such as the number of paths, their locations in the delay domain, and their complex-valued attenuations. Figure 1 shows a typical four-finger RAKE receiver, where r(t) is the received signal. Because r(t) consists of multipath components, it can be split into delayed copies r(t - τ_{i}), one per path, which are combined using the corresponding channel estimates g(t, τ_{i}).
In a WCDMA receiver the following steps take place (excluding the error correction coding):
The objective of the channel estimation block is to estimate the channel phase and amplitude [denoted in Figure 1 as g(t, τ_{i})] for each of the identified paths. Once known, this information is used to combine the paths of the received signal. The operations above can be expressed mathematically by the following equations, starting from the transmitter end. In Equation 1, u(t) is the transmitted signal, a_{k} is the complex transmitted symbol, p_{k} is the combined complex spreading (OVSF) and scrambling code, N is the spreading factor, f(t) is the pulse-shaping filter (root raised cosine with roll-off factor 0.22), and T is the chip duration.
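As a minimal sketch of the spreading in Equation 1 at one sample per chip (pulse shaping by f(t) omitted), the code below assumes the combined code p is indexed per chip, so that symbol a_{k} multiplies chips kN through kN + N - 1. The helper random_code is a hypothetical stand-in for the combined OVSF/scrambling code, not a 3GPP sequence.

```python
import random


def spread(symbols, code, sf):
    """Chip sequence for Equation 1 at one sample per chip, without pulse shaping.

    Symbol k multiplies code chips k*sf .. k*sf + sf - 1.
    """
    if len(code) < len(symbols) * sf:
        raise ValueError("code must provide sf chips per symbol")
    return [a * code[k * sf + n] for k, a in enumerate(symbols) for n in range(sf)]


def random_code(length, seed=0):
    """Hypothetical stand-in for the combined OVSF/scrambling code:
    unit-magnitude chips (+-1 +-j)/sqrt(2). Not a real 3GPP code."""
    rng = random.Random(seed)
    return [complex(rng.choice((1, -1)), rng.choice((1, -1))) / 2 ** 0.5
            for _ in range(length)]


if __name__ == "__main__":
    sf = 8                                   # spreading factor N
    symbols = [1 + 1j, -1 + 1j, 1 - 1j]     # complex symbols a_k (QPSK)
    code = random_code(len(symbols) * sf)   # combined code p, one value per chip
    chips = spread(symbols, code, sf)
    print(len(chips), "chips")              # 3 symbols x 8 chips
```

Keeping the code at chip granularity, rather than per symbol, is what lets the receiver sketches that follow undo it chip by chip with a conjugate multiplication.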
Equation 2 shows the received signal y(t) at the mobile unit. The mobile channel is modeled as a filter with complex taps c_{j} and delays d_{j}, where j = 0, ..., J-1 for J different paths, and g(t) is additive white Gaussian noise (AWGN) with single-sided power spectral density (PSD) N_{0}.
At the receiver, the received data is first passed through a matched filter (matched to the transmit filter), which maximizes the signal-to-noise ratio (SNR). This process is shown in Equation 3:
A path searcher estimates the delay of each path (τ_{i}) in the composite received signal r(t). The received signal is then delayed by the amount estimated by the path searcher and multiplied by the complex conjugate of the combined scrambling and spreading code used at the transmitter. The descrambled and despread data are then summed over one symbol period, as shown in Equation 4. For example, if there are four strong paths, four different estimates of the same symbol will be generated using:
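A minimal chip-rate sketch of the per-finger despreading in Equation 4 follows. It assumes integer chip delays, one sample per chip, no matched filter or noise, and a hypothetical random unit-magnitude code; multipath_channel is an illustrative, noise-free, chip-spaced version of the channel model of Equation 2. Each finger's output is a scaled, rotated estimate of the same symbol, plus some interpath interference from the other path.

```python
import random


def multipath_channel(chips, taps, delays):
    """Noise-free, chip-spaced version of the channel model:
    y[n] = sum_j c_j * u[n - d_j], with integer chip delays d_j."""
    out = [0j] * (len(chips) + max(delays))
    for c, d in zip(taps, delays):
        for n, u in enumerate(chips):
            out[n + d] += c * u
    return out


def despread(received, code, delay, sf, k):
    """Finger output for symbol k: sum over one symbol period of the delayed
    received chips times the conjugate of the combined code."""
    acc = 0j
    for n in range(sf):
        i = k * sf + n
        acc += received[i + delay] * code[i].conjugate()
    return acc


if __name__ == "__main__":
    rng = random.Random(1)
    sf = 16
    symbols = [1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
    code = [complex(rng.choice((1, -1)), rng.choice((1, -1))) / 2 ** 0.5
            for _ in range(len(symbols) * sf)]
    chips = [symbols[i // sf] * code[i] for i in range(len(code))]
    rx = multipath_channel(chips, taps=[0.9 + 0.3j, 0.4 - 0.2j], delays=[0, 3])
    # One finger per path; each gives a scaled, rotated estimate of every symbol.
    for d in (0, 3):
        print(d, [despread(rx, code, d, sf, k) for k in range(len(symbols))])
```

With a single path and unit-magnitude chips, the finger output is exactly SF times the channel tap times the symbol; that gain is what the channel estimate later has to account for.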
The estimates generated by Equation 4 are combined by the RAKE receiver with the corresponding channel estimates, as shown in Equation 5:
In Equation 5, c_{j} are the channel estimates, d_{j} are the estimated path delays, J is the estimated number of strong paths, det( ) is a simple decision device, and â is the estimated bit/symbol at the output of the RAKE receiver.
Channel Estimation
In a WCDMA downlink traffic channel (DPDCH/DPCCH), pilot symbols (2 to 8 symbols) and control symbols are transmitted in every slot. There are 15 slots per WCDMA frame; each frame is 10 ms long and contains 38400 chips (3.84 Mchips/s). Channel estimation can be performed using these pilot symbols. If these time-multiplexed pilot bits are used, the estimate for the data bits between two consecutive sets of pilot bits (two slots) can be obtained by interpolation. A decision-directed (DD) channel estimation approach can then be used to improve performance. Figure 2 shows the layout of the channel estimation process using time-multiplexed symbols.
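For the time-multiplexed pilot approach, here is a minimal sketch assuming known pilot symbols, a noise-free channel, and linear interpolation between the estimates from two consecutive pilot fields. The article does not specify the interpolation method, and the decision-directed refinement is not shown. The function names and the pilot pattern are hypothetical.

```python
def slot_estimate(rx_pilots, known_pilots):
    """Average of rx / known over one slot's pilot symbols (noise-free: exactly h)."""
    if not rx_pilots or len(rx_pilots) != len(known_pilots):
        raise ValueError("need matching, non-empty pilot lists")
    return sum(r / p for r, p in zip(rx_pilots, known_pilots)) / len(rx_pilots)


def interpolate_between_slots(h_prev, h_next, num_data_symbols):
    """Linearly interpolate estimates for the data symbols between two pilot
    fields; data symbol i sits at fraction (i + 1) / (num_data_symbols + 1)."""
    return [h_prev + (h_next - h_prev) * (i + 1) / (num_data_symbols + 1)
            for i in range(num_data_symbols)]


if __name__ == "__main__":
    pilots = [1 + 1j, 1 + 1j, -1 - 1j, 1 + 1j]   # hypothetical known pilot pattern
    h_a, h_b = 0.9 + 0.2j, 0.7 + 0.4j          # channel at two pilot fields
    est_a = slot_estimate([h_a * p for p in pilots], pilots)
    est_b = slot_estimate([h_b * p for p in pilots], pilots)
    for h in interpolate_between_slots(est_a, est_b, 5):
        print(h)
```

Linear interpolation only works if the channel changes slowly compared with the slot spacing, which is one reason the CPICH approach, with far more pilot symbols per frame, is attractive.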
Also, in the WCDMA downlink, a common pilot channel (CPICH) is transmitted at a higher power than the dedicated traffic channels. This channel is received by all mobiles in a given cell. The CPICH is transmitted with a constant spreading factor (SF) of 256 and a spreading code of all ones, which means there are 10 CPICH symbols per slot and 150 per frame. All CPICH symbols are 1+j. At the receiver, the CPICH symbols serve as pilot symbols and can be used for channel estimation. The advantage of using the CPICH is that all of its symbols in the frame can be used for channel estimation, as opposed to only a few pilot symbols in the DPCCH/DPDCH. Also, because the CPICH is transmitted at a higher power than the traffic channel, it is received better at the handset. While both channel estimation techniques are effective, we've used the CPICH for channel estimation to show how the four-ALU DSP handles RAKE receiver tasks. Figure 3 defines the channel estimation process that was implemented on the DSP.
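The CPICH numbers above follow from the slot length: 38400 chips / 15 slots = 2560 chips per slot, and 2560 / 256 = 10 symbols per slot. The traffic channel, however, may use any SF from 4 to 512, so the per-slot CPICH estimates must eventually be mapped onto the traffic symbol rate; the article does this with a zero-order hold, or decimation for SF = 512. A minimal sketch of both, assuming one slot at a time and that decimation simply keeps every second estimate (which one is kept is an assumption):

```python
CHIPS_PER_FRAME = 38400
SLOTS_PER_FRAME = 15
CHIPS_PER_SLOT = CHIPS_PER_FRAME // SLOTS_PER_FRAME   # 2560 chips per slot
CPICH_SF = 256
VALID_SF = (4, 8, 16, 32, 64, 128, 256, 512)


def symbols_per_slot(sf):
    """Number of symbols a channel with spreading factor sf carries per slot."""
    return CHIPS_PER_SLOT // sf


def rate_match(cpich_estimates, traffic_sf):
    """Map one slot of per-CPICH-symbol channel estimates to traffic symbols.

    SF <= 256: zero-order hold, each estimate repeated 256/SF times.
    SF = 512: decimation, every second estimate kept.
    """
    if traffic_sf not in VALID_SF:
        raise ValueError("unsupported spreading factor")
    if traffic_sf <= CPICH_SF:
        repeat = CPICH_SF // traffic_sf
        return [c for c in cpich_estimates for _ in range(repeat)]
    return cpich_estimates[::traffic_sf // CPICH_SF]


if __name__ == "__main__":
    per_slot = symbols_per_slot(CPICH_SF)
    print(per_slot, per_slot * SLOTS_PER_FRAME)   # 10 150
```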
For each independent path the channel estimate is obtained as follows:
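Because every CPICH symbol equals 1+j, removing the known symbol amounts to dividing the despread value by 1+j, and for a despread value x + jy that quotient is ((x + y) + j(y - x))/2. Only additions, subtractions, and a halving are needed, which is consistent with the "additions" of Equation 6 mentioned in the DSP implementation. This is a minimal sketch assuming the despread value has already been normalized by the spreading factor; whether Equation 6 also averages over several symbols or uses a different scale is not stated in the text, so treat the scaling as an assumption.

```python
def cpich_estimate(despread_value):
    """Channel estimate from one despread CPICH symbol whose known value is 1+j.

    (x + jy) / (1 + j) = ((x + y) + j(y - x)) / 2, so only additions,
    subtractions and a halving (a shift in fixed point) are needed.
    """
    x, y = despread_value.real, despread_value.imag
    return complex((x + y) / 2, (y - x) / 2)


def cpich_estimates(despread_values):
    """Per-symbol estimates for one finger, e.g. the 10 CPICH symbols of a slot."""
    return [cpich_estimate(v) for v in despread_values]


if __name__ == "__main__":
    h = 0.3 - 0.8j                       # hypothetical channel coefficient
    slot = [h * (1 + 1j)] * 10           # noise-free despread CPICH symbols
    print(cpich_estimates(slot)[0])      # close to h
```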
DSP Implementation
1. Handling Channel Estimation
Data is read from the memory holding the received CPICH symbols and, after the additions shown in Equation 6 are performed, written back to memory. The next step is to smooth the output with a moving-average (MA) filter, which is defined by the following equation:
In Equation 7, ~c_{j}(n-i) are the noisy channel estimates from the CPICH, a(i) are the filter coefficients, and ^c_{j} are the final channel estimates used by the MRC. All filter coefficients are equal (1/N). From Equation 7, it can be seen that the number of memory reads/writes and multiply-accumulate (MAC) operations can be reduced by simply adding each new sample (scaled by 1/N) to a running sum and removing the oldest sample from it. This is how the MA filter output is computed on the DSP. The very first value of the running sum, however, is computed by reading four complex numbers from the buffer at a time, performing four MAC operations per cycle, and repeating the loop N/4 times. Once computed, the output of the MA filter is written to memory.
Before the results are written, "rate matching" must be performed between the CPICH and the traffic channel (DPCCH/DPDCH). The CPICH is always transmitted with a spreading factor of 256, whereas the traffic channel can use a spreading factor from 4 to 512, depending on the required data rate. This rate matching is performed by a simple zero-order hold (or decimation for SF=512). A true interpolation filter is not needed because the channel changes very little over a symbol interval. Results for the channel estimation algorithm are presented below.
2. Path Searcher
The received signal at the mobile unit is correlated with the stored cell-specific scrambling code. Equation 8 expresses the output of the correlation process.
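A minimal sketch of the sliding correlation behind Equation 8, assuming one sample per chip, a window of N code chips, normalization by N, and P window slides; the exact indexing and normalization of Equation 8 may differ. The squared magnitude of each output gives one power delay profile. The received signal here is a hypothetical, noise-free delayed copy of the code.

```python
import random


def sliding_correlation(received, code, window, slides):
    """y[m] = (1/N) * sum_{i=0}^{N-1} r[i + m] * conj(s[i]), for m = 0 .. slides-1.

    One sample per chip is assumed; `window` is N and `slides` is P.
    """
    if len(received) < window + slides - 1 or len(code) < window:
        raise ValueError("not enough samples for the requested window/slides")
    out = []
    for m in range(slides):
        acc = 0j
        for i in range(window):
            acc += received[i + m] * code[i].conjugate()
        out.append(acc / window)
    return out


def power_delay_profile(correlation):
    """Squared magnitude of each correlator output."""
    return [abs(v) ** 2 for v in correlation]


if __name__ == "__main__":
    rng = random.Random(7)
    n_win, n_slides, true_delay = 64, 20, 5
    code = [complex(rng.choice((1, -1)), rng.choice((1, -1))) / 2 ** 0.5
            for _ in range(n_win)]
    # Hypothetical received signal: the code delayed by 5 chips with gain 0.8.
    rx = [0j] * true_delay + [0.8 * c for c in code] + [0j] * n_slides
    pdp = power_delay_profile(sliding_correlation(rx, code, n_win, n_slides))
    print(max(range(n_slides), key=pdp.__getitem__))   # strongest delay index
```

In this noise-free case only the aligned slide sums all N products coherently; at every other slide fewer chips overlap, so the peak lands at the true delay.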
In Equation 8, N is the size of the autocorrelation window, which can be thought of as the number of taps for the autocorrelation; this number is chosen by a finger-management routine. In this example, autocorrelation is performed to handle a delay spread of up to 20 μs, which is about 320 slides of the autocorrelation window. This index is denoted by m in Equation 8, while P is the length of the autocorrelation output (approximately 320 for this implementation). Since the path searcher output is generated on a frame-by-frame basis, n in Equation 8 refers to the frame index. At the instant when the stored and received sequences are perfectly aligned, the autocorrelation output y(n,m) at that index becomes:
The first term on the right-hand side is the average of the channel coefficients, which is complex Gaussian distributed. The summation and scaling of the first term in Equation 9 amount to taking the mean of the channel coefficient over the window, so this term is the local mean of the channel coefficient. The second term on the right-hand side is interpath interference (IPI), caused by other paths that are not aligned with the stored scrambling code. Because the scrambling code produces a correlation peak only when the two code sequences are perfectly aligned, the IPI is usually small compared with the third term, which is multiple-access interference (MAI) plus thermal noise. Typically, the first term is larger than the last two. In a deep fade, however, when the local mean is close to zero, the first term can be significantly smaller. After the autocorrelation, noncoherent averaging is used to suppress spurious peaks in a fading channel. This technique averages the current and the previous M-1 power delay profiles, yielding a better estimate because the noise is averaged out. This can be expressed as follows:
Assuming that the channel statistics do not change much over M frames, we can take the expected value of the power delay profile as follows:
The first term contributes to the peak in the power delay profile, and the second term constitutes the noise floor. If the received and stored sequences do not line up perfectly, the first term disappears, leaving only the second term. After noncoherent averaging, a local peak search is used to find all local peaks in the power delay profile. The search examines three consecutive points: whenever the middle point is higher than the points on either side, a local peak is found. A threshold is then computed that should be higher than the noise floor but lower than the true "delay index" peaks. As shown previously, the noise floor comes from interpath interference, multiple-access interference, and thermal noise. An adaptive threshold is used because it is more robust, accounting for variations in both interference and noise. The threshold is calculated as follows:^{4}
After the threshold is computed, a local peak removal operation is performed; this is the final operation of the path searcher. All local peaks are compared against the threshold: peaks below it are removed, and those above it are retained. Table 1 lists cycle counts and code size for this path searcher.
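The back end of the path searcher (noncoherent averaging, three-point local peak search, thresholding, and peak removal) can be sketched as below. The threshold formula from reference 4 is not reproduced in this text, so the sketch swaps in a hypothetical adaptive threshold, a scale factor times the median of the averaged profile, on the assumption that most delay bins contain only the noise floor. The scale value and the test profiles are illustrative only.

```python
from statistics import median


def noncoherent_average(profiles):
    """Average the current and previous M-1 power delay profiles element-wise."""
    m = len(profiles)
    return [sum(column) / m for column in zip(*profiles)]


def local_peaks(pdp):
    """Three-point search: index i is a peak if pdp[i] exceeds both neighbours."""
    return [i for i in range(1, len(pdp) - 1)
            if pdp[i] > pdp[i - 1] and pdp[i] > pdp[i + 1]]


def noise_floor_threshold(pdp, scale=3.0):
    """HYPOTHETICAL adaptive threshold: scale * median(pdp).

    The median is a rough noise-floor estimate because most delay bins hold
    no path. This is NOT the threshold formula of reference 4.
    """
    return scale * median(pdp)


def find_paths(profiles, scale=3.0):
    """Noncoherent averaging, local peak search, then removal of weak peaks."""
    pdp = noncoherent_average(profiles)
    threshold = noise_floor_threshold(pdp, scale)
    return [i for i in local_peaks(pdp) if pdp[i] > threshold]


if __name__ == "__main__":
    frame1 = [0.1, 0.1, 2.0, 0.1, 0.1, 1.0, 0.1, 0.1, 0.15, 0.1]
    frame2 = [0.1, 0.1, 1.8, 0.1, 0.1, 1.2, 0.1, 0.1, 0.25, 0.1]
    print(find_paths([frame1, frame2]))   # [2, 5]; the small bump at 8 is rejected
```

Because the threshold tracks the profile itself, it rises and falls with the interference and noise level, which is the robustness argument the text makes for an adaptive threshold.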
3. Maximum Ratio Combining (MRC)
Figure 1 above shows the block diagram for combining at the symbol level. Chip-level combining instead combines the paths first and then performs descrambling and despreading. Both combining schemes perform the same under perfect channel estimation and path search, assuming that the fading channel is constant over a symbol period. Table 2 estimates the computational load of both combining schemes for one channel. Descrambling and despreading are combined and done in one step, and the scrambling and spreading codes are assumed not to change during the transmission.^{1}
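The equivalence of the two combining orders can be checked numerically. The sketch below assumes chip-spaced integer delays, no noise, channel weights that are constant over the symbol, and the conjugate channel estimate as the MRC weight (the standard maximal-ratio choice). Symbol-level combining despreads each finger and then weights and adds; chip-level combining weights and adds the delayed chip streams first and despreads once. Since both are linear, the results agree. The code and channel values are hypothetical, and qpsk_decision stands in for the simple decision device det( ).

```python
def despread_finger(rx, code, delay, sf, k):
    """Finger output for symbol k: sum over one symbol of r[i + delay] * conj(p[i])."""
    return sum(rx[k * sf + n + delay] * code[k * sf + n].conjugate()
               for n in range(sf))


def symbol_level_mrc(rx, code, taps, delays, sf, k):
    """Despread every finger, then weight by the conjugate channel estimate and add."""
    return sum(c.conjugate() * despread_finger(rx, code, d, sf, k)
               for c, d in zip(taps, delays))


def chip_level_mrc(rx, code, taps, delays, sf, k):
    """Weight and add the delayed chip streams first, then despread once."""
    acc = 0j
    for n in range(sf):
        i = k * sf + n
        combined = sum(c.conjugate() * rx[i + d] for c, d in zip(taps, delays))
        acc += combined * code[i].conjugate()
    return acc


def qpsk_decision(z):
    """Simple decision device: the sign of the real and imaginary parts."""
    return complex(1 if z.real >= 0 else -1, 1 if z.imag >= 0 else -1)


if __name__ == "__main__":
    sf = 8
    code = [1, -1, 1j, -1j, 1, 1, -1j, 1j]        # hypothetical unit-magnitude chips
    chips = [(1 + 1j) * p for p in code]          # one QPSK symbol, 1+j
    taps, delays = [0.8 + 0.1j, 0.3 - 0.4j], [0, 2]
    rx = [0j] * (sf + max(delays))
    for c, d in zip(taps, delays):                # noise-free two-path channel
        for n, u in enumerate(chips):
            rx[n + d] += c * u
    s = symbol_level_mrc(rx, code, taps, delays, sf, 0)
    ch = chip_level_mrc(rx, code, taps, delays, sf, 0)
    print(abs(s - ch) < 1e-9, qpsk_decision(s))
```

Counting complex multiplications per symbol in this sketch, chip-level combining needs SF(L + 1) and symbol-level L(SF + 1), where L is the number of fingers, so symbol-level saves SF - L per symbol, a gap that widens as SF grows.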
In the implementation described in this article, symbol-rate combining was used because it requires fewer computations, especially as the spreading factor increases. Channel estimation is also hard to achieve in chip-level combining: the channel is usually estimated at the symbol level and interpolated to the chip rate, so chip-rate channel estimation takes more MIPS and memory than its symbol-rate counterpart. Table 2 also compares symbol-rate and chip-rate combining in terms of memory usage. In chip-rate combining, only one path is stored (after the MRC in Figure 1), but the stored data is at the chip rate, which amounts to 38400 samples per frame. In symbol-rate combining, the data of multiple paths are stored (before the MRC in Figure 1), but each path is at the symbol rate, so the total number of samples is (38400/SF) * L. The difference in memory usage between chip-rate and symbol-rate combining is given by the following equation:
where L is the number of fingers (2 to 8) and SF is the spreading factor (4 to 512). If Δ in Equation 13 is positive, chip-rate combining requires more memory; if it is negative, symbol-rate combining requires more memory. As mentioned at the outset, we're considering the implementation of a four-finger RAKE receiver. MRC is simply a complex multiplication by the channel coefficient in each finger, followed by addition of the results from the different fingers; these steps can be combined using MAC instructions.
The Results
Figure 5 shows the tracking of the fading envelope for two frames of data for the weakest path of the following channel (no MAI is assumed).
When the MA filter is shortened from N=16 taps to N=8 taps, the channel estimate becomes noisier, as expected, but the smaller number of taps helps in tracking fast-fading envelopes. Figure 6 shows the performance of the channel estimation algorithm in tracking a fast-fading channel (the same channel condition, with the mobile traveling at twice the speed): the shorter MA filter tracks the fast-fading channel better.
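The running-sum MA filter described for Equation 7 and the effect of its length can be illustrated together. This minimal sketch assumes equal coefficients 1/N, as stated, and floating-point arithmetic (the DSP would use fixed point); the class and function names are illustrative. On a linearly changing input, a stand-in for a channel that drifts during a fade, the N-tap average lags the input by (N - 1)/2 samples.

```python
from collections import deque


class RunningMovingAverage:
    """Equal-coefficient (1/N) MA filter updated with a running sum."""

    def __init__(self, first_window):
        if not first_window:
            raise ValueError("need at least one sample")
        self.n = len(first_window)
        self.window = deque(first_window)
        # First output: direct sum over the first N samples (the DSP does this
        # with four complex MACs per cycle, N/4 loop iterations).
        self.output = sum(x / self.n for x in first_window)

    def update(self, sample):
        """Add the new sample / N and remove the oldest sample / N."""
        oldest = self.window.popleft()
        self.window.append(sample)
        self.output += (sample - oldest) / self.n
        return self.output


def moving_average(samples, n):
    """Outputs for every window position, the first one ending at index n - 1."""
    ma = RunningMovingAverage(samples[:n])
    return [ma.output] + [ma.update(x) for x in samples[n:]]


if __name__ == "__main__":
    ramp = [float(t) for t in range(32)]    # channel changing linearly in time
    for n in (16, 8):
        out = moving_average(ramp, n)
        lag = ramp[n - 1] - out[0]           # how far the estimate trails the input
        print(n, lag)                        # lag = (N - 1) / 2
```

The 16-tap filter trails the ramp by 7.5 samples and the 8-tap filter by 3.5, while averaging N independent noise samples reduces the noise variance by a factor of N; that is the noise-versus-tracking trade-off seen in the results.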
The length of the MA filter is therefore left as a design parameter: reducing it helps the estimate follow the channel when the fading envelope changes quickly, at the cost of a noisier estimate. The number of DSP cycles needed for channel estimation can be expressed by:
DSP_cycles_interp = (SF_MATCH*150) + (7N/4) + 780
Figure 7 shows the result of the path searcher for the following channel condition: SNR = 5 dB; path delays = [0 260 521 781] ns; path powers = [0 -3 -6 -9] dB; v = 120 km/h. As Figure 7 shows, the path searcher is able to identify the four different paths, which are 1 chip apart.
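The "1 chip apart" statement follows from the chip rate: one chip lasts 1/3.84 MHz, about 260.4 ns. A small sketch converting the Figure 7 path delays to chips (the function name is illustrative):

```python
CHIP_RATE_HZ = 3.84e6                    # WCDMA chip rate
CHIP_DURATION_NS = 1e9 / CHIP_RATE_HZ    # about 260.4 ns


def delays_in_chips(delays_ns):
    """Convert path delays from nanoseconds to (fractional) chips."""
    return [d / CHIP_DURATION_NS for d in delays_ns]


if __name__ == "__main__":
    chips = delays_in_chips([0, 260, 521, 781])   # delays used for Figure 7
    print([round(c, 3) for c in chips])
    spacing = [b - a for a, b in zip(chips, chips[1:])]
    print([round(s, 2) for s in spacing])         # each close to 1 chip
```

The delays come out at roughly 0, 1, 2, and 3 chips, so adjacent paths sit in neighbouring delay bins of a chip-spaced searcher.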
Table 2 above shows the complexity and memory requirements of chip-level and symbol-level combining. The MIPS requirements for chip-level and symbol-level combining are also shown in Figure 8 for different spreading factors.
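The memory comparison of Equation 13 can be tabulated from the per-frame sample counts stated earlier: 38400 samples for chip-rate combining and (38400/SF) * L for symbol-rate combining. This minimal sketch assumes Δ is the chip-rate count minus the symbol-rate count, which matches the stated sign convention (positive means chip-rate combining needs more memory).

```python
CHIPS_PER_FRAME = 38400


def memory_difference(num_fingers, sf):
    """Delta = chip-rate samples - symbol-rate samples per frame.

    Positive: chip-rate combining needs more memory.
    Negative: symbol-rate combining needs more memory.
    """
    symbol_rate_samples = (CHIPS_PER_FRAME // sf) * num_fingers
    return CHIPS_PER_FRAME - symbol_rate_samples


if __name__ == "__main__":
    for fingers in (2, 4, 8):
        row = [memory_difference(fingers, sf) for sf in (4, 8, 16, 64, 256, 512)]
        print(fingers, row)
```

Because Δ = 38400(1 - L/SF), it is zero when SF equals L and positive whenever SF exceeds L; for the four-finger receiver considered here, symbol-rate combining therefore never needs more memory than chip-rate combining.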
Finally, let's look at the bit-error-rate (BER) curves in Figure 9 for the WCDMA receiver implemented on the DSP.
The plot shows that the channel estimation and path searcher algorithms produce results very close to those obtained assuming ideal channel estimation and known paths. It was observed that averaging the power delay profile over a longer period gives a good estimate of the path delays, but long averaging is a disadvantage if the delay spread changes quickly.
Wrap Up
References
About the Authors
KimChyan Gan is a DSP software engineer in Motorola's DSP Platforms group. He received the M.S. degree in electrical engineering from Utah State University, Logan, and can be reached at kimchyan.gan@motorola.com. Imran Ahmed is a systems applications engineer in Motorola's DSP Platforms group. He received a B.S. in Computer Engineering from the University of Texas at Austin and can be reached at imranahmed@motorola.com.
All material on this site Copyright © 2017 Design And Reuse S.A. All rights reserved. 