By Shai Shpigelblat, Manager of Processor and Core Architectures, CEVA

With consumer demand for richer content, and the resulting need for higher data bandwidth, continuing to drive communications systems, coding for error control has become extraordinarily important. One way to improve the Bit Error Rate (BER) while maintaining high data reliability is to use an error correction technique such as the Viterbi algorithm. Originally conceived by Andrew Viterbi as an error-correction scheme for noisy digital communication, the Viterbi algorithm provides an efficient method of Forward Error Correction (FEC) that improves channel reliability. Today it is used in many digital communications systems, in applications as diverse as the LTE Physical Downlink Control Channel (PDCCH), CDMA and GSM digital cellular, dial-up modems, satellite and deep-space communications, and 802.11 wireless LANs. It is also commonly used in speech recognition, keyword spotting, and computational linguistics.
Given the increasing need for an error correction technique like the Viterbi algorithm, it has become critical for engineers to more fully understand what the technique is and how it can best be utilized in today’s digital communications systems. Let’s take a closer look.
FEC in Digital Communications Systems

In a typical digital communications system, a stream of information bits is transferred from one point to another through a communication channel and is therefore susceptible to noise. FEC techniques improve the channel's capacity by carefully adding redundant information to the data being transmitted through the channel. In a FEC system, the transmitted data is encoded in such a way that the receiver can correct, as well as detect, errors caused by channel noise (Figure 1).
Convolutional encoding with Viterbi decoding is a FEC technique that is well suited for use with channels where the transmitted signal is corrupted mainly by Additive White Gaussian Noise (AWGN). The convolutional encoder inserts redundant information bits into the data stream so that the decoder can reduce and correct errors caused by the channel. However, these extra bits increase the data rate (bits/s) and as a consequence, also increase the bandwidth of the encoded signal. Using a technique known as puncturing, the data rate can be dynamically changed according to the channel quality. The technique works by deleting symbols from the encoded data and thus, reduces the coding rate.
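The puncturing idea described above can be sketched in a few lines of Python. The function name and the keep/delete pattern below are illustrative inventions, not from the article; the point is only that deleting symbols by a repeating mask raises the effective code rate without changing the mother code:

```python
def puncture(symbols, pattern):
    """Delete encoded symbols wherever the repeating pattern holds 0.

    Keeping fewer channel symbols per input bit raises the effective
    code rate of the transmitted stream.
    """
    return [s for i, s in enumerate(symbols) if pattern[i % len(pattern)]]

# Rate-1/2 mother code: 2 symbols per input bit.  The (hypothetical)
# pattern (1, 1, 1, 0) keeps 3 of every 4 symbols, so 4 input bits
# now cost 6 channel symbols -> effective rate 2/3.
encoded = [0, 1, 1, 1, 0, 0, 1, 0]        # 8 symbols = 4 input bits
sent = puncture(encoded, (1, 1, 1, 0))
print(sent)   # -> [0, 1, 1, 0, 0, 1]
```

The decoder reinserts neutral values at the punctured positions before running the Viterbi algorithm, so the same trellis serves several data rates.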
Figure 1: FEC is typically used in digital communications systems to enable the receiver to detect and correct errors without having to ask the sender for additional data.

Understanding the Viterbi Algorithm

Before learning how to implement the Viterbi algorithm, it is first critical to gain clearer insight into how it works. The algorithm comprises an encoder and a decoder. The encoder algorithm is generally straightforward and is usually implemented by dedicated hardware. In contrast, the decoder is much more complicated and consumes many more cycles.
The convolutional code used by the Viterbi algorithm is defined by two parameters: code rate and constraint length. The code rate, R=k/n, is expressed as a ratio of the number of input bits into the convolutional encoder (k) to the number of channel symbols output by the convolutional encoder (n) in a given encoder cycle. The range of R is between 1/8 and 7/8. A small value for the code rate indicates a high degree of redundancy, providing higher effective error control at the expense of increasing the bandwidth of the encoded signal. The constraint length, K, is the number of input frames that are held in the K-bit shift register.
Figure 2 describes a convolutional encoder with R=1/3 (k=1, n=3) and K=9 using the 755, 633, 447 convolutional code. The octal numbers 755, 633, and 447 represent the code generator polynomials, corresponding to the shift-register connections to the modulo-2 adders. Three polynomials define the outputs of the encoder described in Figure 2:
G0(x) = 1 + x^2 + x^3 + x^5 + x^6 + x^7 + x^8, denoted as 111101101 in binary, or 755 in octal
G1(x) = 1 + x + x^3 + x^4 + x^7 + x^8, denoted as 110011011 in binary, or 633 in octal
G2(x) = 1 + x + x^2 + x^5 + x^8, denoted as 100100111 in binary, or 447 in octal
Polynomial selection is important because each polynomial has different error-correcting properties. One code can have an error protection property that is completely different from another code.
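The mapping between the octal notation and the polynomial taps can be illustrated with a short Python sketch (the helper name `taps` is invented here): the binary expansion of the octal value, read most-significant bit first, gives the coefficients of x^(K-1) down to x^0.

```python
def taps(octal_gen, K):
    """Return the exponents present in a generator polynomial that is
    written as an octal number, e.g. 755 for K=9."""
    bits = format(int(str(octal_gen), 8), f"0{K}b")   # K binary digits
    # bits[0] is the coefficient of x^(K-1), bits[-1] that of x^0
    return [K - 1 - i for i, b in enumerate(bits) if b == "1"]

# 755 octal = 111101101 binary -> 1 + x^2 + x^3 + x^5 + x^6 + x^7 + x^8
print(sorted(taps(755, 9)))   # -> [0, 2, 3, 5, 6, 7, 8]
```

Running the same helper on 633 and 447 reproduces the exponents of G1(x) and G2(x) above.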
Figure 2: An example of a convolutional encoder with the constraint length equal to 9.

The convolutional encoder provides knowledge of the possible data when moving from one stage to the next. The polynomials that describe the convolutional encoder's behavior define a dependency between the current state and the next state in the encoder outputs. As an example, consider the simplified encoder in Figure 3, which outputs two bits for each bit shifted in. The encoder's code rate is 1/2 and its constraint length is 3.
Figure 3: A simplified convolutional encoder structure with K=3 (7, 5).
In this example, the total number of states (2^(K-1)) is four: 00, 01, 10, and 11. The encoder is simple and can be implemented in external hardware. Table 1 describes the state of the encoder for each combination, where the following items describe the table headers:
- Input Data—an input bit that is shifted into the encoder
- Current State—the state of the encoder, which can be one of four possible states
- Output Bits—the outputs of the encoder, one per polynomial
- Next State—the next state as a result of the shifted input
Input Data | Current State | Output Bits (G0G1) | Next State |
0 | 00 | 00 | 00 |
1 | 00 | 11 | 10 |
0 | 01 | 11 | 00 |
1 | 01 | 00 | 10 |
0 | 10 | 10 | 01 |
1 | 10 | 01 | 11 |
0 | 11 | 01 | 01 |
1 | 11 | 10 | 11 |
Table 1: K=3 (7, 5) Encoder Look-up Table

Figure 4 illustrates one Viterbi stage that includes four states (00, 01, 10, and 11), covering all possible transitions. The encoder state machine is based on Table 1, with the red arrows representing transitions due to input data 0 and the blue arrows representing transitions due to input data 1. Each input data bit causes the encoder to change its state.
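The behavior captured in Table 1 can be reproduced with a short sketch. The bit conventions below (most recent past input in the high bit of the state) are one consistent choice for modeling the Figure 3 encoder, not necessarily the hardware's internal layout:

```python
def encoder_step(bit, state):
    """One step of the K=3 (7, 5) encoder of Figure 3.

    `state` packs the two previous input bits, most recent in the high
    bit, so the 3-bit shift register reads [bit, state].
    """
    reg = (bit << 2) | state
    g0 = bin(reg & 0b111).count("1") & 1   # generator 7 = 111: all three taps
    g1 = bin(reg & 0b101).count("1") & 1   # generator 5 = 101: first and last taps
    next_state = (bit << 1) | (state >> 1) # shift the new bit in
    return (g0, g1), next_state

# Reproduce Table 1, one row per (input, current state) combination:
for state in range(4):
    for bit in (0, 1):
        (g0, g1), nxt = encoder_step(bit, state)
        print(f"{bit} | {state:02b} | {g0}{g1} | {nxt:02b}")
```

For instance, input 1 in state 00 yields output 11 and next state 10, matching the second row of Table 1.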
Figure 4: A convolutional encoder is a finite state machine that, in this case, is defined by K=3 (7, 5).

The dependencies between the current state and the next state of the encoder in Figure 3 are illustrated in the trellis diagram in Figure 5. The output of the encoder is called the encoded sequence or code word. Its length is equivalent to the number of stages, tX, where X = 1, 2, 3, and so on. While the code word is defined differently in each system, the longer it is (up to a limit) the better the error correction. Usually the code word is four or five times the constraint length.
Figure 5: Trellis diagrams, in this case for the K=3 (7, 5) convolutional encoder, represent the linear time sequencing of events.

The Viterbi algorithm is based on the maximum-likelihood decoding technique. The main purpose of the decoder is to select the code word with the minimum distance between the received signal and the code word. The Viterbi algorithm can use either hard or soft decisions. Hard decisions are based on the Hamming distance; while they are simple to implement, they do not take probability into consideration. Despite this, the Hamming method is efficient in situations where the received signal has a magnitude close to the antipodal values (e.g., 0 corresponds to +1 V and 1 corresponds to –1 V). When the received signal is close to 0 V, however, a hard decision is not the best choice.
In contrast to hard decisions, the metric used by soft decisions is the Euclidean distance, or local distance. With the soft decision method, the range of received values may be divided into more than two regions. The receiver uses the channel's characteristics to make a decision according to the probability that the received voltage is correct. The probability calculation is based on knowledge of the communication channel (normal distribution). In digital communications systems, more information leads to a better decision. The soft decision method improves the reliability of the receiver and, with eight soft-decision regions, can yield a coding gain of up to 3 dB over hard decisions.
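A minimal sketch may clarify the difference between the two metrics; the helper names and the sample voltages below are invented for illustration. The hard metric slices each voltage to a bit and counts disagreements, while the soft metric correlates the raw voltages with the expected antipodal symbols, so a confident +0.9 V counts for more than an ambiguous –0.1 V:

```python
def hamming(bits_a, bits_b):
    """Hard-decision metric: the number of differing bit positions."""
    return sum(a != b for a, b in zip(bits_a, bits_b))

def soft_metric(soft, expected):
    """Soft-decision metric: correlation of the received voltages with
    the expected antipodal symbols (+1 for bit 0, -1 for bit 1).
    Larger means more likely, and magnitude carries confidence."""
    return sum(s * e for s, e in zip(soft, expected))

# Received voltages for three symbols; the middle one is near 0 V,
# exactly the case where a hard decision loses information.
received = [+0.9, -0.1, -0.8]
hard = [0 if v > 0 else 1 for v in received]   # slice to bits first

# Compare the received signal against the candidate code word 0 1 1:
print(hamming(hard, [0, 1, 1]))                 # -> 0
print(soft_metric(received, [+1, -1, -1]))
```

The Hamming metric says this candidate matches perfectly, but the small soft contribution from the –0.1 V symbol shows the decoder how weak that match really is.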
Implementing the Algorithm

To better understand how to use the Viterbi algorithm, consider the implementation of a soft-decision Viterbi decoder on the CEVA-TeakLite-III, a dual-MAC, high-performance fixed-point DSP core architecture based on the previous-generation CEVA-TeakLite and CEVA-Teak architectures. The CEVA-TeakLite-III targets mid- to high-end telecommunications and consumer electronics applications where low power and portability are major requirements. It is designed to run the soft decision decoder algorithm, achieving maximum performance without the cost of dedicated hardware.
The decoder algorithm is implemented in four phases: initial state, branch metric calculation, Add Compare Select (ACS), and Viterbi traceback.
Initial State

In the initial state, the Viterbi parameters and pointers are initialized to point to the buffers located in memory (Figure 6). Five buffers are used by the Viterbi algorithm:
- Soft decision inputs buffer—contains 16-bit information regarding the received signal's reliability and probability. Note that each received bit is represented by a 16-bit value.
- Branch metric buffer—contains the local distance calculation for all possible inputs. Its size is 2^n.
- Path metric buffer—contains the previous state's metric accumulated with the current metric. Its size is 2^(K-1). This calculation is performed for each possible state.
- Trellis buffer—contains the survivor’s path indications for each stage.
- Code word buffer—contains the reconstructed code words.
Figure 6: An example of a memory map and pointers.

Branch Metric Calculation

In this phase, the Viterbi decoder calculates all possible branch metrics (the normed distances between every possible transition). Each branch metric value is the Euclidean distance of the soft-decision inputs, also referred to as the local distance calculation. To better understand the DSP operation, consider the local distance equation:

LD(j) = Σn (SDn – Gn(j))², with the sum taken over n = 0 to r–1    (1)
Here, SDn refers to the soft-decision inputs, Gn(j) is the expected value for each path (antipodal values), j is the path indicator, and r is the inverse of the code rate. Expanding equation (1) gives:

LD(j) = Σn SDn² – 2·Σn SDn·Gn(j) + Σn Gn(j)²    (2)
Since Σn SDn² and Σn Gn(j)² are positive values that are common to all paths, they can be eliminated. As a result, equation (2) becomes:

LD(j) = –2·Σn SDn·Gn(j)    (3)
Since we are looking for the path with the minimum local distance, we can drop the scalar factor –2 and instead search for the maximum of Σn SDn·Gn(j). To reduce the DSP calculation and complexity, Gn is defined with signed antipodal values, meaning that 0 corresponds to +1 and 1 corresponds to –1. With this representation the calculation becomes a simple add and subtract operation. Table 2 illustrates the local distance calculation for various code rates.
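With Gn restricted to ±1, the branch metric Σn SDn·Gn(j) is just the soft inputs added or subtracted according to each expected bit. The sketch below (function name and sample inputs invented) enumerates all branch metrics for R=1/3:

```python
def branch_metrics(sd):
    """All branch metrics for R=1/3: the sum of the three soft inputs,
    each taken with the sign of the expected antipodal symbol
    (bit 0 -> +1, bit 1 -> -1).  Only adds and subtracts, no multiplies.
    """
    sd0, sd1, sd2 = sd
    bm = {}
    for bits in range(8):   # expected output bits G0 G1 G2 as an integer
        signs = [1 if not (bits >> k) & 1 else -1 for k in (2, 1, 0)]
        bm[bits] = signs[0] * sd0 + signs[1] * sd1 + signs[2] * sd2
    return bm

bm = branch_metrics([5, -3, 2])       # made-up soft-decision inputs
print(bm[0b000], bm[0b111])           # -> 4 -4
```

Note that the metric for a bit pattern and for its complement are always equal in magnitude and opposite in sign, which is exactly why only half of the 2^n values need to be computed.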
Table 2: Local Distance Calculation According to Code Rate

The CEVA-TeakLite-III has dedicated instructions intended to accelerate the branch metric calculation. Additionally, the Arithmetic Logic Unit (ALU) offers split operation, making it possible to perform two add or subtract operations in one cycle.
For R=1/n, there are 2^n possible branch metrics. However, due to the Viterbi butterfly structure, only 2^(n-1) are required. Table 3 describes the CEVA-TeakLite-III instructions that may be used in the branch metric calculation.
Table 3: Branch Metric Instruction

As an example, consider the branch metric calculation assembly code for R=1/3. The r0 and r1 pointer registers point to the soft decision inputs buffer, and the r7 pointer register points to the branch metric buffer. As illustrated in Figure 7, this phase takes eight cycles.
Figure 7: Shown here is the branch metric calculation assembly code for R=1/3.

Add Compare Select (ACS)

Because the Viterbi algorithm is based on maximum-likelihood decoding, the decoder must calculate the metrics of every possible path. The calculation is based on the previous state metric of each node (2^(K-1) nodes) and all possible paths, and constitutes the "add" part of the ACS phase (also known as the path metric update). For an encoder with a code rate of 1/n, there are two paths from each previous state, represented by the branch metric. Each node in the new state is built from two potential metric paths. The path with the higher metric is selected and the other one is eliminated. Concurrently, the decoder indicates the survivor path with a decision bit. The decision bits are used to track the final survivor path. This task comprises the "compare select" part of the ACS phase. Since these computations are the primary task of the decoder, most of the MIPS are consumed by this phase.
The CEVA-TeakLite-III performs a complete Viterbi butterfly in two cycles. One butterfly operation includes the calculation of the two new states using the ACS computation.
The CEVA-TeakLite-III Viterbi ACS instructions are based on the Viterbi butterfly structure and its symmetry. The structure is called a "butterfly" because of its shape. It is also referred to as the radix-2 trellis structure since, for a 1/n code rate, each group usually contains two source and two destination states. Figure 8 depicts a single Viterbi butterfly with a 1/n code rate. Each state leads to two states. These two states complement each other such that if the first path is G0=1 and G1=0, then the second path will be the inverse, G0=0 and G1=1. This means that the two branch metrics are always equal in magnitude but opposite in sign. This symmetry, which is generally forced by the character of the polynomials, helps simplify the calculation.
Figure 8: Pictured here is the Viterbi butterfly structure for a 1/n code rate. A state is represented by N; for K=3 there are a total of four states.

Given this description of the butterfly structure, it is possible to look more closely at the "add" and "compare select" tasks in the ACS phase. Recall that the "add" phase is used to calculate the new path metric for each state. The Viterbi add is accomplished by adding or subtracting the local distance (represented by the branch metric) to or from the old path metrics (PMs) to create new PMs. Four operations are required to calculate one complete butterfly. However, due to the butterfly symmetry, only three values are needed: the PM of state 2N, the PM of state 2N+1, and a single branch metric (BM) value per butterfly.
The CEVA-TeakLite-III uses a dedicated instruction and split adders to perform the four required operations (two adds and two subtracts) in one cycle. Table 4 describes the dedicated "add" phase instruction that is used to calculate the new PMs in one butterfly. Essentially, the two original PMs combine with the BM value to calculate the two new PMs (Figure 9).
Table 4: Add Phase Instruction

Figure 9: The Viterbi butterfly add for a code rate of 1/n.

Assuming N=0, the old path metric states are pm0 and pm1. The pm(2N) and pm(2N+1) values are located consecutively in the path metric buffer, indicated by the r2 pointer register. Instead of saving the positive and negative values, the BM pointed to by the r7 pointer register is duplicated in memory, which is why the addsub2w instruction is used instead of add4w. The "add" phase instruction is repeated 2^(K-1)/2 times for each stage, calculating 2·2^(K-1) new candidate PMs. Note that the number of stages depends on the code word length. Figure 10 illustrates one Viterbi butterfly "add" phase using the addsub2w instruction.
Figure 10: The assembly code for one Viterbi butterfly "add" phase, using the addsub2w instruction, is depicted here.

The Viterbi "compare select" phase is used to choose the survivor of the two candidate paths. The survivor value is then saved to memory. In addition, this phase indicates the survivor path with a dedicated indication bit represented by predicate bits (prX, where X=0, 1, 2). The survivor indications are collected into one or two registers to create a trellis word, which is saved in memory at the end of each stage. The Viterbi "compare select" instruction executes in a single cycle and is performed by a dedicated instruction, as described in Table 5. The vcs instruction is defined for any K. The 2w switch is used for K=5, while for larger K the dw switch is used.
Table 5: The compare select phase instruction executes in a single cycle.

The vcs instruction compares the two candidate values for each PM and selects the maximum value using the split compare unit. The maximum selected value for the PM(N) state is saved to the low part of the destination accumulator, while the maximum selected value for the PM(N+2^(K-1)/2) state is saved to the high part of the same destination accumulator. At this point, the destination accumulator contains the survivor metrics of the two states, PM(N) and PM(N+2^(K-1)/2). The indication bit, which marks the survivor path, is collected in a shift register and stored to memory at the end of each stage. The Viterbi "compare select" phase instruction is repeated 2^(K-1)/2 times for each stage, calculating 2^(K-1) new PMs (Figure 11).
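The ACS arithmetic for one butterfly can be sketched as a behavioral model in Python. This is not the DSP instruction itself; the function name, sample metrics, and tie-breaking choice are assumptions made for illustration:

```python
def butterfly(pm_2n, pm_2n1, bm):
    """Behavioral model of one Viterbi butterfly for a rate-1/n code.

    Source states 2N and 2N+1 feed destination states N and
    N + 2^(K-1)/2.  By the butterfly symmetry the two branch metrics
    are equal in magnitude and opposite in sign, so one BM value
    serves all four "add" operations.
    """
    # "add": four candidate path metrics (two adds, two subtracts)
    a0, a1 = pm_2n + bm, pm_2n1 - bm   # candidates for state N
    b0, b1 = pm_2n - bm, pm_2n1 + bm   # candidates for state N + 2^(K-1)/2
    # "compare select": keep the larger metric, record which path won
    dec_a = int(a1 > a0)
    dec_b = int(b1 > b0)
    return (max(a0, a1), dec_a), (max(b0, b1), dec_b)

print(butterfly(10, 5, 3))   # -> ((13, 0), (8, 1))
```

The two decision bits are what the hardware packs into the trellis word for the traceback phase.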
Figure 11: One Viterbi compare select phase using the vcs{dw} || st || st instruction is illustrated in this graphic.

Viterbi Traceback

The traceback phase is used to build the correct code word by tracing the maximum-likelihood path backwards through the trellis according to the footprint of the previous states. The traceback phase requires far fewer MIPS from the DSP than the ACS phase and depends on the size of the code word. The longer the trellis (e.g., ten times the constraint length, depending on the application), the more accurate the resulting code word reconstruction.
The traceback always starts from an already known state (state 0). This condition is forced by a special tail that brings the path back to state 0. The traceback operation is performed by a single generic instruction that fits all values of K. Table 6 describes the instructions that are used in the traceback phase.
Table 6: Traceback Instructions

Figure 12 illustrates the traceback operation for K=3. Each trellis code contains all the survivor indications for one stage. When K=3, there are four states in each stage: 00, 01, 10, and 11. In addition to the correct code, the code word indicates the states forced by the encoder.
Figure 12: A traceback operation for K=3.

Assuming the traceback operation starts from state 00 at t5, the code word is all zeros (default state) and therefore points to state 00 (Figure 13). The ACS phase has already chosen the most likely of the two options and indicated it with a 0 (the upper path is the most likely). Therefore, 0 is shifted into the shift register and the code word points to state 00 at the next stage (t4). In this stage, the lower path is the most likely, so a 1 is shifted in. At the next stage (t3), the code word points to state 01. Here, the lower path is the most likely, so the next state will be 11, and so on until the entire code word is reconstructed.
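The backwards walk can be modeled in Python for the K=3 example. The decision-bit and state conventions below are one consistent choice (the article's figures may use a different one), and the hand-built `decisions` array corresponds to a noiseless run rather than real channel data:

```python
def traceback(decisions, final_state=0):
    """Walk the K=3 trellis backwards from the known tail state 0.

    decisions[t][s] is the bit stored by the ACS phase for state s at
    stage t, selecting which of the two predecessor states survived.
    The decoded input bit is the high bit of the state it led into.
    """
    state = final_state
    bits = []
    for stage in reversed(decisions):
        bits.append(state >> 1)                    # input bit into this state
        state = ((state & 1) << 1) | stage[state]  # step to survivor predecessor
    return bits[::-1]

# Decision bits for a noiseless run of the input 1 0 1 0 0 through the
# K=3 (7, 5) encoder (state path 00 -> 10 -> 01 -> 10 -> 01 -> 00):
decisions = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
print(traceback(decisions))   # -> [1, 0, 1, 0, 0]
```

Only one decision bit per visited state is read at each stage, which is why this phase is so much cheaper than the ACS phase.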
Figure 13: A backwards traceback operation for K=3.

The CEVA-TeakLite-III is designed to support any K starting from K=5. Figure 14 illustrates the shlinw instruction operation when K=5, while Figure 15 shows the assembly code for K=5 using the shlinw instruction. Note that the r3 pointer register points to the trellis code buffer and the r0 pointer register points to the code word buffer.
Figure 14: The traceback operation for K=5.
Figure 15: In this traceback instruction for K=5, it is assumed that there are 32 stages.

Figure 16 illustrates the shlin2dw operation when K is bigger than 6. In this case, the CEVA-TeakLite-III uses the full memory bandwidth, reading 64 bits from memory. For K=7, which has 64 states, the shlin2dw instruction is required to complete an operation similar to that for K=5. For K bigger than 7, some manipulation of the code word is required.
Figure 16: The traceback operation for K bigger than 5.

Conclusion

The Viterbi algorithm is today garnering widespread attention for its use in digital communications systems. Providing an efficient method of forward error correction, it not only improves BER but also increases channel reliability. By clearly understanding the algorithm and the steps involved in its implementation, today's engineers will be better equipped to utilize it to its fullest potential. The CEVA-TeakLite-III DSP provides a prime example of how an implementation of this algorithm can achieve a device's maximum performance.
About the Author

Shai Shpigelblat is manager of processor and core architectures at CEVA. He has nine years of experience in the electronics and silicon industry. In his role, Mr. Shpigelblat is responsible for defining and managing wireless application processors such as the CEVA-XC161 and CEVA-XC321, as well as various general-purpose processors such as the CEVA-TeakLite-III and CEVA-X1641. Previously, Mr. Shpigelblat worked at DSP Group, which he joined in 2000 as a VLSI design engineer. Mr. Shpigelblat holds a B.Sc. in Nuclear and Electrical Engineering from Beer-Sheva University.