By Sayandeep Nag, Shyam Shenoy, Chandrashekar B U, Synopsys
The implementation of any Wireless Base band system as an IP is complex. One of the key components in the IP is the Interleave / DeInterleave process, which requires a careful implementation to obtain an optimal solution in terms of area and speed. The Interleave / DeInterleave process can consist of multiple stages and the approach of implementing these stages individually is not optimal. In this paper we present a novel method of implementing these individual stages as a combined Interleaver and DeInterleaver. This is achieved by examining the transfer function of each of the individual Interleavers and deriving the optimal solution by obtaining a combined transfer function. We finally compare the implementation of the UWB combined Interleaver / DeInterleaver with the traditional method of implementation and summarize the results.
Introduction to Interleaver
Digital communications are prone to random burst errors. Interleaving is used for error correction in block codes to make them effective in a burst noise environment. The interleaver spreads out adjacent data over multiple blocks of data. Any burst noise occurring will thus be reflected on the receive side decoder after DeInterleaving as independent random symbol error which is more manageable than burst errors. Interleavers can be classified as either periodic or pseudo random. Periodic interleavers order the data in a repeating sequence of bytes. Block interleaving is an example of periodic interleaving. This interleaver accepts the symbols in blocks and performs the necessary operations over data. Pseudo random interleaver rearranges the data in a pseudo random sequence. Periodic interleaving is commonly invoked as it is easily accomplished in hardware.
Introduction to UWB Interleaving
Ultra-Wideband (UWB) is a technology for transmitting information spread over a large bandwidth. The bit interleaving process is specified as three distinct stages,
Symbol Interleaving, which permutes the bits across 6 consecutive Orthogonal Frequency Divisional Multiplexing (OFDM) symbols.
Intra Symbol Tone Interleaving, which permutes bits across an OFDM symbol
Intra Symbol Cyclic Shifts, which cyclically shifts the bits in successive OFDM symbols by deterministic amounts.
Fig 1 : Stages of UWB Bit Interleaver
The additional parameters needed by the interleaver are listed in Table-1 for the different data rates supported by UWB
Table 1 : Parameters for UWB Interleaver
Considering these parameters the symbol, tone and cyclic interleaving for UWB can be denoted by the following equations.
UWB Symbol Interleaving:
UWB Tone Interleaving:
UWB Cyclic Interleaver:
where represents the floor function, and
Traditional Implementation Strategy applied to UWB Interleaver
The simplest method to implement the Interleaver / DeInterleaver is to write in the data to a memory and then read it out based on the required output pattern. Considering the UWB interleaver requirements, we need to have three different stages of memory corresponding to the three different Interleaving / DeInterleaving processes. The memory required for the first stage Symbol Interleaving should be at least of the size of 6 OFDM symbols. Accordingly, as per the parameters in Table 1 the first stage memory should be 6 * max(Ncbps) = 1200 bits. The second stage Tone Interleaver processes works on the incoming OFDM symbols individually, thus making the memory requirements to be max (Ncbps) = 200 bits, similarly the third stage Cyclic Interleaver also requires 200 bits of storage.
UWB having applications in real time system domains requires the data to be continuously processed. When the first stage, i.e. Symbol Interleaver, is reading out the first set of interleaved 6 OFDM symbols, the next set of 6 OFDM symbols need to be stored in a separate memory to maintain continuous flow of data. This gives a requirement for a duplicate memory, each of which is alternatively written and read. The duplication is required for each individual interleaving stage. Table 2 identifies the total memory requirement for the individual stages of the traditional interleaver for UWB.
Table 2 : Memory Requirement
Fig 2 shows the traditional implementation using individual three stages of the UWB interleaver. We require six memories, two for each interleaver stage. The memory input and output bit widths is generally constrained based on the requirements of the neighboring blocks. DPRAM memory would be a better choice as the read and the write operation can take place in parallel.
Fig 2 : Individual Implementation
The memory will need write and read address generators, which will generate different, read / write addresses. Further in UWB, these addresses vary based on the data rate of transmission. Implementation of the address generators will result in a huge amount of logic especially in multiplexers based on data rate, and high fanout nets. Also to note is that a similar logic needs to be replicated for the DeInterleaver. Finally it results in a Transceiver that will have replicated logic.
The Interleaver / DeInterleaver implemented as three individual stages have a latency of eight OFDM symbols. This is a result of six-symbol latency in Symbol interleaver and one symbol each in tone interleaver and cyclic interleaver respectively. The UWB receiver has strict latency requirements and therefore any improvement in latency is extremely important.
Combined Interleaver Implementation Strategy
Combined interleaver is essentially a transfer function containing the functions of all the individual stages, to obtain a single stage. It is possible to mathematically derive a single equation to represent the combined interleaver transfer function and then derive a hardware implementation of it. But this approach also will not lead to an optimal solution. Instead, identifying good memory architecture is the most important factor for best possible implementation for the combined Interleaver / DeInterleaver. This memory architecture needs to meet the storage requirements for all data rates and also satisfy the format in which the data is fed in and out of the block. It is also important that the design has the least possible latency. The fanout of the nets also needs to be low in order to achieve a smaller design.
The logical approach for the Interleaver / DeInterleaver design would be to start with the optimal bit width based on not just the interleaver, but also the requirements of the neighboring blocks as mentioned earlier. Once the bit widths of the interleaver input data is identified, the memory structure can be formulated as MemDepth = (max (Ncbps) * 6) / (Input Data BitWidth). Example if the input data bit width is required to be 8 bits wide the depth of the memory can be calculated as 1200 / 8 = 150.
Fig 3 : Combined Interleaver
The memory architecture chosen will have a sequential write and a read based on a pattern governed by the interleaver transfer function. It is not easy to get the data read address for the combined three stages of the interleaver manually. This can be instead done by developing a C program for all the three interleaver stage functions and give an incremental data pattern as input. The resultant output of the C program should give the read data pattern for all the individual data rates, which can be used as data read address straightaway.
Implementing the above mentioned approach in hardware is not very complex, for the example considered the memory needs to have 150 8-bit individual write data and read data lines. This can be done by using a flip flop based memory. The incoming data can be written in the memory by demultiplexing the write data line based on the write address generated by the address generation logic. Once all the 6 OFDM symbols to be interleaved are written, the reading out process can begin. The complete 6 OFDM symbols are available on the individual read lines of the memory structure. Concatenating and clocking the required data pattern will give the 6 OFDM symbols in a combined interleaved form.
clka clkb enable wr_data rd_data wr_addr rd_addr WR Control Logic RD Control Logic mem odd mem even Mode. DataRate and Other Control
Fig 4 : Block Representation of Combined Interleaver DeInterleaver
Combined DeInterleaver Implementation Strategy
Similar to Interleaver, the DeInterleaver will also be single stage. It is based on the same memory which is used for the Interleaver, taking advantage of the facts that in UWB either only transmit or receive path is active at any given point of time. The input (write) address to the DeInterleaver will be same as the Interleaver output (read) address for a particular data rate. This address can be used to write into the memory. Sequential read out is employed. The advantage of having this scheme as opposed to sequential write and read based on a look up table will be the saving of an additional look up table required for generating the read address. Additional multiplexing between the Interleaver and the DeInterleaver write data to the memory is required. Similarly the final read data out between the Interleaver and the DeInterleaver is needs to be multiplexed.
The objective of this section is to describe in detail the Combined Interleaver / DeInterleaver implemented strategy with an example.
Fig 5 : Block Placement of Combined Interleaver / DeInterleaver
Fig 5 gives the relative placement of the combined Interleaver / DeInterleaver with respect to the TX and RX Chain blocks. The UWB Transceiver architecture allows the block to be shared between both the chains.
In the TX chain of the UWB, interleaver gets its input from the Convolution Encoder / Puncturer , and gives out data to the QPSK or the DCM Mapper . The design required the Interleaver to have a 30 bit data input from the Convolution Encoder / Puncturer and a 20 Bit data output to the QPSK or DCM Mapper. Similarly on the RX chain the design requirements were to have a 20 bit data interface with the QPSK and DCM DeMappper and a 30 bit data interface with the DePuncturer block.
Considering the above design requirements it was decided to have a flip-flop based memory as shown in Fig 6. The memory consists of four one bit registers with a two bit addressing line. Twenty such register files are instantiated together to make a memory row structure. Out of the four bits used two bits are used for the odd bank of memory and the remaining two bits are used for the even bank of memory.
Fig 6 : Memory Row Structure
Thirty such memory row structures are instantiated together to complete the memory requirements of the combined Interleaver / DeInterleaver. The final memory structure consists of 30 individual data lines each 20 bit wide to write into the memory, 30 data lines each 20 bit wide to read from the memory, read and write address busses two bit each as the read and the write address busses of the thirty memory rows are shorted, 20 bit write enable bus the individual write enable busses of the memory rows are shorted. Fig 7 gives the structure of the memory.
Fig 7 : Memory Bank Structure
In the case of the Interleaver operation the data from the Convolution Encoder comes on a 30 bit bus. The incoming data is controlled by a binary counter. Based on value of the binary counter and a ring counter to generate the write enables, each of the incoming 30 bits are written on to the individual rows similar to filling up a matrix column wise. The data counter value varies between data rates. In case of lower data rates like 53.3, 80, 106.7, 160 and 200 Mbps, 6 OFDM symbols consist of only 600 bits, which are stored in the lower-half of the memory. Hence the address remains same throughout the write process. In case of higher data rates i.e. 320, 400 and 480 Mbps 6 OFDM symbols consist of 1200 bits. Hence the address changes for the bank being used every alternate cycle of incoming data. The change in address is dependent on the data counter. The readout process begins after the 6 OFDM symbols are completely written to the memory. Since each of the memory row has its own 20 bit read data bus, the complete 6 OFDM symbols stored in the memory is available on the 30 individual read data busses.
To derive the read address pattern, the incoming 6 OFDM symbol write address pattern is mapped on to a spreadsheet. The C program for the combined interleaver gives the final interleaved address. Matching the write address pattern from the C program with the spreadsheet will give the memory locations of the 20 bits to be read out based on a read counter. Concatenating these 20 bits for the respective read count values gives the data in interleaved form. The read count value varies for each of the data rate similar to the write process.
For lower data rates, the address remains same throughout the write process and for higher data rates, the address changes for the bank being used every alternate 20 bits of outgoing data.
In case of the DeInterleaver, the interface configuration and the memory architecture remains the same. Since the data from the QPSK and DCM DeMapper comes at 20 bits at a time it is written into each row based on the write count. As in the case of Interleaver, for lower data rates, address remains same throughout the write process and for higher data rates the address changes for the bank being used every alternate 30 bits of incoming data. Similar to the Interleaver the readout process begins after the 6 OFDM symbols are completely written to the memory. The contents of the memory are available on the 30 individual read data busses.
To derive the read address for the incoming 6 OFDM symbol the interleaver write address is mapped to a spreadsheet. Matching the interleaver input write address with the spreadsheet will give the memory location of the 30 bits read out based on the read counter. Concatenating these 30 bits for the respective read count values gives the data in the deinterleaved form. Similar to the write pattern the address location of the memory changes for alternate 30 bits of data.
The individual Interleaver and DeInterleaver design is completed. Since each data rate will have individual addresses for the Interleaver and the DeInterleaver, we need to multiplex these addresses with data rate as the select signal. What remains now is multiplexing the write data, read data; write enable and the memory enable signals based on the whether the TX Chain or the RX chain is operational. This can be achieved by having a control signal at the top level as the multiplex input. The design elaborated above can be generalized for any form of input and output data width requirements for the Interleaver or the DeInterleaver operation.
Disadvantages of the Design
The major design disadvantage is that the memory read data bus will have high fanout, even though the write data bus has no fanout. Synthesis tools can take care of this high fanout by replication of logic. This replication is not feasible beyond a certain limit. Also, arriving at a read address matrix is not simple when all the data rates have to be supported and takes a lot of design cycle time.
The table below gives our results as a comparison with the traditional approach, for Xilinx Virtex 4 FPGAs.
Comparison criteria Traditional approach Combined approach Gate count 14K LUTs 10K LUTs Memory requirement 3200 2400 Latency 8 OFDM symbols 6 OFDM symbols
Table 3 : Memory Requirement
The gate count reduces since three set of multiplexers, each for the 3 stages of interleaver in traditional approach are replaced by a single set. The memory requirement will be 3200 or 6400 bits in the traditional approach based on whether the Interleaver and DeInterleaver are separate or combined. The latency is reduced since we don’t have accesses into multiple memories.
We have discussed a traditional interleaver implementation method and its limitations in terms of hardware and memory requirements when applied to multiple interleaver stages. We have alternatively suggested a method of arriving at a more optimal Combined Interleaver and DeInterleaver and also presented a design example to compare the two options.
 High Rate Ultra Wideband PHY and MAC Standard ECMA-368, 1st Edition / December 2005.
 Eric Tell, Dake Liu, “A hardware architecture for a multi mode block interleaver”, ICCSC ‘04