Venkatasubramanian V, Kulandaivel P, Dimitri Dey (Tata Consultancy Services)
Heterogeneous networks such as a mix of macro and small cell (pico, femto) base-stations enable flexible, low-cost deployments. They provide a uniform broadband experience to users anywhere in the network. However, multiple form factors of base station solutions have their own challenges in physical layer development. The conventional DSP architectures cannot meet these demands. Hence platform providers are coming up with new, more efficient System on Chip (SoC) solutions to address these demands. Additionally, it is difficult to maintain different versions of physical layer software for different form factors, which can result in significant modification and rework.
This paper describes an efficient, forward-looking architecture that handles various form factors of LTE base stations with minimal software modification and without architecture changes. The proposed architecture allows easy migration to next generation SoCs as well as to more powerful SoCs of the same generation. Our implementation of this architecture has now been migrated across two generations of SoCs for LTE Release 9 small and macro cells, and is ready for migration to a multi-sector, LTE-A macro base station.
The growing population of smartphones, tablets and laptops is driving up data and video traffic, which requires more bandwidth (BW) and better quality of service. Wireless subscribers expect high-speed access anytime, anywhere. Current wireless standards, such as 3GPP's (3rd Generation Partnership Project's) LTE (Long Term Evolution), can meet the data rates required by data-hungry applications. However, LTE radio access is approaching the limit set by Shannon's theorem, suitably adapted for Multiple Input Multiple Output (MIMO) channels, on how far channel capacity can be raised by increasing the BW and improving the signal-to-noise ratio. LTE-Advanced therefore increases the spectrum available for mobile data applications through carrier aggregation. Another way to increase overall mobile network capacity is to increase the carrier-to-interference ratio while decreasing cell size and deploying small cell technologies. LTE-Advanced is about improving spectral efficiency per unit area.
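For reference, the capacity limits alluded to above can be written in their standard textbook form. These are general results rather than anything specific to the implementation described in this paper, and the MIMO expression assumes equal power allocation across the transmit antennas with the channel unknown at the transmitter.

```latex
% Single-antenna (SISO) Shannon capacity
C_{\mathrm{SISO}} = B \log_2\!\left(1 + \mathrm{SNR}\right)

% MIMO capacity with N_t transmit and N_r receive antennas
C_{\mathrm{MIMO}} = B \log_2 \det\!\left(\mathbf{I}_{N_r}
    + \frac{\mathrm{SNR}}{N_t}\,\mathbf{H}\mathbf{H}^{H}\right)
```

Here $B$ is the bandwidth in Hz, $C$ is the capacity in bit/s, $\mathbf{H}$ is the $N_r \times N_t$ channel matrix and $\mathrm{SNR}$ is the receive signal-to-noise ratio. Capacity grows linearly with $B$ but only logarithmically with SNR, which is why adding spectrum (carrier aggregation) and shrinking cells are the remaining levers.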
One of the key components that affect the throughput of such networks is the physical layer (PHY), especially the baseband. LTE and LTE-A base station (eNodeB) baseband design challenges, such as higher data rates and lower latency, have compelled designers to adopt design methodologies that are predominantly based on heterogeneous system design.
Heterogeneous architectures consisting of multiple Digital Signal Processor (DSP) cores, cores for higher layer processing and hardware (HW) accelerators are being developed to fulfill the wireless broadband standard requirements. Heterogeneous implementations offer the simultaneous benefits of flexibility and programmability in the DSPs, and increased performance (higher throughput and lower power) in the HW accelerators. Several advanced System on Chip (SoC) solutions employing such heterogeneous architectures are available for developing macro, micro, pico and femto LTE eNodeBs.
These SoCs are considered in the next two sections: Section II discusses current trends in base station SoCs and Section III traces the evolution of such SoCs for 4G LTE. In Section IV, we present our forward-looking architecture that can be easily ported onto future generations of SoCs. Sections V and VI provide examples of how this porting can be achieved for LTE small cells and LTE-Advanced (LTE-A) macro cells respectively. The real-time implementation results corresponding to our current implementation and its projected evolution that keeps up with the SoCs are given in Section VII. Section VIII includes the conclusion.
II. CURRENT TRENDS IN BASE STATION SOCS
In the early stages of evolution of a wireless standard (exemplified by LTE, CDMA2000, WCDMA and so on), it is common to find its physical layer for a base station being implemented on a board with multiple DSPs and FPGAs. These boards are typically general purpose boards and will enable Software-Defined-Radio (SDR) implementations of multiple standards.
Figure 1: A wireless base-station architecture
Correspondingly, the “higher layers” (HL), that is, Layer 2 and Layer 3, for such standards are implemented on other suitable boards, which typically have an ARM or Freescale processor. Again, the board is not limited to any particular wireless standard. The RF subsystem is a separate subsystem and is not considered in detail in this paper, except to say that there is some high-speed serial interface (of differential signals) between the RF board and the baseband board. This architecture is captured in Figure 1.
Indeed, this has until now been a common architecture for test and measurement equipment too. We also note that the processing power required for wireless applications is very high, so the chosen processors have to be suitable, high-end ones.
In parallel, in line with Moore’s law, SoC solutions have resulted from putting together ASICs and heterogeneous processors. Further, the ASICs and processors have kept growing in processing power. It is now inevitable that the ASICs cater to wireless technologies, and it is hardly surprising that current SoCs have the wherewithal to meet all the processing needs of a multi-sector base station. What was done on several boards with several processors can now be compressed into a single SoC at a fraction of the cost of all the boards needed for equivalent functionality (even after correcting for current prices). The SoC will also draw a fraction of their total power. We consider the evolution of such SoCs in more detail in Section III. However, at this point, we summarize by saying that typical SoCs for wireless base stations consist of the following:
- Standard-independent processing cores for the physical layer (DSP cores)
- Standard-independent processing cores for the higher layers (HL cores)
- Standard-dependent accelerator ASICs for data processing for the physical layer and higher layers. Note that there may be more than one standard that is catered to. The interface presented by these accelerators will allow the choice of the standard, but it is rare to find SoCs that perform complete data processing for earlier standards such as GSM.
- Industry-standard RF interfaces (CPRI, OBSAI etc.)
- Industry-standard inter-board interfaces such as Ethernet, SRIO, PCI-express. These are used for connecting to the backhaul in a base station.
- Peripherals such as the DMA engine, memory controllers
III. EVOLUTION OF SOCS FOR 4G SYSTEMS
In this section, we distil the general trend in evolution of SoCs for 4G systems from the physical layer point of view. Figure 2 shows the Physical Downlink Shared CHannel (PDSCH) and the Physical Uplink Shared CHannel (PUSCH) chains in an eNodeB (except the channel estimation block). A list of blocks in increasing order of computational complexity and decreasing order of configurability can be made as follows:
a. DL symbol rate processing (scrambling to resource mapping)
b. UL soft bits processing (de-mapping to LLR generation)
c. UL symbol rate processing (channel equalization and IDFT)
d. Turbo encoding and rate matching
e. Cyclic Prefix addition/removal
f. Half-subcarrier shift removal
g. Turbo decoding
Starting from a complete software solution, if we aim to move blocks to hardware, the natural strategy is to take them from the above list in reverse order. The evolution of SoCs is itself evidence of this strategy. In the first generation SoCs, only turbo decoding and FFT/IFFT/IDFT were provided as hardware accelerators, that is, the most computationally intensive but well-defined blocks. These blocks are shown in Figure 2 in dashed boxes.
Figure 2: Evolution of hardware acceleration for LTE data channels in eNodeB SoCs
The second generation of SoCs provided hardware acceleration for blocks c to f. These are marked by light colored, solid boxes in Figure 2.
The current (that is, 3rd) generation of SoCs completes the chain partially, if not fully, by providing hardware acceleration for blocks a and b. These are shown as dark colored, solid boxes in Figure 2. The 3rd generation SoCs leave very little of the LTE data channel processing to be performed in software. In addition, the number of instances of the earlier generation HW accelerator blocks has increased in the 3rd generation macro SoCs.
Note that the order in which blocks are accelerated in hardware is not the same as the logical ordering of blocks in the data chains. This has a bearing on the way forward-looking software should be designed (we consider this in the next section).
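The bearing on software design can be made concrete with a small sketch. It assumes that the blocks are lettered a to g in the order of the list above, and takes the per-generation accelerator sets from the text; the first generation FFT/IFFT accelerators are left out because they are not lettered in that list. The names `BLOCKS`, `NEWLY_ACCELERATED` and `software_blocks` are illustrative only and do not come from any SoC vendor's API.

```python
# Blocks of the PDSCH/PUSCH chains, lettered in list order (assumption):
# increasing computational complexity, decreasing configurability.
BLOCKS = {
    "a": "DL symbol rate processing",
    "b": "UL soft bits processing",
    "c": "UL symbol rate processing",
    "d": "Turbo encoding and rate matching",
    "e": "Cyclic prefix addition/removal",
    "f": "Half-subcarrier shift removal",
    "g": "Turbo decoding",
}

# Blocks newly moved to hardware in each SoC generation, as described in the text.
# Assumption: generation 1 FFT/IFFT acceleration is not modelled here.
NEWLY_ACCELERATED = {1: {"g"}, 2: {"c", "d", "e", "f"}, 3: {"a", "b"}}


def accelerated_blocks(generation):
    """Return the lettered blocks available in hardware up to `generation`."""
    result = set()
    for gen, blocks in NEWLY_ACCELERATED.items():
        if gen <= generation:
            result |= blocks
    return result


def software_blocks(generation):
    """Return the lettered blocks that must still run on DSP cores."""
    hardware = accelerated_blocks(generation)
    return [letter for letter in BLOCKS if letter not in hardware]


if __name__ == "__main__":
    for gen in (1, 2, 3):
        print(gen, software_blocks(gen))
```

Because the blocks left in software at each generation are not contiguous in the data chain, a design that lets every block be dispatched independently, rather than hard-wiring the chain order, is easier to carry forward.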
In reality, several vendors offer SoCs for 3G and 4G systems, each with its own unique features and advantages. Thus, this generic analysis for LTE would not fit any available SoC exactly. Nevertheless, it is relevant because there are not many ways of splitting the LTE physical layer data chains. Each SoC may leave out parts of the blocks described above, which then have to be completed in software. A common component left out of hardware acceleration is channel estimation (not shown in Figure 2); this is done to allow eNodeB developers to use proprietary algorithms that can differentiate their product.
The control of the hardware accelerators resides in the DSP cores that are also present on the same SoC. These cores are also used to complete the physical layer functionality (DL control channels, UL control channels and physical signals).
The number of instances of DSP cores, higher layer cores and hardware accelerators (for L1 and L2) is decided by the target application of the SoC. For LTE/LTE-A macro cells, many cores and a set of hardware accelerators are needed. This is taken care of in the 3rd generation SoCs. For small cells, a small number of cores and the minimal set of hardware accelerators will suffice. For LTE-Advanced and for macro cells with multiple sectors/carriers, many sets of cores and hardware accelerators will be needed. It is a challenge to design the software architecture that can easily be migrated to more powerful SoCs and scaled up according to the available processing power. A solution to this challenge is proposed in the following sections.
We note in passing that an evolution similar to the above is occurring in the same SoCs for WCDMA, and the SoCs support both 3G and 4G acceleration simultaneously. These aspects have not been considered in this paper. Similarly, other accelerators (for example, security) needed for the layer 2 and above are also not discussed.
IV. FORWARD LOOKING & FLEXIBLE ARCHITECTURE FOR LTE MACRO BASE STATION
In the industry, it is usual for physical layer developers to port their software onto several generations or variants of platforms and maintain more than one such stream simultaneously. To minimize rework, and considering the evolution of SoCs described in the previous section, it is clear that a systematic approach for software architecture and design can cater to all generations of SoCs. Principles of such a design are explained below. An architecture that adheres to the following principles is likely to be more “forward-compatible” than one that does not.
1. Software design should cater to a family of SoCs, rather than a single SoC. For this to be possible, advance information from the SoC manufacturer is needed, which may entail a partnership with them. In the absence of such a partnership, the design team needs to periodically follow the releases and data sheets of upcoming or just released SoCs.
2. Hardware accelerators of next-generation SoCs should be mimicked in unique DSP cores in the current generation SoCs, to the extent that the information about next generation SoCs is available. This way, the control software on the main, controlling DSP core that manages tasks and threads will change very little when migrating from one generation to another. Two added advantages exist. First, the number of times the controlling DSP core is interrupted will mirror that in the next generation SoC. Second, parameters for the processing will be copied as in the next generation SoC (the core that is mimicking the accelerator should be stateless), thus simulating its environment more accurately. Contrast this with a linear software implementation (function call or software-posted task) that will not bring out such interaction issues early.
3. Inherent virtual addressing to isolate sector contexts (in a multi sector deployment) should be exploited. This will bring in some automatic scalability as a major effort- and cost-saver.
4. Flexibility in the framework for deploying software modules in different cores is an obvious requirement, but we still state it for completeness.
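The second principle above, mimicking a future hardware accelerator on a dedicated core, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in this paper: a Python thread stands in for the dedicated DSP core, a completion callback stands in for the interrupt to the controlling core, and `MimicCore`, `run_block` and the toy `scramble` kernel are hypothetical names.

```python
import copy
import queue
import threading


class MimicCore:
    """Thread standing in for a DSP core that mimics a future HW accelerator."""

    def __init__(self, kernel):
        self._kernel = kernel  # pure function: (params, data) -> result
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, params, data, on_done):
        # Parameters and data are copied per job, as they would be into an
        # accelerator descriptor, so the mimic keeps no state between jobs.
        self._jobs.put((copy.deepcopy(params), list(data), on_done))

    def _run(self):
        while True:
            job = self._jobs.get()
            if job is None:  # shutdown marker
                return
            params, data, on_done = job
            # The callback plays the role of the completion interrupt.
            on_done(self._kernel(params, data))

    def close(self):
        self._jobs.put(None)
        self._worker.join()


def scramble(params, bits):
    """Toy kernel (assumption): XOR the bits with a given sequence."""
    return [b ^ s for b, s in zip(bits, params["sequence"])]


def run_block(accelerator, params, bits, timeout=1.0):
    """Control-side code: submit a job and wait for its completion event."""
    done = queue.Queue(maxsize=1)
    accelerator.submit(params, bits, done.put)
    return done.get(timeout=timeout)


if __name__ == "__main__":
    core = MimicCore(scramble)
    print(run_block(core, {"sequence": [1, 0, 1, 1]}, [1, 1, 0, 0]))
    core.close()
```

In this sketch `run_block` depends only on the `submit` interface, so replacing `MimicCore` with a driver for a real accelerator would leave the control-side code unchanged, which is the point of the principle.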
Figure 3: Macro LTE cell architecture on a 2nd gen SoC
As an example of an architecture that follows the above principles, we illustrate our architecture for a Release 9 LTE eNodeB physical layer for a single sector on a multi-core DSP. This DSP has six cores and one hardware accelerator, the latter corresponding to that in the second generation SoC described in Section III. Looking ahead to the 3rd generation SoC, it was decided that core 0 and core 1 would retain similar functionality in the SoCs of both generations. Core 3 and core 4 would respectively mimic the accelerators on the 3rd generation SoC for blocks a and b in Section III. Refer to Figure 3 for an illustration of this architecture, which exemplifies principles 1 and 2 above.
To complete the picture, we add that the downlink signals and channels (Cell-Specific Reference Signals, Primary and Secondary Synchronization Sequences, the Physical Control Format Indicator Channel, the Physical HARQ Indicator Channel and the Physical Downlink Control Channel) are implemented in software in core 0. The uplink physical channels and signals (the Uplink Control Channel, the Physical Random Access Channel (PRACH) and the Sounding Reference Signal) are implemented in core 1. These software modules are optimized to meet the standard-specific latency using well-known techniques: compilation at the highest optimization level for execution speed, loop unrolling, use of the const and restrict keywords, efficient use of cacheable memory, use of DMA and so on.
V. FORWARD LOOKING & FLEXIBLE ARCHITECTURE FOR LTE SMALL CELL BASE STATION SoC
For the 3rd generation, small cell SoC that we are currently using, the architecture is shown in Figure 4.
Figure 4: Small LTE cell architecture on a 3rd Gen SoC
Comparing this with Figure 3 reveals that two higher layer cores have been added, four DSP cores have been removed and the hardware accelerator has been enhanced. Given the hardware available, only one LTE Release 9 sector is proposed (or two, if the software and hardware usage are optimized considerably further). However, given the forward-looking architecture of Section IV, neither the two unused cores of the second generation SoC nor the two cores (3 and 4) that mimic the hardware accelerators of this SoC are needed any longer. Further, the software for core 0 and core 1 will undergo little change, mostly related to the drivers for the newly available hardware accelerators.
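The migration from the second generation macro SoC to the third generation small cell SoC can be pictured as a change in a deployment map. The sketch below is illustrative only: the module names and the dictionary layout are assumptions, and blocks a and b are taken to be DL symbol rate processing and UL soft bits processing respectively, following the order of the list in Section III.

```python
# Where each PHY software module runs on the 2nd generation macro SoC.
GEN2_MACRO = {
    "dl_control_and_signals": "core 0",
    "ul_control_and_signals": "core 1",
    "block_a_dl_symbol_rate": "core 3 (mimic)",
    "block_b_ul_soft_bits": "core 4 (mimic)",
}

# The same modules on the 3rd generation small cell SoC.
GEN3_SMALL_CELL = {
    "dl_control_and_signals": "core 0",
    "ul_control_and_signals": "core 1",
    "block_a_dl_symbol_rate": "HW accelerator",
    "block_b_ul_soft_bits": "HW accelerator",
}


def moved_modules(old, new):
    """Return {module: (old_target, new_target)} for modules that moved."""
    return {
        module: (old[module], new[module])
        for module in old
        if module in new and old[module] != new[module]
    }


if __name__ == "__main__":
    for module, (src, dst) in moved_modules(GEN2_MACRO, GEN3_SMALL_CELL).items():
        print(f"{module}: {src} -> {dst}")
```

Only the two mimicked blocks change target in this map, while the modules on core 0 and core 1 stay where they are, which mirrors the small amount of rework described above.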
VI. FORWARD LOOKING & FLEXIBLE ARCHITECTURE FOR LTE–ADVANCED MACRO BASE STATION SOC
For an LTE-Advanced implementation on a 3rd generation SoC, the architecture in Section V can be extended easily. First, the software architecture can be extended to handle more than one component carrier (two is a sufficient number for high data-rate cells).
Figure 5: Macro LTE-A cell architecture on a 3rd Gen SoC
Second, to scale for multiple carriers/sectors or even component carriers, the same software can be deployed in additional cores (2 to 5). This will be made much simpler by the use of principle 3 of Section IV (see Figure 5). Of course, for such scalability to be feasible, the hardware accelerator throughputs must match, which is usually achieved in the SoCs by increasing the number of instances (also depicted in the same figure). In the specific case of the SoC in  , the extra core performance can be used to deploy more sectors and carriers than predicted by a simple linear extension of the performance in  . However, this will require more software changes than given in the last row of Table 1.
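The scaling step can be sketched as assigning each sector (or carrier) its own context and its own cores. The sketch assumes six DSP cores numbered 0 to 5 and one DL core plus one UL core per sector, mirroring the core 0 and core 1 split of Section IV; the pairing of cores 2 to 5 is an assumption, not something the text specifies. Separate Python objects are only an analogy for the isolation that virtual addressing provides on the SoC, and `SectorContext` and `assign_cores` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SectorContext:
    """Per-sector state; each sector sees only its own context."""
    sector_id: int
    dl_core: int
    ul_core: int
    state: dict = field(default_factory=dict)


def assign_cores(num_sectors, total_cores=6, cores_per_sector=2):
    """Give every sector its own DL/UL core pair and a private context."""
    if num_sectors * cores_per_sector > total_cores:
        raise ValueError("not enough DSP cores for the requested sectors")
    return [
        SectorContext(
            sector_id=s,
            dl_core=s * cores_per_sector,
            ul_core=s * cores_per_sector + 1,
        )
        for s in range(num_sectors)
    ]


if __name__ == "__main__":
    for ctx in assign_cores(3):
        print(ctx.sector_id, ctx.dl_core, ctx.ul_core)
```

The same processing code would run against each `SectorContext` without knowing how many sectors exist, so adding a sector changes the deployment, not the software.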
VII. REAL TIME IMPLEMENTATION RESULTS
Using the forward-looking architecture with parallel programming and multi-core methodology as discussed above, the performance we achieved is given in Table 1. We note that at these performance numbers, or scaled equivalents depending on the system configuration, the solution also interworks with a third-party UE simulator. The results show that in the downlink, the theoretical maximum rate for two antennas has been reached while it also works with 4 to 8 UEs/TTI.
Table 1: TCS LTE Base station Performance Figures
| Key Performance Indicator (KPI) | Macro Cell Base Station | Small Cell Base Station (2) | Macro Cell with Multi Sector (3) |
|---|---|---|---|
| 3GPP LTE Release | 3GPP LTE Release 9 | 3GPP LTE Release 9 Small Cell Solution | 3GPP LTE Release 10 LTE-Advanced |
| DSP HW/SoC used | MSC8157 | BSC9132 | B4860 |
| No. of UEs/TTI supported | 4 UEs/TTI | 8 UEs/TTI | 20 UEs/TTI |
| No. of connected users (1) | 128 | 32 | 512 |
| Data rate achieved (DL/UL) in Mbps | 150/50 | 150/50 | 600/150 |
| MIMO configuration | 2x2 | 2x2 | 4x4 |
| Software portability (% lines of code changed from the current implementation) | Current | 15% | 25% |

(1) Applicable only if PHY stores semi-static information
(2) Projected figures, implementation is in progress
(3) Projected figures for upcoming implementation
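The 150 Mbps downlink figure in Table 1 can be cross-checked with simple arithmetic. The sketch below assumes a 20 MHz carrier (100 resource blocks), normal cyclic prefix, 64-QAM and two spatial layers; the source does not state the bandwidth, so these are assumptions. The transport block size of 75376 bits per codeword is taken as the largest single-layer value for 100 resource blocks in 3GPP TS 36.213 and should be checked against that specification.

```python
SUBFRAMES_PER_SECOND = 1000  # one TTI (subframe) per millisecond


def raw_symbol_rate_mbps(n_prb=100, layers=2, bits_per_symbol=6,
                         symbols_per_subframe=14):
    """Upper bound that ignores control, reference-signal and coding overhead."""
    resource_elements = n_prb * 12 * symbols_per_subframe  # 12 subcarriers per RB
    bits_per_subframe = resource_elements * bits_per_symbol * layers
    return bits_per_subframe * SUBFRAMES_PER_SECOND / 1e6


def transport_block_rate_mbps(tbs_bits=75376, codewords=2):
    """Rate carried by the transport blocks themselves (assumed TBS value)."""
    return tbs_bits * codewords * SUBFRAMES_PER_SECOND / 1e6


if __name__ == "__main__":
    print(raw_symbol_rate_mbps())
    print(transport_block_rate_mbps())
```

Under these assumptions the raw bound is 201.6 Mbps and the transport-block rate is about 150.8 Mbps, which is consistent with the 150 Mbps reported for the 2x2 configuration.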
VIII. CONCLUSION
LTE/LTE-A eNodeB design presents the challenge of handling various kinds of base station requirements, which demand software architecture changes and corresponding code changes. Another operational challenge is the long lead time in obtaining the latest SoC HW for porting an existing solution. This affects the time to market of the solution and the ability to meet customer milestones with new requirements. A forward-looking architecture addresses these challenges, at least partly, on the existing HW itself. We have described our LTE eNodeB physical layer implementation, which has such a forward-looking architecture and reduces the time needed to port it to various SoCs targeting base stations for small cells, macro cells and LTE-A cells. As pointed out, achieving such porting with minimal software modification needs only the latest and upcoming SoC documentation from the vendor, rather than the HW itself.
From the projected real-time implementation results for LTE-Advanced, we can realize the maximum performance (600 Mbps in the DL and 150 Mbps in the UL) using Freescale’s macro base station SoC, the B4860. This SoC can also be used to implement multi-sector and multi-carrier LTE-Advanced eNodeBs. The data rates achieved can be stretched even further by using multi-core synchronization efficiently within the same forward-looking architecture.
The real-time implementation results achieved with our forward-looking architecture for an LTE Release 9 small cell solution, together with the proposed LTE-Advanced performance metrics, meet both the stringent protocol demands and the business goal of cost-effective upgrades.
We thank our team at Tata Consultancy Services for their help in this work, and convey our deep-felt gratitude to Mr. Rajarama Nayak, Head, Embedded Technology Solutions Group, TCS, for his valuable suggestions and comments on this paper. We also thank Mr. Srinath Chitlapalli, Head, Telecom EIS, and K.C. Ganesan, Program Manager, for their continuous support and encouragement.
3GPP TS 36.211 V10.5.0 (2012-06), 3rd Generation Partnership Project; Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation (Release 10)
3GPP TS 36.212 V10.6.0 (2012-06), 3rd Generation Partnership Project; Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access (E-UTRA); Multiplexing and channel coding (Release 10)