By Dominic Pajak, Processors Division, ARM
Reprinted from IQ (Information Quarterly) Magazine, Volume 5, Number 2, 2006
The progression of cellular modem technology is a great example of the increasing complexity of system-on-chip design, and the market pressure for high quality products to be delivered to aggressive deadlines. This presents challenges across many areas: meeting performance demands while minimizing silicon costs; controlling power; developing and debugging increasingly sophisticated software; multicore debug; and enhancing security to support flexible content.
This article considers the ARM solutions that address these issues. To give a real-world focus, we present TTPCom’s CBEmacro 3G modem as an example. The CBEmacro 3G design includes an ARM1156T2-S™ core, ARM PrimeCell® peripherals and an AMBA® AXI bus. The system offers potential for multicore debug using CoreSight™ technology, and enhanced security through TrustZone®. Low-power implementation is possible using ARM Artisan® Metro™ Physical IP. Finally, RealView® tools and models were used in software development and system benchmarking. Each of the ARM products discussed has been designed, integrated and tested together.
In the following pages, we will examine how ARM® technologies solve the technical challenges faced in the design of today’s wireless SoC platforms, using TTPCom’s CBEmacro 3G modem as a case study.
CBEmacro 3G Modem
The CBEmacro 3G is a complete modem solution addressing the GSM, GPRS, EDGE, WCDMA and HSDPA standards. It enables the rapid development of baseband platforms by drawing on TTPCom’s expertise and investment in cellular modem technology.
In the design of the CBEmacro, TTPCom chose the ARM1156T2-S™ processor as the heart of the baseband subsystem, utilizing ARM PrimeCell® peripherals, the AMBA® 3 AXI™ bus and RealView® tools in its creation. In this article we consider the benefits CBEmacro gains from this ARM processor-based solution, and how the ARM ecosystem enables the rapid integration of CBEmacro into a larger SoC design.
Meeting Performance Demands
An advanced micro-architecture and branch prediction unit, combined with the Thumb®-2 instruction set architecture, allow the ARM1156T2-S processor to deliver the best instruction throughput of any ARM11™ family processor.
In benchmarks, Thumb-2 on an ARM1156T2-S processor showed up to a 2x performance increase over Thumb code running on an ARM9™ family processor. In a real-world system, users can expect to see a 1.5x performance increase. This is confirmed by benchmarks done on TTPCom’s 3G protocol stack code. When compiled for Thumb-2 on an ARM1156T2-S processor, samples of the protocol software showed a 1.5x performance improvement compared to the same code compiled for Thumb benchmarked on an ARM946E™ processor. This performance increase enables the same operations to be performed at a reduced clock frequency. Combined with a low-power implementation using ARM’s Artisan® Metro™ cell libraries, power consumption for 3G wireless applications can be lowered by 47%.
Based upon foundry optimized libraries for ARM946E and Artisan Metro libraries for ARM1156T2
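The link between a throughput gain and a lower clock frequency can be illustrated with a short calculation. The sketch below assumes, for illustration only, that execution time scales inversely with clock frequency and that the 1.5x figure applies uniformly to the workload. The 200 MHz baseline and the function name are hypothetical and do not come from TTPCom's measurements. The sketch does not attempt to reproduce the 47% power figure, which also reflects the choice of cell libraries.

```python
def required_clock_mhz(baseline_clock_mhz: float, speedup: float) -> float:
    """Clock needed to match the baseline's throughput, given a per-clock speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_clock_mhz / speedup


# Hypothetical baseline clock for an ARM9 family core (illustrative value only).
baseline_mhz = 200.0

# 1.5x is the real-world Thumb-2 improvement quoted in the text.
new_clock_mhz = required_clock_mhz(baseline_mhz, 1.5)
print(f"Equivalent clock: {new_clock_mhz:.1f} MHz")  # Equivalent clock: 133.3 MHz
```

Under these assumptions, the same work completes at roughly two thirds of the baseline clock, which is the headroom the article refers to.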
Minimizing Die Size
As a consequence of its high performance, the ARM1156T2-S processor can be synthesized at lower frequencies and utilize high-density cache memories. This gives the ARM1156T2-S processor an estimated core area of only 2.4 mm² with 16k instruction and data caches on a 90 nm process. Perhaps surprisingly, the ARM1156T2-S processor is therefore a smaller and lower-power choice than the ARM926EJ-S™ processor, which must be synthesized above its target frequency and with larger high-speed RAMs in order to meet the performance demands of HSDPA baseband. Further area savings are made by sharing debug blocks using ARM’s CoreSight™ technology, described later in this paper.
In addition to the small core area of the ARM1156T2-S processor, area costs can be further reduced as a direct result of the Thumb-2 instruction set architecture. Thumb-2 has been architected to satisfy code density and performance requirements in a single instruction set architecture. It provides full register and coprocessor access, as well as SIMD and NEON™ operations where implemented; exception handling and system control are also possible in Thumb-2. Performance-critical and system routines that were traditionally implemented as ARM code can now be implemented in Thumb-2 while maintaining high code density.
TTPCom also noted that Thumb-2 code was only 3% larger than Thumb when compiled with RealView Compilation Tools (RVCT) 2.2.1, while achieving a substantial performance increase. This is consistent with the code size benchmarks undertaken by ARM on a representative 9 MB cross section of real-world code. Since that time, the latest compiler (RVCT 3.0 beta) has achieved a Thumb-2 code size benchmark that is actually 0.5% smaller than Thumb.
Integrating an applications processor
The CBEmacro 3G modem is easily integrated with applications processors such as the ARM1176JZF-S™ or the Cortex™-A8 via an AMBA 3 AXI compliant expansion bus. The additional applications processor would share the bus with the ARM1156T2-S baseband MCU using a pre-configured ARM PrimeCell PL300 AXI interconnect already present in the system.
The high-performance AMBA AXI bus transaction model allows data and address phases to be decoupled, maximizing system bandwidth and reducing the operating frequency. Thumb-2 code running on the ARM1156T2-S processor can also help maximize the bus bandwidth available to high-performance video or applications processors: it reduces instruction fetches from main memory for performance-critical code, allows a greater proportion of critical routines to reside in tightly coupled memory, and reduces cache misses. This in itself lowers power requirements and increases battery life.
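To see why decoupling address and data phases can raise effective bandwidth, consider a deliberately simplified cycle-count model. It is only a sketch: the function names and cycle counts are hypothetical, each transaction is assumed to need a fixed number of address and data cycles, and real AXI behavior (multiple channels, bursts, reordering, interconnect arbitration) is not modeled.

```python
def coupled_cycles(transactions: int, addr_cycles: int, data_cycles: int) -> int:
    """Each transaction finishes its data phase before the next address is issued."""
    return transactions * (addr_cycles + data_cycles)


def decoupled_cycles(transactions: int, addr_cycles: int, data_cycles: int) -> int:
    """Addresses are issued back to back; data phases run in order as soon as
    both the matching address has been issued and the data path is free."""
    addr_done = 0  # cycle at which the latest address phase completes
    data_done = 0  # cycle at which the latest data phase completes
    for _ in range(transactions):
        addr_done += addr_cycles
        data_start = max(addr_done, data_done)
        data_done = data_start + data_cycles
    return data_done


# Hypothetical workload: 4 transactions, 2 address cycles and 4 data cycles each.
print(coupled_cycles(4, 2, 4))    # 24
print(decoupled_cycles(4, 2, 4))  # 18
```

In this toy model the later address phases are hidden behind earlier data phases, so the same traffic completes in fewer cycles, or equivalently at a lower bus clock for the same throughput.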
Network operators are driving to maximize revenue by providing mobile services such as on-demand media content, software downloads and m-commerce. To do this, security technology is required to protect the copyright of downloaded software and media (Digital Rights Management), prevent software viruses, and create a trusted platform for sensitive user data. CBEmacro 3G therefore incorporates TrustZone®, ARM’s trusted computing technology. TrustZone provides integral hardware and software support for security-critical applications, maximizing the CBEmacro 3G-based handset’s potential Average Revenue Per User (ARPU) when integrated with a TrustZone-enabled applications processor and TrustZone software.
TrustZone enables the applications processor to communicate with the baseband system via a secure kernel. However, untrusted user code executed on the applications processor (e.g. a downloaded game) can be restricted from accessing secure memory. In this manner both the baseband MCU operation and the trusted OS on the applications core are protected from unauthorized access by user code.
Furthermore, should the system designer wish, it is also possible to deploy secure applications to the device in the field (e.g. allowing operators to deploy DRM agents to users that purchase the service). Flexibility of content is therefore maintained without compromising the integrity of the system.
TrustZone enabled AXI bus architecture of CBEmacro with baseband, applications core and a TrustZone Memory Adapter (BP141)
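The access restriction described above can be pictured with a small conceptual model. This is a sketch under stated assumptions, not a description of the BP141 or of any actual CBEmacro memory map: the region names, addresses and the `access_allowed` function are all hypothetical. The only idea it illustrates is that a transaction marked non-secure is refused access to a region marked secure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    base: int
    size: int
    secure: bool

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


# Hypothetical memory map, for illustration only.
MEMORY_MAP = [
    Region("secure_ram", 0x0000_0000, 0x1000, secure=True),
    Region("shared_ram", 0x0000_1000, 0x1000, secure=False),
]


def access_allowed(addr: int, non_secure: bool) -> bool:
    """Return True if a transaction with the given security state may access addr."""
    for region in MEMORY_MAP:
        if region.contains(addr):
            # Secure regions reject non-secure transactions; everything else passes.
            return not (region.secure and non_secure)
    return False  # Unmapped addresses are denied in this sketch.


# Untrusted user code (non-secure) cannot reach secure memory...
print(access_allowed(0x0000_0010, non_secure=True))   # False
# ...but the secure kernel can, and both worlds can use shared memory.
print(access_allowed(0x0000_0010, non_secure=False))  # True
print(access_allowed(0x0000_1010, non_secure=True))   # True
```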
Rapid system development
The development of SoCs around the CBEmacro 3G modem can be accelerated using ESL tools such as SoC Designer from the RealView CREATE family. SoC Designer allows the rapid design, simulation and debugging of complex SoCs from a broad and extendible library of SystemC models. The model library spans from ARM cores and AMBA bus fabric to a variety of third party processor cores and DSPs. Functional and cycle accurate models coupled with a system profiling capability mean that an optimum design can be reached more quickly and accurately than previously possible.
Virtual prototypes created with SoC Designer can also give system software a time-to-market edge, allowing device drivers and applications to be coded and tested well before RTL or silicon becomes available. SoC Designer features fast SystemC simulation with a fully featured GUI environment for development and debugging.
Controlling System Power
Reducing power consumption
Considerable power savings are afforded as the ARM1156T2-S processor is able to operate below the frequencies required by an ARM9 family processor for 3G baseband, as discussed.
The low-power design of the ARM1156T2-S processor and CBEmacro can be further enhanced by implementation using ARM Artisan Metro™ cell libraries. The Metro platform is the first comprehensive physical IP solution designed specifically for performance-oriented, low-power portable electronic devices, and is therefore well suited to the CBEmacro.
The Metro cell library is based on a series of new architectures that dramatically reduce power while enhancing density - taking advantage of new process, circuit design, voltage scaling, power aware EDA and chip-level design techniques. The Metro platform provides every designer with access to IP that incorporates the "tricks-of-the-trade" of low-power IC design and delivers results previously only achievable by experienced full-custom designers.
Using Metro libraries on a 90 nm process, the ARM1156T2-S processor is approximately 1.12 mm² at 315 MHz, consuming only 75% of the power of a standard implementation. This is estimated to be just 0.19 mW/MHz on CLN90G.
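As a rough worked example, the figures above can be combined to estimate core power at the quoted clock. The sketch assumes that power scales linearly with frequency (leakage is ignored) and that the 75% ratio applies at the same clock; both are simplifying assumptions, and the function name is hypothetical.

```python
def core_power_mw(mw_per_mhz: float, clock_mhz: float) -> float:
    """Estimate core power assuming it scales linearly with clock frequency."""
    return mw_per_mhz * clock_mhz


# Figures quoted in the text: 0.19 mW/MHz with Metro libraries, at 315 MHz.
metro_mw = core_power_mw(0.19, 315.0)

# Implied power of a standard implementation, if Metro consumes 75% of it
# at the same clock (an assumption made for this illustration).
standard_mw = metro_mw / 0.75

print(f"Metro:    {metro_mw:.2f} mW")     # Metro:    59.85 mW
print(f"Standard: {standard_mw:.2f} mW")  # Standard: 79.80 mW
```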
Minimizing Software Development Time-to-market
The progress of wireless technology and increasing data bandwidths is coupled with rising expectations for rich applications and features on wireless devices. The development and validation of increasingly complex applications on multi-core SoCs can become a bottleneck to product delivery. On-chip debug and trace capability can therefore have a dramatic impact on the software reliability, performance and time to market of a product.
CoreSight technology offers a complete solution for multi-core debug and trace. The CoreSight Design Kits provide a powerful debug infrastructure, with single point access to trace and debug of multiple CoreSight compliant cores on a SoC with minimized silicon area and pin cost. In the case of the CBEmacro 3G, the ARM1156T2-S baseband processor features an ETM11CS option for real-time instruction and data trace, and a bus monitor for traffic between the ARM1156T2-S processor and DSP subsystem. When combined with an applications processor such as ARM1176JZ-S or Cortex-A8, CoreSight allows both ARM processors to be traced and debugged using one set of tools and sharing the off-chip debug and trace connection. CoreSight technology is also designed to be extendible to support compliant 3rd party peripheral or DSP trace in the future.
CoreSight trace technology enables real-time instruction and data trace to be obtained from multiple devices while keeping pin-count and area overhead to a minimum, with multiple trace sources taken off-chip via one Trace Port Interface Unit (TPIU). The CoreSight Debug Access Port (DAP) provides independent JTAG access to multiple cores through a single 5-pin JTAG interface. One core can be powered down, or in sleep mode, while the debug link to other cores is maintained. This can lead to a dramatic reduction in pin count: for example, combining two 21-pin trace ports and two 5-pin JTAG ports into one of each saves 26 pins. CoreSight Serial Wire Debug (SWD) offers the possibility of further reducing the number of debug pins from the 5 required by JTAG to only 2.
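The pin-count saving quoted above follows from simple arithmetic, sketched below. The function and variable names are illustrative; the port widths (21-pin trace, 5-pin JTAG, 2-pin SWD) are the ones given in the text.

```python
TRACE_PORT_PINS = 21
JTAG_PINS = 5
SWD_PINS = 2


def debug_pins(trace_ports: int, debug_ports: int, debug_port_pins: int = JTAG_PINS) -> int:
    """Total pins for the given number of trace ports and debug ports."""
    return trace_ports * TRACE_PORT_PINS + debug_ports * debug_port_pins


separate = debug_pins(trace_ports=2, debug_ports=2)  # one trace + JTAG port per core
shared = debug_pins(trace_ports=1, debug_ports=1)    # single TPIU and DAP
shared_swd = debug_pins(trace_ports=1, debug_ports=1, debug_port_pins=SWD_PINS)

print(separate, shared, separate - shared)  # 52 26 26
print(shared_swd)                           # 23
```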
An example CoreSight system for an ARM1156T2-S and ARM1176JZ-S SoC
Software engineers at TTPCom benefited from the reuse of investment in the familiar and widely supported RealView Development Suite (RVDS), part of the RealView DEVELOP family of embedded software tools. The compilation flow for the ARM1156T2-S processor was further simplified by targeting Thumb-2, where time-consuming profiling, compilation and interworking of individual ARM and Thumb routines may have once been necessary. Existing Thumb code, typically applications, will run on Thumb-2 cores without modification. The ARM1156T2-S processor features advanced branch prediction which improves Thumb code execution.
To take advantage of Thumb-2, C source can be recompiled using RealView Compilation Tools (RVCT 2.2.1, available now as part of RVDS), or with third party tools such as the GNU compiler. In porting existing GSM protocol layer 1 code from the ARM9E family of processors to Thumb-2 on the ARM1156T2-S, TTPCom identified only one Thumb-2 related issue, which was quickly resolved.
Earlier software development
The same RealView SoC Designer technology that allows rapid system hardware prototyping also permits software development and debug on a simulation platform. The use of cycle-accurate core and peripheral models facilitates the development of detailed device drivers before hardware becomes available. This gives a time-to-market advantage by allowing software development projects to start long before silicon.
The development of larger applications on the ARM1176JZ-S processor is possible using Real Time System Model (RTSM) simulation technology. This offers an ultra-fast instruction accurate simulation and full connectivity to RealView Debugger. The RTSM simulation includes full emulation of peripherals such as LCD display, keyboard and mouse. On a typical desktop PC it is possible to simulate a Linux boot on the ARM1176JZ-S processor in under 5 seconds.
This paper has outlined the ways in which ARM technologies contribute to solving the problems faced in SoC design, and how by adopting them TTPCom is able to offer a more efficient and lower power product to its customers. Each of the ARM products discussed has been designed, integrated and tested together by ARM as an optimal solution to the problems faced by complex SoC design.
The depth of technical expertise across multiple areas allows ARM to orchestrate technical synergies that may not arise when components are sourced from separate vendors. Partners benefit from a single point of sale and support, with reduced risk in design integration. The critical mass of 3rd party tool and OS support for ARM technology through the connected community gives partners and OEMs a compelling case for choosing ARM processor-based solutions.