Optimize performance and power consumption with DSP hardware, software

By Leon Adams, DSP Strategist, and Raj Agrawala, C5000 Product Manager, Texas Instruments, Courtesy of Power Management DesignLine
Jun 6 2005 (13:04 PM)
URL: http://www.embedded.com/showArticle.jhtml?articleID=164300796

System designers who are developing mobile devices based on digital signal processors (DSPs) face an unrelenting challenge to save every possible milliwatt of power in order to achieve longer operating time from each battery charge. Some larger systems pose similar power-conserving challenges, though for a different reason. Access and other equipment in sealed environments is extremely sensitive to the heat generated by power-hungry circuitry. Multi-channel equipment, such as Digital Subscriber Line access multiplexers (DSLAMs) and central office multiplexers, needs to control heat in order to pack more channels into the same space and reduce per-channel operating costs. In all these cases, reducing power consumption is an overriding design goal.

The question facing designers as they develop these systems is how to balance low power with high performance. The very reason for using a DSP is to gain maximum computational performance for real-time applications such as voice and audio-video communications. Since these applications are on the increase in mobile communications, along with new functionality such as speech recognition, DSPs with higher levels of performance are continually required. But lots of processing normally means lots of power consumed, not only by the high-frequency DSP core, but also by memory fetches on- and off-chip and by peripherals handling data I/O and control. Designing for both high performance and low power consumption, then, would seem to be an impossibility, like trying to have your cake and eat it, too.

The way out of this predicament is in understanding and carefully managing the DSP hardware and software. Some DSPs are designed for low-power operation, based on manufacturing processes that are tailored for low power consumption. The architectures of these DSPs leverage the fundamental process technology with hardware features that use the chip more power-efficiently, while still supplying the high performance necessary for advanced applications. To help the developer take advantage of these features, the DSP real-time operating system (RTOS) provides power management capabilities that can give the application fine-grained software control over hardware power consumption. With these and other hardware and software features available, power management becomes an important part of the DSP development process, especially for mobile and heat-sensitive systems.

Active versus standby power

Process advances continue to be responsible for much of the reduction in DSP power consumption, since smaller-geometry transistors in the core's logic require decreased operating voltages. The effects of lower voltages can be dramatic: decreasing DSP core voltages from 1.6 volts in available products to 1 volt in future releases may reduce active power consumption by as much as 80 percent. Additionally, processes are being tuned for low power consumption through the use of low-leakage transistors with a high transition threshold voltage (V_T). These transistors minimize quiescent current (I_Q) during periods when the circuitry is powered on but inactive, such as during system standby modes. Recent DSP product releases feature standby power consumption of only 0.12 mW, less than one percent of the standby power required by earlier devices.

Determining the balance between active and standby power consumption in an application is an important design issue, one that can affect the type of DSP used. In large, heat-sensitive systems, active power consumption is the governing factor; but in battery-operated systems, standby power can be just as important. Cell phones, for instance, can drain a battery just as readily during waiting times as during talk times. MP3 players tend to be either on or off, so standby power consumption is less important, while some types of handheld instruments may be on and draining power even when they are not in active use. System designers do not have any direct control over whether the process technology used in a DSP lowers quiescent as well as active current. Nevertheless, they should weigh the balance of active and standby power needed by the application, then consider process capabilities among other factors in selecting a DSP.

Low-power hardware

Just as process advances benefit system designers but cannot be directly manipulated, certain architectural benefits are conferred automatically, such as power control of on-chip memory. Today, integrated cache and ROM may total hundreds of kilobytes — far too much memory to be active all the time without considerable power drain. Careful study of algorithms reveals that most fetches hit within a small cache area a vast majority of the time. Power-saving DSP architectures take advantage of this tendency by dividing memory into fine-grained blocks, then enabling only the block that is in use. In practice, a distributed addressing scheme gates a narrowing funnel of row, column and enable lines, as well as the clock tree structure, so that only a single block of, say, 2K words becomes active for each access.

Other inherent architectural features save power as well. Related to cache control is the careful scheduling of direct memory access (DMA) through power-conscious design of the on-chip DMA controller, offering a particularly effective way of reducing peripheral power consumption. Within the core itself, a dual multiply-accumulate (MAC) data path doubles performance on repetitive arithmetic tasks, allowing the core to return to a low-power state more quickly. In most tasks, array values in both MACs can be multiplied by the same coefficient. So with specific coding the coefficient can be fetched only once, then loaded into both MAC units, enabling operations to take place with only three fetches per array multiplication cycle instead of four.
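The fetch saving from a shared coefficient can be illustrated with a small software model. The sketch below assumes a FIR filter that computes two adjacent outputs at once, one per MAC unit; the function name and the fetch-counting scheme are illustrative only and do not describe any particular DSP instruction set.

```python
def dual_mac_pair(x, h, n):
    """Return (y[n], y[n+1], fetch_count) for an FIR filter with taps h.

    Models two MAC units that share one coefficient fetch per tap.
    Requires n >= len(h) - 1 and n + 1 < len(x).
    """
    acc0 = acc1 = 0
    fetches = 0
    for k in range(len(h)):
        coeff = h[k]          # fetched once, fed to both MAC units
        a = x[n - k]          # data operand for MAC 0
        b = x[n + 1 - k]      # data operand for MAC 1
        fetches += 3          # one coefficient plus two data values
        acc0 += coeff * a
        acc1 += coeff * b
    return acc0, acc1, fetches


x = [1, 2, 3, 4, 5]
h = [2, 1, 3]
y2, y3, shared_fetches = dual_mac_pair(x, h, n=2)
separate_fetches = 4 * len(h)   # two data values plus two coefficient fetches per tap
print(y2, y3, shared_fetches, separate_fetches)
```

In this model each tap costs three fetches rather than four, so the memory traffic for the pair of outputs drops by a quarter.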

Finally, a scalable 32-bit instruction word allows multiple instructions to be fetched in a single cycle, saving accesses. A decoder inside the core extracts instructions from the word, checking for short inner loops that it can store in an instruction buffer queue to eliminate repeated fetches for loop iterations. All of these features serve to optimize performance and power automatically, but there are other hardware elements that can deliver software control over power consumption to the designer.

Power-controllable hardware

A simple way to reduce active power consumption is to stop clocking a circuit, thus idling it when it is not needed. This technique is commonly used to suspend processor cores while waiting for an external event, but it is also used in some DSP architectures to idle different sets of functions grouped as clock domains. For instance, the core, cache, DMA controller, clock generator, peripherals and external memory interface may each be a separate domain. With its clock suspended, the domain draws only quiescent current but is ready to be reactivated with negligible latency.
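As a rough illustration of software-controlled clock domains, the following sketch models an idle configuration register as a bit mask. The bit positions, the polarity (a set bit stops the clock) and the reset value are assumptions made for the example; a real device defines its own register layout.

```python
# Hypothetical bit positions for six clock domains; a real DSP's
# idle configuration register defines its own layout.
DOMAIN_BITS = {
    "cpu": 0, "cache": 1, "dma": 2,
    "clkgen": 3, "periph": 4, "emif": 5,
}
ALL_DOMAINS_MASK = (1 << len(DOMAIN_BITS)) - 1


class IdleConfigRegister:
    """Software model of a clock idle configuration register.

    A set bit means the domain's clock is stopped (assumed polarity).
    """

    def __init__(self):
        self.value = 0  # assume every domain is clocked after reset

    def idle(self, *domains):
        for name in domains:
            self.value |= 1 << DOMAIN_BITS[name]

    def wake(self, *domains):
        for name in domains:
            self.value &= ~(1 << DOMAIN_BITS[name]) & ALL_DOMAINS_MASK

    def is_idle(self, domain):
        return bool(self.value & (1 << DOMAIN_BITS[domain]))


reg = IdleConfigRegister()
reg.idle("dma", "periph", "emif")   # stop clocks the current task does not need
reg.wake("dma")                     # restart the DMA clock before a transfer
print(f"idle register = 0b{reg.value:06b}")
```

Keeping each domain on its own bit lets the application idle or wake any combination of domains in a single register write.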

A more radical technique is shutting off supply power to a function when it is not in use, eliminating I_Q leakage altogether. This technique, long used at the system level, is now being employed within devices to selectively turn off the cache, memory interface, DMA controller, serial ports, I²C bus, timers and other I/O peripherals. Since these functions are used intermittently, turning them off when they are inactive can bring significant savings in power, though there is some latency involved in turning them back on again.

Gating power rails, like gating clock domains, requires that on-chip functions be controlled selectively. Support for multiple standby modes provides a way of clocking and delivering power to functions that simplifies software control. Software control during real-time operation will be discussed later, but one design technique that is often overlooked is system start-up. Typically, every function is brought up in the boot sequence, then unused ones may be selectively powered off. Using standby modes to keep unused functions powered off during the boot sequence can help prolong the battery charge, especially if the system is frequently turned on and off.

A highly significant technique, in terms of power conservation, lies in reducing the DSP core clock rate. If the core is not operating at 100 percent of its rated performance, it can still meet its processing requirements at a lower frequency while drawing less power. Most programs have identifiable "hot spot" functions that actually finish in less time than is allocated to them. Scaling back the frequency lets the function use the full amount of time allocated, but with less power consumed. The typical power savings ratio equals the ratio of frequency division; so, for example, halving the frequency cuts the power consumption in half. In addition, if the reduced frequency is compatible with a lower operating voltage available to the core, then the power required can be cut even more. DSPs that provide application software control over frequency and voltage give the designer an effective means for real-time control of power consumption.
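The combined effect of frequency and voltage scaling can be estimated with the common first-order approximation that dynamic power is proportional to C * V^2 * f. That model is an assumption added here, not a formula given in the article, and it ignores leakage and any change in switched capacitance. The voltages in the example are illustrative.

```python
def relative_dynamic_power(freq_ratio, volt_ratio=1.0):
    """Dynamic power relative to the baseline, using P ~ C * V^2 * f.

    freq_ratio and volt_ratio are new/old values; capacitance is assumed unchanged.
    """
    return freq_ratio * volt_ratio ** 2


half_clock = relative_dynamic_power(0.5)
half_clock_lower_v = relative_dynamic_power(0.5, 1.2 / 1.6)  # illustrative voltages
print(f"half clock: {half_clock:.2f}, half clock at lower voltage: {half_clock_lower_v:.2f}")
```

Under this approximation, halving the clock alone halves the power, while also dropping the core supply from 1.6 V to 1.2 V would bring it to a little over a quarter of the baseline.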

Real-time power management

If only one application ran on a DSP system, power management could be handled through up-front design decisions that allowed little flexibility for program changes during operation. However, programmable DSP systems are increasingly being assembled with software components from multiple sources. Clock idling, power gating, and dynamic frequency and voltage scaling can have significant impact upon these components and the operating system itself. For example, idling clocks for independent application threads could easily lead to deadlocks and missed deadlines. Scaling frequencies could disrupt the execution of periodic functions, RTOS time services, API (application program interface) timeouts, and the application's ability to meet its real-time deadlines. In addition, device drivers often need to be notified of frequency changes and power modes so that they can reset registers and command external devices to low-power states.

In order to deal with these dynamic requirements, the RTOS must implement power awareness and control. The responsible component of the RTOS is the Power Manager (PWRM), which manages all power-related functions in the RTOS application, both statically configured by the application developer and dynamically called at run time by the application. Interfaces to the PWRM enable the application developer to selectively idle clock domains, specify a power-saving function to be called at boot time to turn off unnecessary resources, dynamically change voltage and frequency at run time, activate chip-specific and custom sleep modes and provide central registration and notification of power events to system functions and applications.

Figure 1 shows a representative implementation of power management software. Here the PWRM does not exist as another task in the system, but as a set of APIs that execute in the context of application control threads and device drivers. The PWRM interfaces directly to the DSP hardware by writing to and reading from a clock idle configuration register, and also through a platform-specific power-scaling library (PSL) that controls the core clock rate and voltage-regulation circuitry. The PSL also supports callbacks to application code before and after scaling, as well as queries to determine the present voltage and frequency, supported frequencies and scaling latencies. In this way, the PSL both communicates necessary information and isolates the PWRM and applications from low-level control of the frequency- and voltage-control hardware.

Figure 1. Power Manager Partitioning

A representative sequence of events can illustrate how the PWRM functions. In this example, clients register and are notified about frequency-voltage events. The steps correspond to the numbers in Figure 2, with 1-3 as the registration sequence and 4-7 as the scaling sequence.

1. The application code registers to be notified of changes in frequency-voltage setpoints.
2. A driver that uses DMA to transfer data to and from external memory registers to be notified.
3. Packaged binary code registers to be notified.
4. The application decides to change the setpoint and calls the PWRM API to initiate the setpoint change.
5. The PWRM checks if the requested new setpoint is allowed for all registered clients, based on parameters they passed at registration, then notifies them of the impending setpoint change.
6. The PWRM calls the PSL to change the setpoint. The PSL writes to the clock generation and voltage regulation hardware as appropriate to safely change the setpoint.
7. Following the change, the PWRM notifies clients.

Figure 2. Power Event Notification
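The registration and scaling sequence can be sketched in a few lines of code. The class names, method names and setpoint values below are hypothetical and chosen only to mirror the seven steps; they are not the actual PWRM or PSL interfaces, and the PSL stand-in records the setpoint instead of touching hardware.

```python
class PowerScalingLibrary:
    """Stand-in for the PSL: records the setpoint instead of writing to hardware."""

    def __init__(self, setpoints, current):
        self.setpoints = dict(setpoints)   # name -> (MHz, volts), illustrative values
        self.current = current

    def change_setpoint(self, name):
        if name not in self.setpoints:
            raise ValueError(f"unknown setpoint: {name}")
        self.current = name


class PowerManager:
    """Minimal model of central registration and notification."""

    def __init__(self, psl):
        self.psl = psl
        self.clients = []   # (name, allowed setpoints, callback)

    def register(self, name, allowed, callback):
        self.clients.append((name, frozenset(allowed), callback))

    def change_setpoint(self, new):
        # Step 5: every registered client must allow the new setpoint.
        for _, allowed, _ in self.clients:
            if new not in allowed:
                return False
        for _, _, callback in self.clients:
            callback("pending", new)
        self.psl.change_setpoint(new)       # step 6
        for _, _, callback in self.clients:
            callback("done", new)           # step 7
        return True


log = []


def make_client(name):
    return lambda event, setpoint: log.append((name, event, setpoint))


psl = PowerScalingLibrary({"fast": (200, 1.6), "slow": (24, 1.2)}, current="fast")
pwrm = PowerManager(psl)
pwrm.register("application", {"fast", "slow"}, make_client("application"))    # step 1
pwrm.register("dma_driver", {"fast", "slow"}, make_client("dma_driver"))      # step 2
pwrm.register("binary_codec", {"fast", "slow"}, make_client("binary_codec"))  # step 3
changed = pwrm.change_setpoint("slow")                                         # steps 4-7
print(changed, psl.current, len(log))
```

Checking every client before notifying any of them means a refused request leaves the system untouched, which is one simple way to keep drivers and packaged code consistent with the hardware state.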

Power savings benchmarks

The effectiveness of these techniques has been tested in a working audio application by modifying the power-saving features step by step and measuring the power consumed as different features were used. To give readers first-hand experience in using these techniques and evaluating the power consumption numbers themselves, we have reproduced the power numbers calculated by a power planning spreadsheet for different configurations. The spreadsheet was developed using power consumption numbers measured on actual boards and is a tool for estimating the power consumption of simple applications. The spreadsheet and application note used for calculating the power numbers in Table 1 can be found at: www.ti.com/powerplanningspreadsheet.

Table 1 summarizes the power measurements at these different configurations showing only the CPU operating while all the peripherals are turned off.

Table 1. Power Measurements Examples using the Power Planning Spreadsheet and Power Application Note

For any given application, the PWRM can be disabled and the CPU can operate at a given frequency. In this particular scenario, with the CPU operating and all the other peripherals disabled, the application consumed 216 mW. The biggest change observed came from scaling the frequency from 200 to 24 MHz, which cut power by 87 percent. Idling the cache and CPU and dropping the voltage to 1.2 V reduced power consumption by another 11 percent, making a total reduction of 98 percent from the application running without power management.
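The percentages quoted above can be turned back into approximate milliwatt figures with a few lines of arithmetic. The values are derived only from the rounded numbers in the text, so they are estimates and may differ slightly from the spreadsheet output; the linear estimate assumes power tracks clock frequency exactly.

```python
baseline_mw = 216.0                              # CPU running, no power management (from the text)
after_freq_scaling = baseline_mw * (1 - 0.87)    # after scaling 200 MHz -> 24 MHz
after_all_steps = baseline_mw * (1 - 0.98)       # after idling and dropping to 1.2 V as well
linear_estimate = baseline_mw * 24 / 200         # if power scaled exactly with frequency

print(f"after frequency scaling: {after_freq_scaling:.1f} mW")
print(f"after all steps:         {after_all_steps:.1f} mW")
print(f"linear-scaling estimate: {linear_estimate:.1f} mW")
```

The 24/200 frequency ratio alone would predict a reduction of about 88 percent, which is close to the 87 percent reported and consistent with the rule that power savings track the frequency division ratio.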

Putting power techniques to work

Clock idling, power rail gating, and dynamic voltage and frequency scaling can all help DSP system designers reduce power consumption in their systems. To use these techniques effectively, designers need greater information about power consumption during operation. Traditionally, DSP power information and control have focused on typical core and memory consumption, ignoring important factors such as operating modes, peripherals and I/O load. Today, however, new tools are providing greater visibility into power consumption by different functions on the chip during real-time operation. This visibility, together with more complete published information by DSP vendors, as well as the power-saving techniques discussed, is making it possible for system developers to design for power just as carefully as they design for performance.

In a number of DSPs designed for low-power operation today, software and hardware power control techniques are augmented by the inherent advantages of new process technology and key architectural features. Taken together, these innovations make it possible to continue the advances made in introducing ever-greater functionality into mobile devices, as well as packing more channels into high-density, heat-sensitive equipment. More than ever before, system designers can have both DSP performance and low power consumption.

Author Info

Raj Agrawala is responsible for developing the strategic roadmap and product definition for all products within the C5000 platform. Prior to joining TI's C5000 DSP division in 2003, Agrawala was the marketing manager of Agere Systems. Prior to joining Agere, Agrawala was a strategic marketing engineer with the Motorola Semiconductor division, at Motorola's Embedded Microprocessor Division. Agrawala received his bachelor of science in mechanical engineering from the University of Allahabad in 1980, his master of science in nuclear engineering from Ohio State University in 1983 and his master of science in electrical and computer engineering from the University of Texas in 1993.

Leon Adams is responsible for overseeing TI DSP Strategic Initiatives, DSP Strategic Investments and acquisitions, DSP product positioning, DSP IP Licensing, and DSP market and competitive assessment. Adams joined TI in 1980. Previous duties included Microprocessor Systems Engineer, IBM Token Ring NIC Program Manager, LAN Marketing Manager, Open Systems Marketing Manager, Computer Segment Marketing Manager, Microprocessor Business Unit Manager, and DSP New Business Development Manager. Adams graduated summa cum laude with a BS in Engineering Physics from Murray State University. He received his MBA with honors from the University of Texas at Austin.
