Madhusudhan Prabhu, Sreenath Panganamala, Sriram Balasubramanian
Synopsys India Pvt. Ltd., Bangalore
This paper discusses intelligent low power techniques, such as context-based clock gating, and how they benefit IoT applications. It also describes how these techniques improve the overall power efficiency of the system. The power statistics presented show that both idle power and functional power consumption are significantly reduced. Finally, we discuss how context-based clock gating can be combined with other low power techniques to further reduce overall power consumption.
DMA Controller, Internet of Things, Clock Gating, Low Power
I. MOTIVATION FOR LOW POWER TECHNIQUES IN IoT SYSTEMS
The Internet of Things (IoT) is an ecosystem of devices connected to the Internet, including smartphones, desktop systems, smart sensors, smart automobiles, and other "things" embedded with electronics, software, sensors, and actuators that have Internet connectivity.
Advances in fields such as software, semiconductors, communication, and hardware miniaturization have pushed intelligence and connectivity into everything from the smallest things, such as sensors and clothes, to the largest systems, such as automobiles, factories, or even entire cities. This level of integration requires highly reliable, highly secure, high-performance, low-cost, and low-power system boards and integrated circuits.
Battery-powered devices have gained popularity over the last few years, and they demand a smaller form factor, lower weight, longer battery life, and high performance.
These performance, reliability, and portability requirements have spurred renewed interest in low power design. Architects and designers have also learned that there is no single solution to the problem: power optimization must be considered at every phase of the design life cycle, including architecture, algorithms, technology, RTL design, and backend activities.
II. MAJOR LOW POWER TECHNIQUES
Power must be considered at all stages of system, software, and hardware design. Whether the target is a system board or a chip, the developer can apply low power techniques in each phase of the design to reduce power consumption.
In the integrated circuit design flow, low power techniques can be applied at several stages, including requirement specification, architecture design, RTL development, synthesis, and physical design.
At the architectural level, the designer chooses the clock frequency, the micro-architecture, and how the design is partitioned with respect to clocks. The designer has the most flexibility to reduce overall power consumption in this phase.
The following are a few major architectural low power techniques:
- Clock Gating
- Architectural Clock Gating
- Dynamic Frequency Variation
However, these low power techniques come at a cost in speed, area, or performance. They must be adopted carefully, based on the application or system requirements.
Clock Gating
A significant portion of the dynamic power in a system is consumed by the clock distribution network. Because clock buffers have the highest toggle rate in the system, they can consume 50% or more of the dynamic power. They also have high drive strength to minimize clock delay. In addition, the flip-flops receiving the clock dissipate dynamic power even when they are not changing state. Intuitively, the clock can be turned off for flip-flops when they are not performing useful work. This removes a significant portion of the dynamic power consumption while preserving the state of the flip-flops.
Consider the typical multi-bit flip-flop with enable shown in Figure 1. The flip-flop is updated only when the enable (EN) is asserted; otherwise the old value is retained.
Figure 1: Low Power Technique - Clock Gating
The same circuit can also be implemented with clock gating logic, as shown in Figure 1. Clock gating significantly reduces the dynamic power consumed by the circuit. It also saves power and area by removing the need for multiplexer logic at the inputs of the flip-flops. The technique uses a negative-level-sensitive latch (transparent while the clock is low) and an AND gate, so that changes on the enable cannot glitch the gated clock.
Most current technology libraries include clock gating cells, and implementation tools can insert these cells automatically to reduce dynamic power consumption considerably.
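The latch-and-AND structure described above can be illustrated with a minimal behavioral sketch. The following Python model is not RTL and is not taken from any technology library; the function name `gated_clock` and the sample waveforms are assumptions made purely for illustration. The enable is captured by a latch that is transparent while the clock is low, and the latched value is ANDed with the clock.

```python
def gated_clock(clk, en):
    """Model a latch-based clock gate, one sample at a time.

    clk, en: equal-length sequences of 0/1 samples (hypothetical waveforms).
    Returns the gated clock waveform as a list of 0/1 values.
    """
    latched_en = 0
    out = []
    for c, e in zip(clk, en):
        if c == 0:
            # The latch is transparent while the clock is low.
            latched_en = e
        # While the clock is high the latch holds its value, so a change
        # on the enable cannot shorten or create a clock pulse.
        out.append(c & latched_en)
    return out


clk = [0, 1, 0, 1, 0, 1, 0, 1]
en = [1, 1, 0, 1, 0, 0, 1, 0]
print(gated_clock(clk, en))
```

In this example only the first and last clock pulses reach the flip-flops. The enable change that occurs while the clock is high (fourth sample) is ignored until the next low phase.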
Architectural Clock Gating
Architectural clock gating is a context-aware technique that identifies functionally idle periods in order to save power. While defining the architecture, the designer identifies the blocks whose clocks can be turned off under certain conditions. These conditions are either auto-detected by the RTL or programmed through software, and the clock to the block is then turned off, which significantly reduces dynamic power.
This paper discusses the architectural clock gating implemented in the Synopsys DesignWare DMA Controller IP.
Dynamic Frequency Variation
Dynamic frequency variation is a technique in which the clock frequency of a particular block is increased dynamically to perform an operation and then lowered back to the original frequency once the operation finishes. For example, the application interface of an I2C Controller normally operates at a low frequency, but during a data transfer its clock frequency can be raised so that the transfer finishes quickly. The average dynamic power consumption is thereby reduced.
In addition to the techniques discussed above, a system can be made low power through low power RTL design, gate-level optimization, frequency islands, power gating, multiple supply voltages, multiple threshold voltages, and dynamic voltage scaling.
III. A CASE STUDY – SYNOPSYS DESIGNWARE DMA CONTROLLER
To develop highly secure, high-performance, and low-cost IoT designs, system architects are packing more features and functionality into smaller boards and SoCs. A typical SoC used for IoT has the following major sub-blocks:
- Processor Core
- WiFi/Bluetooth Core
- Sensor Subsystem with I2C, I3C, and SPI
- USB Core
- DMA Controller
Because the DMA Controller is an important block in any IoT subsystem, this paper uses the Synopsys DesignWare DMA Controller IP as a case study.
The DesignWare DMA Controller IP is a generic DMA controller that offloads data movement between memory and peripherals from the processor, an operation that would otherwise consume a significant number of processor cycles (MIPS). The DMA Controller needs to operate at the same frequency as the processor to meet the bandwidth requirement, which increases its dynamic power consumption. It is therefore important to reduce power consumption at the architectural level.
We now examine the architecture of the DesignWare DMA Controller IP to understand how and where power consumption can be reduced. The DesignWare DMA Controller has the following major features:
- Configurable DMA channels; each channel can be used for a concurrent DMA operation.
- Configurable AHB master interfaces, with an option to support different AHB layers.
- Configurable Handshake Interfaces to interact with the peripherals.
- AHB Slave Interface for software configuration.
A study of the DMA Controller use cases reveals the following facts:
- Not all channels need to be active at all times; there are idle periods before and after a DMA operation.
- The DMA AHB master has to compete with the other AHB masters on the AHB fabric to be granted the bus; this results in idle periods while the AHB master waits for the grant.
- The DMA Controller also needs to transfer data to or from slower peripherals with the help of handshake signals. This provides one more opportunity to save power.
These facts hold for an IoT system as well. As mentioned in the earlier section, an IoT system has several sensors, such as temperature, motion, smoke detection, and fire sensors, connected to the processor through I2C, I3C, and SPI controllers.
Consider an example in which a temperature, motion, smoke detection, or fire sensor is connected to an I2C controller and must be read once every 20 ms. The DMA channel configured for that I2C controller waits for activity on the handshake signals before initiating a DMA transfer. The channel remains idle until the handshake signals are asserted, and the DMA Controller can use this opportunity to save power by turning off the clock to that channel. This reduces dynamic power consumption in the IoT system. Several such power-saving mechanisms can be built in.
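To get a rough sense of how much idle time such a sensor channel offers, the following sketch computes the idle fraction for the 20 ms update period in the example. The 100 MHz clock matches the measurement setup used later in this paper. The figure of 1,000 active cycles per update and the function name `idle_fraction` are hypothetical assumptions, not measured values.

```python
def idle_fraction(update_period_s, clock_hz, active_cycles):
    """Return (clock cycles per update period, fraction of those cycles spent idle)."""
    period_cycles = round(update_period_s * clock_hz)
    idle = 1.0 - active_cycles / period_cycles
    return period_cycles, idle


# 20 ms update period from the example; 100 MHz clock.
# 1,000 active cycles per update is an assumed, illustrative figure.
period_cycles, idle = idle_fraction(20e-3, 100e6, 1_000)
print(f"{period_cycles} cycles per update, {idle:.2%} idle")
```

Under these assumptions each update period spans 2,000,000 clock cycles and the channel is idle for about 99.95 % of them, which is why gating the channel clock during the wait is attractive.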
Based on the above use case analysis, the DesignWare DMA Controller implements intelligent context-based clock gating. Specifically, the following two low power clock gating features are implemented in the DesignWare DMA Controller.
Channel Specific Clock Gating
A DMA channel auto-detects idle periods of the kinds identified in the use case study above. When the DMA Controller senses that a channel is idle, it gates the clock to that channel and the channel enters low power mode. The channel's handshake signals, AHB master interface, and AHB slave interface continue to be monitored; when any activity is detected, the clock is restored and the channel exits low power mode. This architectural clock gating incurs no performance overhead.
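The idle detection described above can be sketched as a simple behavioral model. This is only an illustration under stated assumptions, not the DesignWare implementation: the signal names in `ChannelActivity`, the helper functions, and the activity trace are all hypothetical, and wake-up timing is ignored.

```python
from dataclasses import dataclass


@dataclass
class ChannelActivity:
    """Per-cycle activity seen by one DMA channel (hypothetical signal names)."""
    transfer_active: bool = False   # a DMA transfer is in progress
    handshake_req: bool = False     # a peripheral handshake request is asserted
    ahb_master_busy: bool = False   # the channel is using the AHB master interface
    ahb_slave_access: bool = False  # software is accessing the channel registers


def channel_clock_enable(a: ChannelActivity) -> bool:
    """The channel clock runs whenever any activity source is present."""
    return (a.transfer_active or a.handshake_req
            or a.ahb_master_busy or a.ahb_slave_access)


def clocked_cycles(trace):
    """Count the cycles in which the channel clock is enabled."""
    return sum(channel_clock_enable(a) for a in trace)


idle = ChannelActivity()
trace = (
    [idle] * 5
    + [ChannelActivity(handshake_req=True)]
    + [ChannelActivity(transfer_active=True, ahb_master_busy=True)] * 3
    + [idle] * 7
)
print(clocked_cycles(trace), "of", len(trace), "cycles clocked")
```

In this made-up trace the channel receives a clock in only 4 of 16 cycles; the remaining cycles are gated.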
Global Clock Gating
When all DMA channels are idle, the entire DMA Controller enters low power mode by turning off its clock. Only a small portion of the logic, which handles wake-up, remains clocked. The DMA Controller exits low power mode when it detects activity on any of the DMA channels.
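The global gating condition can be sketched in the same behavioral style. The function names and the two-channel trace below are hypothetical, and the small always-clocked wake-up logic is not modeled.

```python
def global_clock_enable(channel_enables):
    """The main DMA clock runs if at least one channel needs its clock."""
    return any(channel_enables)


def main_clock_cycles(trace):
    """Count cycles in which the main DMA clock is enabled.

    trace: one tuple of per-channel clock enables per cycle.
    """
    return sum(global_clock_enable(enables) for enables in trace)


# Hypothetical two-channel activity trace, one tuple per clock cycle.
trace = [
    (False, False),  # all channels idle: main clock gated
    (True, False),
    (True, True),
    (False, True),
    (False, False),  # all channels idle again
    (False, False),
]
print(main_clock_cycles(trace), "of", len(trace), "cycles clocked")
```

Here the main clock runs for 3 of the 6 cycles and is gated whenever both channels are idle.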
IV. POWER STATISTICS
The efficiency of a low power technique can be measured in terms of the following:
- Idle Power Saving
- Functional/Operational Power Saving
Idle power is the power consumed by the system when it is not in functional mode, that is, when clocks are running and the reset cycle has completed but the system has not been enabled for normal operation.
Functional (operational) power is the power consumed by the system when it is in functional or operational mode.
The idle power saving has the advantage of being independent of the scenario, whereas the functional power saving depends on the scenario used for the power computation. Functional power is still important to compute because it indicates how much power is consumed during typical operation of the system.
DesignWare DMA Controller Power Statistics
The following setup is used to compute the idle and functional power:
- DesignWare DMA Controller with 8 channels, 1 master bus, and 16 handshake signals (a typical configuration of the IP)
- Clock frequency of 100 MHz
- PT-PX SAIF-based flow
- TSMC 40nm library
Table 1 shows the idle power saving for the DesignWare DMA Controller IP.
Table 1: Power Statistics - Idle Power
| S.No. | Configuration | Normalized Idle Power (mW) | Change in Idle Power (%) |
|---|---|---|---|
| 1 | Typical | 1.0000 | |
| 2 | Typical with DC CG | 0.3948 | -60.52 |
| 3 | Typical with global and channel CG | 0.0437 | -95.63 |
| 4 | Typical with global, channel and DC CG | 0.0402 | -95.98 |
* CG = Clock Gating
In Table 1, Row 1 shows the normalized idle power consumption of the DesignWare DMA Controller IP without clock gating logic, which is used as the reference for benchmarking. With global and channel-specific (architectural) clock gating, the DMA Controller saves 95.63 % of the idle power: the normalized idle power is only 0.0437 mW. Architectural clock gating can be combined with clock gating inserted by Synopsys Design Compiler (DC) to increase the idle power saving further, to 95.98 %.
The idle power saving of 95.63 % comes at the cost of only a 2.38 % increase in gate count, with no performance loss. Furthermore, combining architectural clock gating with DC-inserted clock gating reduces the gate count overhead by 4.44 % with respect to the DMA Controller without clock gating.
For the functional power computation, a typical scenario is run for 40 k cycles with an average of 1 k wait cycles between DMA transfers. Table 2 shows the functional power consumption with the different clock gating techniques.
Table 2: Power Statistics - Functional Power
| S.No. | Configuration | Normalized Power (mW) | Change in Power (%) |
|---|---|---|---|
| 1 | Typical | 1.0000 | |
| 2 | Typical with DC CG | 0.3718 | -62.82 |
| 3 | Typical with global and channel CG | 0.4894 | -51.06 |
| 4 | Typical with global, channel and DC CG | 0.1892 | -81.08 |
In Table 2, Row 1 shows the functional power consumption of the DesignWare DMA Controller IP without clock gating logic, which is used as the reference for benchmarking. With global and channel-specific clock gating, the DMA Controller saves 51.06 % of the functional power: the normalized functional power drops from the reference 1.00 mW to 0.4894 mW. Combining architectural clock gating with clock gating inserted by Synopsys Design Compiler (DC) increases the saving to 81.08 %. As mentioned earlier, this measurement has the drawback of depending on the scenario used for the power computation, but it still gives an indication of the typical power saving.
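The percentage columns in Tables 1 and 2 follow directly from the normalized power values. The short sketch below recomputes them as a consistency check; the dictionary and function names are chosen here for illustration, while the numbers are copied from the two tables.

```python
# Normalized power values copied from Table 1 (idle) and Table 2 (functional).
IDLE_POWER = {
    "Typical with DC CG": 0.3948,
    "Typical with global and channel CG": 0.0437,
    "Typical with global, channel and DC CG": 0.0402,
}
FUNCTIONAL_POWER = {
    "Typical with DC CG": 0.3718,
    "Typical with global and channel CG": 0.4894,
    "Typical with global, channel and DC CG": 0.1892,
}


def saving_percent(normalized_power, reference=1.0):
    """Power saving relative to the ungated reference, in percent."""
    return round((1.0 - normalized_power / reference) * 100.0, 2)


for title, table in (("Idle", IDLE_POWER), ("Functional", FUNCTIONAL_POWER)):
    for config, power in table.items():
        print(f"{title:10s} | {config:40s} | {saving_percent(power):6.2f} %")
```

The computed savings agree with the tabulated values, for example 95.63 % idle and 51.06 % functional for global and channel clock gating.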
The context-based clock gating technique implemented in the DMA Controller reduces idle power by up to 95.63 % and functional power by up to 51.06 %, with an area overhead of 2.38 % and no performance overhead (with respect to the DMA Controller) in an IoT system. It can be combined with other low power techniques to further reduce the total power of the system, and it can be implemented in other sub-modules of the SoC to save a significant amount of power. The DesignWare DMA Controller IP from Synopsys supports context-based clock gating and is optimized for high performance, low power, and low area. The availability of AXI and AHB interfaces gives customers the flexibility to choose the right solution based on their performance needs and application.
 Low Power Methodology Manual For System-on-Chip Design, Michael Keating, David Flynn, Robert Aitken, Alan Gibbons, Kaijian Shi
 Low Power Design Methodologies, Jan M. Rabaey, Massoud Pedram
 Understanding Your Power Profile from RTL to Gate-level Implementation