Sunny Gupta, Reecha Jajodia (Freescale Semiconductor India Pvt. Ltd.)
Parallel Data transfer using digital hardware involves a significant amount of switching power consumption, which increases as “data activity” increases.
Higher power consumption leads to
- Overdesign of power grid
- Increased thermal dissipation and need for better packages for SoCs
- Larger battery capacity for handheld devices
- Overhead of cooling systems
The motivation was to reduce switching noise in areas where parallel data packet transfer is involved, e.g., reading sectors of Flash memory, inter-processor communication, etc
The Idea:
Here, we propose an algorithm which optimally rearranges the data words in a packet of data such that signal state change on each bit of the data words is minimized.
The high level flow for the proposal is as follows, further explained below:
- Collect the packet of data in a memory array and assign “internal address” to each data word in the packet in the original order
- Apply the proposed algorithm to re-order the “data words” while minimizing data activity
- Append the “internal address” to the data words, and transmit the data in the order found from the algorithm
- At the receiver, using the “internal address”, rearrange the data packet to generate the original content
This helps significantly reduce transmit and receive power. This method is useful when the “processing power” to implement the ordering and re-ordering algorithms is significantly lower than original transmission power.
The next section describes some nomenclature that is required to explain the proposal in details.
K MAP Nomenclature:
- State is a vector made of signals. If we have N signals, we can have 2^N states, out of which only a fraction, V states need to be excited / traversed. Example for N=4 (4-bit signals) are 0000 (0), 1011(11) etc.
- 1-D neighbor of a parent state – It is defined as all the states that can be represented by changing just one bit of the previous state.
- Example -
- Say the parent state is 0000 (0)
- Then 1D Neighbors are 0001(1), 0010(2), 0100(4), 1000(8)
- Hamming Distance – The number of bits that need to be changed to reach one state from previous state.
- Say the parent state is 0000 (0)
- Then 1100 is 2 Hamming Distance away.
Sorting Algorithm
Proof of Concept : Smart GRAY Code Algorithm
Original sequence is 2, 11, 9, 5, 8, 1 and 10, and required vectors: 0010, 1011, 1001, 0101, 1000, 0001, 1010 out of the possible 16 combinations for 4-bit vectors. Transferring this original sequence requires a total of 13 bit toggling
- Basic Steps of the Algorithm
- First block the combinations NOT needed (RED)
- Find the points with maximum 1-D neighbors. These are 9 and 10 with 3 neighbors each
- Start from 9 till no more 1-D neighbor. This gives us the path 9->11->10->2 (we did NOT choose 9->11->10->8 since it terminates in 8, which is a 1-D neighbor of 9, the root state)
- Jump to 1->5
- The final sequence becomes :
We get 2 sequences with 1 hamming distance transition(min toggling) which are concatenated to generate the final stimuli
Proof of Concept
- Consider a 64-bit data transfer protocol with a packet size of 4096 words, which results in a packet size of 16KB
- We’ll minimize toggling on 64 “data bits” by using the algorithm, and add 12 “address bits” to maintain sequence information, and these 12 bits can change randomly
- If we don’t need to retain sequence information, we would NOT need the “address bits”. Example can be a “memory sector read”, where we can club data and address into a single word, and then apply algorithm on these composite words, communicate the optimized composite packet and then simply extract data-address pairs (no other work needed at receiver)
- Hence, we transmit a total of 76 bits in this case, out of which 64 bits are nearly static (only a very few bits toggle at a time)
- Considering an original “activity factor of ~40%”, i.e., 25 bits on average toggle across each word out of 64, we can get down to about 12 bits toggling at a time and hence reduce switching power by as much as 50%.
Conclusion
- The concept is not about changing the data to serial data. We just rearrange the data and append a few encoding bits. This rearrangement reduces power and after transmission we decode the data to get back the original data
- This idea talks about being in normal/run mode always, never to enter the low power mode. We instead reduce the activity while switching and thus save power.
- It intelligently rearranges the transfer data so that there is minimum number of bits changing from one data word to next data word. Thus reducing the switching power losses.
- Rearrangement at receiver is a trivial operation (just random access), and hence this scheme is very beneficial in case of data broadcast (since energy is spent only ones during transmission but multiple receives benefit)