By Raj Mahajan, MemCore Inc.
The variety of memory devices available today provides the system architect with multiple options when selecting a memory. The selection is usually driven by key considerations: power, speed, device configurations, cost and pin count. Functionality differences can play a big role in optimizing system-level performance, cost and power. A detailed understanding of the impact of various memory features on the overall system can be critical when selecting the right mix of memory subsystem components. This paper reviews popular DDR memory device architectures and provides selection criteria for picking the right memory device based on system design objectives. It includes a short review of the key features of DDR, GDDR and MobileDDR memory architectures, covering power, speed and cost characteristics as well as key functionality differences that can impact overall system architecture. Drawing on real system design experience, each of the main memory architectures is applied to system design challenges of sustained bandwidth, reliability, access priority, power savings and interface requirements. From these real-world example designs the reader will learn the key elements of the selection criteria that can guide the choice of a specific memory architecture for specific system-level application requirements.
Introduction
The variety of memory devices available today provides the system architect with multiple options when selecting a memory. The memory standards discussed in this paper, which are by no means a complete list, are DDR, DDR2, GDDR and MobileDDR. The DDR designator means Double Data Rate: data is clocked on both edges of the clock. This innovation allows for higher bandwidth than a traditional synchronous (positive edge only) interface, but it requires a bit more complexity and tighter signal integrity guidelines during layout than traditional synchronous systems do. The G in GDDR stands for Graphics and indicates the specialized nature of the GDDR standard. MobileDDR is self-explanatory: DDR memories targeted at mobile, or low-power, applications. The standards share some basic similarities, but each comes with features geared to its target application area. Usually a memory sub-system needs to support only one specific memory standard, but in some cases multiple standards need to be supported. One example would be a circuit board that can be shipped in either a low-performance, battery-operated mode (MobileDDR) or a non-battery-operated, higher-performance mode (DDR2). In this paper we will assume that the target system is single use only. Dual-mode systems are more complex and would require a paper of their own.
A Comparison of DDR and DDR2
DDR and the more recent upgrade to DDR2 are the workhorses of the memory world. These general-purpose, high-performance standards, used in all varieties of equipment, are by far the most common. The other standards, GDDR and MobileDDR, can be considered special versions of the DDR or DDR2 standard. We will start with a short description of DDR and DDR2 to build a common baseline for later descriptions of the other standards.
Memory Module Technology
DDR2 memory modules can have from 256MB to 4GB capacities. All DDR2 memory chips use the FBGA (Fine Ball Grid Array) package. This allows higher memory densities in a smaller space with better electrical and thermal properties. The older package associated with DDR is the TSOP-II package, shown on the lower memory module in Figure 1 below. DDR2 memory uses 1.8 Volts for power, instead of the 2.5 volts used by DDR memory, and should result in lower power and cooler operation.
Figure 1: DDR2 Memory Module (Top) and DDR Memory Module (Bottom)
DDR2 starts with a 400MHz speed grade, the top speed of DDR. DDR2 speed grades extend to 533MHz and 667MHz and top out at 800MHz. Speed grade is just one of the system performance parameters, however. DDR2 adds some features to get as much system bandwidth as possible out of the raw clock rate.
DDR2 memories pre-fetch 4 bits of memory at a time, as compared to DDR's 2 bits. On each access, 4 bits of data are made available to the memory I/O buffers for transfer onto the system memory bus (which transfers data twice each clock cycle), while normal DDR memory delivers 2 bits at a time. The DDR2 memory core operates at half the DDR equivalent rate, however, so the overall transfer bandwidth is the same but power is reduced. The illustration in Figure 2 shows the resulting bandwidth and performance differences within the memory device. DDR, shown at the top of the figure, has a memory array running at 200MHz and accesses 2 bits each time. This supplies enough data to the I/O buffers to achieve the DDR performance of 400MHz on the system data bus. The DDR2 implementation is shown at the bottom of Figure 2. Now 4 data bits are accessed at 100MHz. This delivers the required data rate to achieve the 400MHz system data bus requirement.
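The prefetch arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `io_data_rate_mbps` is hypothetical, and the core clock and prefetch values are the ones quoted in the paragraph above.

```python
def io_data_rate_mbps(core_clock_mhz, prefetch_bits):
    """Per-pin data rate (Mbps) that a memory core can sustain.

    Each core access fetches `prefetch_bits` bits per data pin, so the
    I/O buffers can stream core_clock * prefetch bits per second per pin.
    """
    return core_clock_mhz * prefetch_bits


# Values quoted in the text: DDR core at 200MHz with 2-bit prefetch,
# DDR2 core at 100MHz with 4-bit prefetch.
ddr_rate = io_data_rate_mbps(core_clock_mhz=200, prefetch_bits=2)
ddr2_rate = io_data_rate_mbps(core_clock_mhz=100, prefetch_bits=4)

print(f"DDR : {ddr_rate} Mbps per pin")
print(f"DDR2: {ddr2_rate} Mbps per pin")
```

Both cases reach the same 400Mbps per pin, with the DDR2 core running at half the frequency.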
Figure 2: DDR Memory Architecture (Top) and DDR2 Memory Architecture (Bottom)
On Die Termination
On Die Termination (ODT) is built into each DDR2 memory chip on a module, with a pin dedicated to dynamically controlling the ODT. In DDR the terminating resistors needed to be built into the motherboard.
This change allows ODT to be placed optimally for any situation (read or write, to/from any rank in the system). This, in turn, reduces noise to improve electrical performance.
Posted CAS and Additive Latency
In a DDR memory a memory bank activation and a read command can occur on the same clock cycle, creating a collision, and forcing the activation command to be delayed by one clock cycle. This results in the read command being delayed one clock cycle, and so on. This creates a gap in memory transfers and reduces memory and system bandwidth. With posted CAS and additive latency, a read command is issued immediately after the activate command. The read command is delayed internally by a predetermined number of clock cycles, without an additional command, so that no collision occurs. This happens even if another activate command is issued on the same clock cycle the read command eventually executes on. This eliminates the gaps present in DDR and improves overall system performance.
DDR2 allows for CAS latencies of 3, 4 or 5, as compared to DDR's 1.5, 2 and 2.5. All 533MHz DDR2 chips use a latency timing of 4-4-4. Write latency is also longer with DDR2. DDR allows a single cycle for write latency, while DDR2 defines write latency as read latency minus one. In some cases this might impact overall system bandwidth, but most memory controllers can find ways to keep data going into and out of the memory at close to full bandwidth even with this increase in latency.
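The latency relationships above can be sketched in Python. The helper names are hypothetical. The sketch assumes that DDR2 read latency is the sum of additive latency and CAS latency (our reading of the DDR2 standard, not stated in the text above) and that the interface clock runs at half the quoted data rate.

```python
def clock_period_ns(data_rate_mbps):
    """Clock period for a double-data-rate interface.

    Data moves on both clock edges, so the clock runs at half the data rate.
    """
    clock_mhz = data_rate_mbps / 2
    return 1000.0 / clock_mhz


def ddr2_latencies(cas_latency, additive_latency=0):
    """Return (read_latency, write_latency) in clock cycles.

    ASSUMPTION: read latency = additive latency + CAS latency.
    Write latency = read latency - 1, as described in the text.
    """
    read_latency = additive_latency + cas_latency
    write_latency = read_latency - 1
    return read_latency, write_latency


# 533MHz DDR2 part with 4-4-4 timing, as quoted in the text.
period = clock_period_ns(533)
read_cycles, write_cycles = ddr2_latencies(cas_latency=4)
cas_ns = 4 * period

print(f"clock period : {period:.2f} ns")
print(f"CAS latency  : {cas_ns:.1f} ns")
print(f"read latency : {read_cycles} cycles, write latency: {write_cycles} cycles")
```

Under these assumptions a CAS latency of 4 cycles at the 533MHz grade works out to roughly 15ns.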
DDR and DDR2 both use a data strobe signal that travels along with the data. This strobe signal provides the timing information required to run the memory interface at full speed. Because it is required to have tight skew and delay characteristics, with respect to the data, the memory controller PHY can use it to ‘lock’ on the center of the data ‘eye’. This squeezes the last drop of performance out of the interface and provides the highest system bandwidth possible.
A summary table is shown in Table 1 below and lists the key features of DDR and DDR2.
| Feature | DDR | DDR2 |
|---|---|---|
| Data Rate | 200-400Mbps | 400-800Mbps |
| Interface | SSTL_2 | SSTL_18 |
| Source Sync | Bi-directional DQS (Single ended default) | Bi-directional DQS (Single/Diff Option) |
| Burst Length | BL = 2, 4, 8 (2-bit prefetch) | BL = 4, 8 (4-bit prefetch) |
| CL/tRCD/tRP | 15ns each | 15ns each |
| ODT | No | Yes |
| Driver Calibration | No | Off-Chip |
Table 1: Key DDR and DDR2 Features
GDDR Memory Standards
The Graphics DDR (GDDR) memory standards all focus on delivering higher performance than their cousins, standard DDR. GDDR standards have evolved from GDDR1, GDDR2, GDDR3 to GDDR4 in the endless search for higher performance and bandwidth. Because this higher performance comes at higher cost (both in dollars and in power) GDDR is targeted only for high performance applications, like graphics adaptor cards where the higher cost and higher power are acceptable. You won’t find GDDR devices being used as main memory in standard personal computers and laptops however. DDR devices have the right power and cost trade-off for that portion of the market.
GDDR devices diverge from the standard DDR devices in only a few respects. GDDR3 devices operate at 2.0V. This additional voltage allows them to operate faster than their DDR equivalents. GDDR devices have wider data interfaces to provide more bandwidth per device. This keeps the number of devices required for a typical graphics system low and mitigates, somewhat, the higher power requirement. The capacity of GDDR memory devices tends to be lower than that of DDR devices. For example, DDR3 devices are specified from 512Mb to 8Gb, while GDDR devices are typically 256Mb to 512Mb. Because of the higher power required by GDDR devices, capacity must be reduced in comparison to the general-purpose DDR devices.
We can see then that GDDR devices are best targeted at applications where power, capacity and cost are less important than raw performance and overall bandwidth. GDDR devices pack a lot of bandwidth in a small footprint if that is the most important requirement.
MobileDDR Memory Standard
MobileDDR memory devices are optimized for low power operation in battery operated and handheld devices. The fundamental operation of MobileDDR devices is very similar to their DDR cousins, but a few key features have been added to make them more power friendly.
One of the key features that differentiates MobileDDR devices from DDR devices is the Partial Array Self Refresh feature. This is programmed into the MobileDDR device by setting the appropriate bits in the Extended Mode Register Set (EMRS). The Partial Array Self Refresh (PASR) feature can be set to full array, half array or quarter array. This allows a device that has active data in only a portion of the array to reduce power by refreshing only a sub-set of the full device.
In addition to PASR, the MobileDDR device provides another refresh option. The Temperature Compensated Self Refresh (TCSR) feature controls the internal refresh rate based on the temperature of the device. The refresh rate can be adjusted for temperatures of 45 degrees centigrade or 85 degrees centigrade. This can cut the refresh rate to about 50% and, when combined with PASR, can reduce power substantially.
Finally, the Deep Power Down (DPD) mode of operation reduces power to the minimum. In typical operating modes with one bank active, current consumption is usually less than 80mA. When refresh is active, power is about three times as much as during normal operation. (This is why the PASR feature is attractive for saving power: the large refresh current can be reduced.) In DPD mode, current is reduced to only 10uA. This is a key feature of MobileDDR memories, and it gives these devices a big advantage in battery-operated systems.
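To get a feel for how these features combine, the following Python sketch applies the figures quoted above. It is a rough model under two labeled assumptions: refresh current is taken as three times the 80mA active figure, and refresh current is assumed to scale linearly with both the PASR array fraction and the TCSR refresh-rate factor. All names are hypothetical, and real devices should be checked against their datasheets.

```python
# Figures quoted in the text, expressed in microamps.
ACTIVE_UA = 80_000            # one bank active: "usually less than 80mA"
REFRESH_UA = 3 * ACTIVE_UA    # ASSUMPTION: "about three times" applied to current
DPD_UA = 10                   # Deep Power Down current

# PASR settings named in the text, as the fraction of the array refreshed.
PASR_FRACTION = {"full": 1.0, "half": 0.5, "quarter": 0.25}


def refresh_current_ua(pasr="full", tcsr_scale=1.0):
    """Estimate refresh current for a PASR setting and a TCSR rate factor.

    ASSUMPTION: refresh current scales linearly with the refreshed array
    fraction and with the refresh rate (tcsr_scale = 0.5 means a 50% rate).
    """
    return REFRESH_UA * PASR_FRACTION[pasr] * tcsr_scale


full_refresh = refresh_current_ua()
reduced_refresh = refresh_current_ua(pasr="quarter", tcsr_scale=0.5)
dpd_ratio = ACTIVE_UA / DPD_UA

print(f"full-array refresh      : {full_refresh / 1000:.0f} mA")
print(f"quarter array, 50% rate : {reduced_refresh / 1000:.0f} mA")
print(f"active vs. DPD current  : {dpd_ratio:.0f}x")
```

Even with this simple model, the gap between active current and DPD current stands out as the largest single saving.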
Summary of Memory Characteristics
Each of the three main memory standards (DDR, GDDR and MobileDDR) has specific strengths and weaknesses that suit it to specific applications. The key strengths and weaknesses are summarized in Table 2 below.
DDR memories are targeted at the bulk of memory applications and focus on delivering medium performance, cost, power and data width. GDDR memories, in contrast, focus on high performance and data bandwidth. This means that power and cost end up being higher as well. The market for high performance memory in applications like graphics adaptors is large enough that this trade-off makes economic sense. The MobileDDR devices focus on low power and keeping system cost low. Performance is less important, but notice that the medium data width typically allows applications to use a single device; the width is not typically used to achieve higher bandwidth.
| Characteristic | DDR | GDDR | MobileDDR |
|---|---|---|---|
| Data Rate | Medium | High | Low |
| Power | Medium | High | Low |
| Cost | Medium | High | Low |
| Data Width | Medium | High | Medium |
Table 2: Summary of Key Memory Standard Characteristics
Now that the main characteristics of the three memory standards are understood, it is possible to look at a few example systems and see which memory technology is the best fit for the application. DDR memories are widely used as personal computer and laptop main memory. The high performance of GDDR and the low power of MobileDDR are not required for this application. Cell phones, however, are a natural fit for MobileDDR memories and can make good use of the lower power offered by the Self Refresh and Deep Power Down modes of operation. As already described, GDDR memories are a natural fit for graphics adaptors due to the high memory bandwidth requirements of these sub-systems. In other applications it may not be as easy to determine the best fit. Let’s look at two examples and see how we would select among the three different kinds of memory devices.
Video Conferencing System
A video conferencing system takes video and audio inputs and encodes these for transmission over a network. Encoded signals are received by the video conferencing system, perhaps from multiple sources, and decoded for display on the receiving system. Usually these systems can be broken into a video processing sub-system and a network transmission sub-system. This system segmentation allows different network interfaces to be supported with a different adaptor card. In our example system we will assume that such a partition exists and we will focus on the requirements of the video processing adaptor card. A block diagram of a typical video conferencing system is shown in Figure 3 below.
The video input comes from a camera in the video conference facility and provides the video stream to the video processing sub-system. The video sub-system encodes the signal and sends it over the PCI-Express system interface for transmission using the associated network adaptor sub-system. When a video signal is received by the video conferencing network interface the encoded data is forwarded to the video processing adaptor. The adaptor decodes the video signal and displays it on the video monitor or monitors.
The memory system will need to buffer data to or from the PCI-Express interface and make it available to the video encoder or decoder as needed. Because both encoding and decoding are going on simultaneously, the memory will need to support a significant amount of bandwidth. In the worst-case scenario the PCI-Express interface could be reading and writing data to the memory system while the video encoder and decoder are both accessing memory. If the PCI-Express interface has a maximum bandwidth of 2GBps (for a bi-directional x4 interface) and the video encoder and decoder each need 1GBps, then the memory will need to deliver 4GBps of sustained bandwidth in the worst case.
Figure 3: Video Processing Sub-System
The size of the memory is an important factor as well. If sufficient data needs to be stored to buffer a full second of encoder and decoder bandwidth, the size required would be over 2GB. Additional buffering would be needed for the PCI-Express interface, and this could require another 1GB of storage.
Power is not a key requirement but the adaptor needs to consume less than 35 Watts of power to meet the PCI-Express requirement. If the encoder, decoder and PCI-Express interface require 25 Watts that leaves 10 Watts for the memory.
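The three requirements derived above (bandwidth, size and power) follow from simple arithmetic, sketched here in Python. The variable names are hypothetical; the numbers are the ones quoted in the text, with the codec buffer taken as exactly one second of encoder plus decoder traffic.

```python
# Bandwidth figures from the text, in GB/sec.
PCIE_BW = 2.0       # bi-directional x4 PCI-Express interface
ENCODER_BW = 1.0
DECODER_BW = 1.0

# Worst case: all three clients access memory at the same time.
required_bw = PCIE_BW + ENCODER_BW + DECODER_BW

# Size: one second of encoder + decoder traffic, plus PCI-Express buffering.
BUFFER_SECONDS = 1.0
PCIE_BUFFER_GB = 1.0
codec_buffer_gb = (ENCODER_BW + DECODER_BW) * BUFFER_SECONDS
required_size_gb = codec_buffer_gb + PCIE_BUFFER_GB

# Power: what is left of the card budget after the logic is accounted for.
CARD_LIMIT_W = 35.0
LOGIC_W = 25.0      # encoder, decoder and PCI-Express interface
memory_budget_w = CARD_LIMIT_W - LOGIC_W

print(f"sustained bandwidth : {required_bw} GB/sec")
print(f"memory size         : {required_size_gb} GB")
print(f"memory power budget : {memory_budget_w} W")
```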
A summary of the application requirements is shown in column 2 of Table 3 below. The figures for DDR2, GDDR and MobileDDR implementations are given in the adjacent columns. These entries assume a 3GB memory size since that is the key requirement for all implementations. (The detailed computations for each entry are given in the appendix of this paper.)
| Requirements | Application | DDR2 | GDDR | MobileDDR |
|---|---|---|---|---|
| Size | 3GB | 4GB/module | 3GB (6 parts) | 3GB (44 parts) |
| Bandwidth | 4GB/sec | 6.4GB/sec | 24GB/sec | 40GB/sec |
| Power | < 10W | 4.5W | 4.5W | 12W |
Table 3: Video Processing Application Memory Requirements
After reviewing the summary, the best implementation would use either the DDR2 or the GDDR solution. The DDR2 solution would cost less since it uses an inexpensive PC memory module. The GDDR solution would cost more and require more board space but would offer significantly higher bandwidth. If the increased bandwidth can’t be used by the application, it would be better to go with the DDR2 solution.
Portable Test Equipment
Now that the methodology is understood a similar analysis can be done for a portable test equipment application. The requirements for this example are given in Table 4 below. (The detailed calculations are given in the appendix).
| Requirements | Application | DDR2 | GDDR | MobileDDR |
|---|---|---|---|---|
| Size | 64MB | 256MB/part | 64MB/part | 64MB/part |
| Bandwidth | 1GB/sec | 1GB/sec | 4GB/sec | 1GB/sec |
| Power | < 0.5W | 0.8W | 0.75W | 0.3W |
Table 4: Portable Test Equipment Memory Requirements
After reviewing the requirements table it is clear that only the MobileDDR memory meets all the requirements. The DDR2 and GDDR devices meet all the requirements except the power requirement and so must be excluded from the solution.
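The comparison used in both examples can be expressed as a simple filter over the candidate memories. The Python sketch below is illustrative only: the `MemoryOption` class and `meets_requirements` function are hypothetical names, and the figures are copied from the requirements tables for the two applications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryOption:
    name: str
    size_mb: float
    bandwidth_gb_s: float
    power_w: float


def meets_requirements(option, min_size_mb, min_bandwidth_gb_s, max_power_w):
    """True when an option satisfies size, bandwidth and power limits.

    The power limit is treated as strict ("less than"), as in the tables.
    """
    return (option.size_mb >= min_size_mb
            and option.bandwidth_gb_s >= min_bandwidth_gb_s
            and option.power_w < max_power_w)


# Video processing example: 3GB, 4GB/sec, less than 10W.
video_options = [
    MemoryOption("DDR2", 4096, 6.4, 4.5),
    MemoryOption("GDDR", 3072, 24.0, 4.5),
    MemoryOption("MobileDDR", 3072, 40.0, 12.0),
]
video_fit = [o.name for o in video_options
             if meets_requirements(o, 3072, 4.0, 10.0)]

# Portable test equipment example: 64MB, 1GB/sec, less than 0.5W.
portable_options = [
    MemoryOption("DDR2", 256, 1.0, 0.8),
    MemoryOption("GDDR", 64, 4.0, 0.75),
    MemoryOption("MobileDDR", 64, 1.0, 0.3),
]
portable_fit = [o.name for o in portable_options
                if meets_requirements(o, 64, 1.0, 0.5)]

print("video processing :", video_fit)
print("portable test    :", portable_fit)
```

A filter like this only removes options that fail a hard requirement. Choosing between the remaining candidates, as in the video processing example, still comes down to cost and board space.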
This paper gave a short review of the key features of DDR, GDDR and MobileDDR memory architectures, covering power, speed and cost characteristics as well as key functionality differences that can impact overall system architecture. Using two example system designs, each of the main memory architectures was compared to see which would address the system design challenges associated with each design. These examples showed the reader how to use the key characteristics as selection criteria, guiding the choice of memory architecture to meet application requirements.
Raj Mahajan has more than 10 years of experience architecting, designing, and verifying memory access solutions for advanced ASICs for a variety of target markets. He started his career at Intel Corp, where he architected and designed advanced render cache controllers that shipped hundreds of millions of units in several generations of graphics-enabled PC chipsets. Following that he held a lead design position at 2Wire, Inc., a successful start-up addressing the residential broadband access market, where he led the integration and verification of their flagship SoC, which shipped first silicon. At Ingot Systems he led the architecture, design, and verification of MemCore's flagship memory controller. Raj holds a Bachelor of Computer Engineering from Georgia Institute of Technology.
Appendix
1) Detailed Computations for Video Processing memory requirements
GDDR Memory Implementation:
- .75W per part (350mA and 2V)
- Size is 512Mb per part
- Bandwidth is 4GB/sec per part
DDR2 Memory Implementation:
- 4GB per module
- 2.5A * 1.8V (4.5W)
- Bandwidth is 4GB/sec per module
MobileDDR Memory Implementation:
- 64MB per device
- 150mA * 2V (.3W)
- Bandwidth is 1GB/sec per part (4 bytes at 266MHz)
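The per-part figures listed above roll up into system totals by straightforward multiplication. The Python sketch below is illustrative, with hypothetical names. It computes power as current times voltage for the DDR2 module and the 64MB low-power device, and scales the stated GDDR per-part figures by the six parts given in Table 3.

```python
def power_w(current_a, voltage_v):
    """Electrical power in watts from supply current and voltage."""
    return current_a * voltage_v


# Supply figures quoted in the lists above.
ddr2_module_w = power_w(current_a=2.5, voltage_v=1.8)
low_power_part_w = power_w(current_a=0.150, voltage_v=2.0)

# GDDR system totals: per-part figures scaled by the part count in Table 3.
GDDR_PARTS = 6
GDDR_PART_W = 0.75
GDDR_PART_BW_GB_S = 4.0
gddr_total_w = GDDR_PARTS * GDDR_PART_W
gddr_total_bw = GDDR_PARTS * GDDR_PART_BW_GB_S

print(f"DDR2 module power     : {ddr2_module_w:.1f} W")
print(f"64MB low-power device : {low_power_part_w:.1f} W")
print(f"GDDR total            : {gddr_total_w:.1f} W, {gddr_total_bw:.0f} GB/sec")
```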
2) Detailed Computations for Portable Test Equipment memory requirements
GDDR Memory Implementation:
- .75W per part (350mA and 2V)
- Size is 64MB per part
- Bandwidth is 4GB/sec per part
DDR2 Memory Implementation:
- 256MB per device
- .4A * 1.8V (.8W)
- Bandwidth is 500MB/sec per device
MobileDDR Memory Implementation:
- 64MB per device
- 150mA * 2V (.3W)
- Bandwidth is 1GB/sec per part (4 bytes at 266MHz)