How To Interface DDR-II SRAMs with Stratix II Devices
EE Times: Design News
DDR-II SRAM devices offer enhanced timing margin and flexibility. This article is a step-by-step guide to interfacing DDR-II SRAMs with Stratix II devices for high-bandwidth communications, networking, and DSP applications.
Anuj Chakrapani, Senior Applications Engineer, Cypress Semiconductor (09/06/2005 1:01 PM EDT)
URL: http://www.eetimes.com/showArticle.jhtml?articleID=170700718
Synchronous static RAM (SRAM) architectures are evolving to support the high-throughput requirements of communications, networking, and digital signal processing (DSP) systems. Earlier synchronous SRAM architectures, such as Std Sync and NoBL SRAM, were bandwidth-limited and could not keep up with high-speed applications. DDR-I and DDR-II SRAMs, which are part of the Quad Data Rate (QDR) SRAM family, are well suited to high-speed networking applications. By accepting data transactions on both edges of the clock, they provide more than twice the bandwidth of prior synchronous SRAM architectures, which makes them attractive wherever data must be transferred at very high speeds. This article describes how to interface a DDR-II SRAM device with a Stratix II FPGA, including a detailed timing analysis.

DDR-II SRAM Overview

DDR-II SRAM devices use the 1.5V-HSTL or 1.8V-HSTL Class I/II I/O standard. For maximum performance in Stratix II devices, however, the 1.8V-HSTL Class I I/O standard is recommended.

DDR-II SRAM Functionality

Burst-of-2 DDR-II SRAM Devices

The figures below illustrate write and read operations with the device operating in dual clock mode (that is, with the optional C and Cn clocks in use). In dual clock mode, timing parameters are referenced to C/Cn; in single clock mode (C and Cn not used), they are referenced to K/Kn. The widths of the address and data I/O buses depend on the memory device with which the FPGA interfaces. The BWSn signal (used to control byte-level operations) is low for the entire cycle of Figure 2.

Write Cycle

On the rising edge of the K clock, the DDR-II SRAM device latches the control signals R/W and LD and the write address A2 (Cycle 6 of Figure 2).
On the next rising edge of the K clock, the DDR-II SRAM device latches the lower data word (DA2) on DQ. On the subsequent rising edge of the Kn clock, the device latches the upper data word (DA2+1), which completes the write cycle.

Read Cycle

On the rising edge of the K clock, the DDR-II SRAM device latches the control signals R/W and LD and the read address A0 (Cycle 2 of Figure 2). After a latency of one and a half clock cycles, the rising edge of Cn clocks out the lower data word (QA0) of address A0 onto the DQ bus, and the next rising edge of C clocks out the upper data word (QA0+1), which completes the read cycle.

Burst-of-4 DDR-II SRAM Devices

Write Cycle

The DDR-II SRAM device latches the control signals LD and R/W and the write address A2 (see Cycle 8 of Figure 3) on the rising edge of the K clock. On the following rising edge of K, the device latches the first data word (DA2) on DQ. On the next rising edge of Kn, it latches the second data word (DA2+1). The third (DA2+2) and fourth (DA2+3) words are latched on the subsequent rising edges of K and Kn, respectively, which completes the write cycle.

Read Cycle

The DDR-II SRAM device latches the control signals LD and R/W and the read address A0 (Cycle 2 in Figure 3) on the rising edge of the K clock. After a latency of one and a half clock cycles, the rising edge of Cn clocks out the first data word (QA0) of address A0 onto the DQ bus. The next rising edge of C clocks out the second data word (QA0+1). The subsequent rising edges of Cn and C clock out the third (QA0+2) and fourth (QA0+3) words, respectively, which completes the read cycle.

DDR-II SRAM Interface Signals

Clock Signals

The positive input clock, K, is the logical complement of the negative input clock, Kn. Similarly, C and CQ are complements of Cn and CQn, respectively. The DDR-II SRAM device uses the K and Kn clocks for write accesses and the optional C and Cn clocks, if used, for read accesses.
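The burst-of-2 and burst-of-4 sequences described under DDR-II SRAM Functionality follow one pattern, which can be sketched in a few lines of Python. This is only an illustrative model, not vendor code: the function names are hypothetical, time is counted in K clock cycles from the edge that latches the address, and Cn is assumed to rise half a cycle after K with no clock-to-output delay.

```python
from fractions import Fraction


def write_schedule(burst_len, addr_cycle=0):
    """Return (time, edge, word) for each write word the SRAM latches.

    Time is in K clock cycles: K rises on whole cycles, Kn on half cycles.
    """
    events = []
    for word in range(burst_len):
        # The first word is latched one full cycle after the address;
        # later words alternate between rising Kn and rising K.
        time = Fraction(addr_cycle + 1) + Fraction(word, 2)
        edge = "K" if word % 2 == 0 else "Kn"
        events.append((time, edge, word))
    return events


def read_schedule(burst_len, addr_cycle=0):
    """Return (time, edge, word) for each read word driven onto DQ.

    Assumes dual clock mode with Cn rising on half cycles (hypothetical
    idealisation: no clock-to-output delay is modelled).
    """
    events = []
    for word in range(burst_len):
        # Read latency is one and a half cycles, starting on rising Cn.
        time = Fraction(addr_cycle) + Fraction(3, 2) + Fraction(word, 2)
        edge = "Cn" if word % 2 == 0 else "C"
        events.append((time, edge, word))
    return events


if __name__ == "__main__":
    for time, edge, word in write_schedule(4):
        print(f"write word {word}: rising {edge} at cycle {float(time)}")
    for time, edge, word in read_schedule(4):
        print(f"read word {word}: rising {edge} at cycle {float(time)}")
```

Calling `write_schedule(2)` or `read_schedule(2)` gives the burst-of-2 case; the burst-of-4 case simply extends the same alternation by two more edges.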
CQ and CQn are the source-synchronous output clocks that the DDR-II SRAM device sends along with the read data. The number of loads that the K and Kn clocks drive affects the switching times of these outputs. When a controller drives a single DDR-II SRAM device, C and Cn are unnecessary because the propagation delays from the controller to the DDR-II SRAM device and back are the same. To reduce the number of loads on the clock traces, DDR-II SRAM devices also have a single-clock mode, in which the K and Kn clocks are used for both reads and writes. In this mode, the C and Cn clocks are tied to the supply voltage (VDD). The DDR-II SRAM device still uses CQ and CQn as the echo clocks from the memory device to the Stratix II device.

The Stratix II device outputs the K and Kn clocks and the data, address, and command lines to the DDR-II SRAM device. For the controller to operate properly, the trace lengths and propagation times of the write data (D), address (A), and control signals (R/W, LD, BWSn) should be approximately equal to those of the K and Kn clocks. If the propagation delays for K and Kn from the FPGA to the DDR-II SRAM device are equal to the delays on the address (A) signals, the effect of signal skew on write and read requests is minimized. Delay matching between the write data (D) and the K/Kn clocks is achieved by using identical double data rate output circuits to generate the clock and write data inputs to the memory.

The DDR-II SRAM device generates the echo clocks CQ and CQn, which are edge-aligned with the leading edge of the read data. The CQ and CQn signals are then phase-shifted inside the Stratix II device and used to capture the read data. The CQ and CQn board trace lengths between the DDR-II SRAM device and the controller should equal the data I/O (DQ) board trace length to minimize the skew between the two signals.
For Stratix II interfaces to DDR-II SRAMs, connect the CQ and CQn pins to the FPGA DQS and DQSn pins, respectively. Both phase-shifted CQ and CQn signals are used to capture the read data. The CQ pin is connected to the input latch and the active-high input register, while the CQn pin is connected to the active-low input register. For best data alignment, invert the CQ and CQn signals before they arrive at the DQ IOE registers. This option can be selected in the altdq megafunction. For more information, see the External Memory Interfaces chapter of the Stratix II Device Handbook (Volume 2, Chapter 3) at www.altera.com.

Use regular I/O pins in Stratix II I/O banks 3, 4, 7, or 8, driven through the double data rate (DDR) registers, to generate the K and Kn clocks. To meet the DDR-II tKHKH (skew between K and Kn) requirement, use adjacent pins for the complementary signals and surround the pin pair with programmable VDD and ground pins for better noise immunity.

Data Signals
Control Signals
Address Signals

DDR-II SRAM Interface Architecture

Datapath Architecture in Stratix II

Figure 4 depicts the memory interface datapath architecture. Specifically, it shows how to connect the clock, data, address, and control pins in Stratix II devices when interfacing with DDR-II SRAM devices. The write PLL generates two clock outputs, WRITE_CLK and WRITE_CLK_90, that have a 90° phase offset. The WRITE_CLK output clocks out the address, command, and data signals to the DDR-II SRAM, while the WRITE_CLK_90 output generates the K/Kn memory input clocks. This architecture center-aligns the K and Kn write clock edges with the output data (D) and address (A) signals. The write data outputs to the memory and the clocks both use the double data rate registers (DDIO circuitry) in the I/O cell, which minimizes the skew between the clock and data channels.

The read DQS phase-shift circuitry generates a center-aligned version of the CQ and CQn echo clocks for read data capture.
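To see why a 90° offset between WRITE_CLK and WRITE_CLK_90 center-aligns the clock with the data, it helps to convert the phase into time. The sketch below assumes the 3750 ps clock period that the timing analysis later in this article uses for 267-MHz operation; the helper name `phase_to_ps` is hypothetical.

```python
def phase_to_ps(phase_deg, period_ps):
    """Convert a phase offset in degrees into a time offset in picoseconds."""
    return phase_deg / 360.0 * period_ps


PERIOD_PS = 3750.0               # clock period assumed for 267-MHz operation
BIT_PERIOD_PS = PERIOD_PS / 2    # double data rate: two bits per clock cycle

offset_ps = phase_to_ps(90, PERIOD_PS)

if __name__ == "__main__":
    print(f"90 degree offset = {offset_ps} ps")
    print(f"bit period       = {BIT_PERIOD_PS} ps")
    # A 90 degree shift is half of the bit period, so each K/Kn edge
    # nominally lands in the middle of the data bit.
    print(f"offset / bit period = {offset_ps / BIT_PERIOD_PS}")
```

With these numbers the offset is a quarter of the clock period, which is exactly half of the bit period, so the nominal margin is split evenly between set-up and hold.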
The captured data can then be resynchronized to the system clock. For more information on how to select the correct resynchronization clock phase, see the appendix Resynchronization of Read Data to the System Clock in the QDRII SRAM Controller MegaCore Function User Guide at www.altera.com.

Timing Analysis

Write Cycle Timing

The FPGA controller drives both the DDR-II SRAM clock and data signals. The board delays for the clock and data (DQ) lines may not be equal, so a factor of 50 ps is included in the clock-to-output delay calculations to offset any mismatch in trace lengths.

Because K and Kn are generated from the WRITE_CLK_90 signal, while data and address are generated from the WRITE_CLK signal, there is a timing margin of approximately one-half of the bit period (the length of time between each data bit) on each side to meet the DDR-II SRAM device set-up and hold times. For double data rate signaling, the bit period is one-half of the clock cycle time.

In addition to set-up and hold times, a further concern is the clock-to-clock skew between K and Kn (tKHKH). The 267-MHz DDR-II SRAM specification calls for a minimum delay of 1.8 ns between the rising edges of the K and Kn signals. Because Stratix II device clock-to-out times can vary with pin position, K and Kn need to be placed on adjacent pins and their tCO times need to be verified against this requirement. For better noise immunity, surround the pin pair with programmable VDD and ground pins.

The following exercise analyzes the timing for a write operation from a Stratix II EP2S60 device to a Cypress CY7C1518AV18-267 burst-of-2 267-MHz DDR-II SRAM device.

The analysis starts with the input clocks K and Kn. These clocks are generated by the WRITE_CLK_90 output of the PLL inside the FPGA. The data, address, and command outputs are clocked out by a different output of the same PLL.
Since two outputs of a PLL feeding global clock networks have an inherent skew, the K and Kn clocks could be offset from the data outputs by this amount. For the Stratix II enhanced PLLs, the skew between two PLL outputs that use different counters is 150 ps. This specification is listed in the DC and Switching Characteristics chapter of the Stratix II Device Handbook (Volume 1, Chapter 4). Figure 6 illustrates this and other uncertainties on the clock and data signals.

The skew results in a minimum phase offset between these two clocks of:

TSHIFT_MIN = (0.25 * clock period) - clock skew = 0.25 * 3750 - 150 = 787.5 ps

Similarly, the maximum phase offset between the two PLL output clocks is:

TSHIFT_MAX = (0.25 * clock period) + clock skew = 0.25 * 3750 + 150 = 1087.5 ps

In addition to this clock skew uncertainty, PLL outputs can have duty cycle distortion (DCD) of up to 5% of the clock period. This adds a clock uncertainty of 187.5 ps (5% of the 3750 ps period of a 267-MHz clock). Another source of uncertainty on the clock is PLL jitter. However, since PLL jitter affects the clock and data outputs to the memory uniformly, it does not affect the set-up/hold relationship at the DDR-II SRAM.

In Figure 6, for example, if the ideal clock edge of WRITE_CLK_90 is expected at time t = 3750 ps, then after accounting for PLL output clock skew and duty cycle distortion, the clock edge can occur at any time between t = 3412.5 ps and t = 4087.5 ps.

Next, we compute the uncertainties on the data (D) signals. Channel-to-channel skew among all data pins is equal to the worst-case skew between the DDR outputs within the I/O bank(s). When a single column I/O bank is used in the EP2S60 devices, the worst-case skew is tIOSKEW = 160 ps. Board trace length variations can add to this channel-to-channel skew. While this implementation calls for perfectly matched trace lengths, the timing analysis allows for 50 ps of board skew.
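The uncertainty terms collected so far can be tallied with a short script. This is a sketch for checking the arithmetic only; the variable names are hypothetical, and the values are the ones quoted above for a 3750 ps clock period.

```python
PERIOD_PS = 3750.0     # clock period for 267-MHz operation
PLL_SKEW_PS = 150.0    # skew between two PLL outputs on different counters
DCD_PERCENT = 5        # duty cycle distortion, percent of the clock period
IO_SKEW_PS = 160.0     # tIOSKEW, single column I/O bank on the EP2S60
BOARD_SKEW_PS = 50.0   # board trace skew allowed by the analysis

# Nominal 90 degree offset between WRITE_CLK and WRITE_CLK_90.
t_shift_nominal = 0.25 * PERIOD_PS
t_shift_min = t_shift_nominal - PLL_SKEW_PS
t_shift_max = t_shift_nominal + PLL_SKEW_PS

# Duty cycle distortion expressed in picoseconds.
t_dcd = PERIOD_PS * DCD_PERCENT / 100

# Window in which a clock edge nominally expected at t = 3750 ps can fall.
ideal_edge = 3750.0
edge_earliest = ideal_edge - PLL_SKEW_PS - t_dcd
edge_latest = ideal_edge + PLL_SKEW_PS + t_dcd

# Data-side skew eats into both ends of the data valid window.
data_window_loss = 2 * (IO_SKEW_PS + BOARD_SKEW_PS)

if __name__ == "__main__":
    print(f"TSHIFT_MIN = {t_shift_min} ps, TSHIFT_MAX = {t_shift_max} ps")
    print(f"tDCD = {t_dcd} ps")
    print(f"clock edge window: {edge_earliest} ps to {edge_latest} ps")
    print(f"data valid window reduction: {data_window_loss} ps")
```

Keeping each contribution as a named constant makes it straightforward to substitute values from a different FPGA or memory data sheet.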
These skew parameters affect the data valid window at the DDR-II memory and reduce it by 420 ps.

Now that the uncertainties are established, we check the set-up and hold time margins for write operations at the memory input pins. For 267-MHz operation, the bit period is (3750 ps / 2) = 1875 ps. The Cypress DDR-II SRAM device has set-up and hold time requirements of 350 ps at this speed. Given these parameters, the set-up and hold margins for 267-MHz DDR-II in Stratix II are as follows.

Set-up time margin is smallest when the data arrives late and the clock arrives early. It is calculated as:

TSU_MARGIN = TSHIFT_MIN - tDS - tDCD - tIOSKEW - tEXT = 787.5 - 350 - 187.5 - 160 - 50 = 40 ps

Hold time margin is smallest when the data arrives early and the clock arrives late. It is calculated as:

TH_MARGIN = tCK/2 - TSHIFT_MAX - tDH - tDCD - tIOSKEW - tEXT = 1875 - 1087.5 - 350 - 187.5 - 160 - 50 = 40 ps

The total margin available is the sum of the set-up and hold margins, which is 80 ps.

Table 2 shows timing margins of a Stratix II EP2S60 interfacing with 267-MHz and 250-MHz DDR-II SRAMs for write operations when the board trace variations for the DQ and K/Kn pins are 50 ps (approximately 0.3 inch of FR4 trace length variation). A similar timing analysis can be performed for a different FPGA and DDR-II SRAM device combination by substituting timing specifications from the corresponding data sheets.

Read Cycle Timing

Stratix II Read Cycle Timing

DDR-II memory reads in Stratix II devices are implemented using the CQ echo clock output from the DDR-II SRAM. The CQ echo clock signal is fed directly into a DLL to center-align the clock with the input data (DQ). This is achieved by implementing a phase shift on the DLL and using the phase-shifted clock to latch data from memory in the DDIO registers.

The following exercise analyzes the timing for a read operation from a Cypress CY7C1518V18-267 burst-of-2 267-MHz DDR-II SRAM device to the Stratix II EP2S60 device.
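Before the read path is examined, the write-side set-up and hold margins derived above can be reproduced programmatically. The function below is a sketch with hypothetical names; it encodes the two margin expressions from the write cycle analysis so that the inputs can be swapped for another device combination.

```python
def write_margins(period_ps, pll_skew_ps, dcd_ps, io_skew_ps,
                  board_skew_ps, t_setup_ps, t_hold_ps):
    """Return (set-up margin, hold margin) in ps at the memory input pins."""
    bit_period = period_ps / 2
    shift_min = period_ps / 4 - pll_skew_ps   # clock early relative to data
    shift_max = period_ps / 4 + pll_skew_ps   # clock late relative to data

    # Worst case for set-up: data arrives late, clock arrives early.
    setup_margin = (shift_min - t_setup_ps - dcd_ps
                    - io_skew_ps - board_skew_ps)
    # Worst case for hold: data arrives early, clock arrives late.
    hold_margin = (bit_period - shift_max - t_hold_ps - dcd_ps
                   - io_skew_ps - board_skew_ps)
    return setup_margin, hold_margin


if __name__ == "__main__":
    su, h = write_margins(
        period_ps=3750.0,     # 267-MHz clock period
        pll_skew_ps=150.0,    # PLL output-to-output skew
        dcd_ps=187.5,         # 5% duty cycle distortion
        io_skew_ps=160.0,     # tIOSKEW
        board_skew_ps=50.0,   # tEXT
        t_setup_ps=350.0,     # memory set-up requirement
        t_hold_ps=350.0,      # memory hold requirement
    )
    print(f"set-up margin = {su} ps, hold margin = {h} ps, total = {su + h} ps")
```

Because every term is subtracted from a nominally symmetric window, the two margins come out equal whenever the memory's set-up and hold requirements are equal.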
Start the analysis by studying the relationship between the echo clocks (CQ, CQn) and the read data (DQ) signals from the DDR-II SRAM. For the CY7C1518V18-267, the data clock-to-output time (tCQD) and data hold time (tCQDOH) with respect to the echo clocks are 300 ps and 300 ps, respectively. Hence the data valid window at the DDR-II SRAM device pins is 1275 ps (1875 - 300 - 300). Figure 8 illustrates these delays and other uncertainties in a read cycle timing waveform.

Board trace delays on the CQ/CQn signals and the data bus can be ignored if the trace lengths are matched (=L2 in Figure 4). This timing analysis allows for a maximum board skew of 50 ps between these lines. Because this skew can act on either side of the window, the data valid window is further reduced to 1175 ps.

The next step is to analyze the set-up and hold margins for latching the read data (DQ) signals at the FPGA's DDR input pins. The echo clock, CQ, from the DDR-II SRAM is connected to the dedicated reference clock input pin of the DLL. This read DLL phase-shifts the clock to center-align its edges with the data (default phase shift of 90°). The Stratix II DLL introduces uncertainty on the read clock in the form of jitter (100 ps). Worst-case set-up and hold time requirements for the Stratix II EP2S60 are 210 ps and 180 ps, respectively. These numbers were obtained from Quartus II timing analyzer reports. When performing timing analysis for a specific design, obtain the requirements from the Quartus II timing analyzer. Additional FPGA specifications that need to be taken into consideration are the DLL phase shift error and the DQS-DQ internal skew. Given these parameters, the set-up and hold margins for 267-MHz DDR-II in Stratix II are as follows.

Set-up time margin is smallest when the data arrives late and the clock arrives early. It is calculated as:

TSU_MARGIN = tDLL_PS - tJITTER - tPSERR - tDQDINT - tEXT - tCQD - tSU = 937.5 - 50 - 0 - 80 - 50 - 300 - 210 = 247.5 ps

Hold time margin is smallest when the data arrives early and the clock arrives late.
The margin is calculated as:

TH_MARGIN = tCK/2 - tCQDOH - tH - tEXT - tDLL_PS - tJITTER - tPSERR - tDQDINT = 1875 - 300 - 180 - 50 - 937.5 - 50 - 0 - 80 = 277.5 ps

The total margin available is the sum of the set-up and hold margins, which is 525 ps. Since the hold margin is larger than the set-up margin, the phase shift can be adjusted to balance the margins. Adding 15 ps to the existing 90° (937.5 ps) phase shift would result in equal margins. This amounts to a total real phase shift of approximately 91° on the echo clock.

Table 3 features the DDR-II SRAM read timing margin analysis at 267 MHz when the board trace variations for the Q and CQ/CQn pins are 50 ps (approximately 0.3 inch of FR4 trace length variation). A similar timing analysis can be performed for another FPGA and DDR-II SRAM device combination by replacing the timing specifications in Table 3 with those from the corresponding data sheets.

Design Guidelines
I/O Standard and Termination
Impedance Matching
Trace Lengths
Clamshell Configuration

Conclusion
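To recap the read timing analysis, the margins and the balancing phase shift can be recomputed with a short script. Treat it as a sketch under stated assumptions: the names are hypothetical, and the set-up expression is written here by symmetry with the hold expression (substituting tCQD and tSU for tCQDOH and tH), which is an assumption that happens to reproduce the 525 ps total quoted in the text. The 50 ps jitter term follows the hold calculation, which uses half of the 100 ps DLL jitter figure.

```python
PERIOD_PS = 3750.0        # 267-MHz clock period
BIT_PERIOD_PS = PERIOD_PS / 2
T_DLL_PS = 937.5          # nominal 90 degree DLL phase shift
T_JITTER = 50.0           # jitter term as used in the hold calculation
T_PSERR = 0.0             # DLL phase shift error used in the calculation
T_DQDINT = 80.0           # DQS-DQ internal skew used in the calculation
T_EXT = 50.0              # board skew allowance
T_CQD = 300.0             # memory data clock-to-output time
T_CQDOH = 300.0           # memory data hold time relative to echo clock
T_SU = 210.0              # FPGA set-up requirement
T_H = 180.0               # FPGA hold requirement

# Assumed set-up expression (mirror image of the hold expression).
setup_margin = (T_DLL_PS - T_CQD - T_SU - T_EXT
                - T_JITTER - T_PSERR - T_DQDINT)
# Hold expression as given in the text.
hold_margin = (BIT_PERIOD_PS - T_CQDOH - T_H - T_EXT
               - T_DLL_PS - T_JITTER - T_PSERR - T_DQDINT)

# Moving the capture edge later trades hold margin for set-up margin
# one for one, so half the difference balances the two.
extra_shift_ps = (hold_margin - setup_margin) / 2
balanced_shift_deg = (T_DLL_PS + extra_shift_ps) / PERIOD_PS * 360

if __name__ == "__main__":
    print(f"set-up margin = {setup_margin} ps")
    print(f"hold margin   = {hold_margin} ps")
    print(f"extra shift   = {extra_shift_ps} ps")
    print(f"balanced phase shift = {balanced_shift_deg:.2f} degrees")
```

After the extra shift is applied, both margins equal half of the total, which is the most either one can be without shrinking the other.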
All material on this site Copyright © 2005 CMP Media LLC. All rights reserved.