As designers of high-performance systems labor to achieve higher bandwidth while meeting critical timing margins, one performance bottleneck standing in their way is the memory interface. Double-data-rate SDRAMs and quad-data-rate SRAMs use source-synchronous interfaces, in which the transmitter sends a clock (or strobe) to the receiver along with the data. The receiver uses this forwarded clock to latch the data. Because clock and data travel together, this eliminates interface control issues such as the signal time of flight between the memory and the FPGA, but it raises fresh challenges that designers must address.
One key issue is how to meet the various read-data capture requirements of a high-speed interface. As the data-valid window shrinks, aligning the received clock with the center of that window becomes both more important and more difficult. A dynamic calibration scheme should be used to adjust the clock and strobe phase relationships and to center the FPGA capture clock in the read-data valid window.
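To make the calibration idea concrete, here is a minimal sketch of one common centering approach: sweep the input delay through every tap, record which taps read a known training pattern back correctly, and park the delay in the middle of the longest passing run. The 75 ps tap size is the input-delay resolution mentioned in this article; the 64-tap range and the `read_ok` callback (which on real hardware would set the delay tap, read back a training pattern and compare it) are hypothetical, and vendor calibration logic will differ in detail.

```python
from typing import Callable, Optional, Tuple

TAP_PS = 75    # input-delay tap resolution quoted in the article
NUM_TAPS = 64  # hypothetical number of delay taps per input


def find_passing_window(read_ok: Callable[[int], bool],
                        num_taps: int = NUM_TAPS) -> Optional[Tuple[int, int]]:
    """Sweep every tap and return (first, last) of the longest run of passing taps."""
    best: Optional[Tuple[int, int]] = None
    run_start: Optional[int] = None
    for tap in range(num_taps):
        if read_ok(tap):
            if run_start is None:
                run_start = tap
            # Keep this run if it is now the longest seen so far.
            if best is None or (tap - run_start) > (best[1] - best[0]):
                best = (run_start, tap)
        else:
            run_start = None  # a failing tap ends the current run
    return best


def calibrate(read_ok: Callable[[int], bool], num_taps: int = NUM_TAPS) -> Optional[int]:
    """Return the tap at the center of the widest passing window, or None if none passes."""
    window = find_passing_window(read_ok, num_taps)
    if window is None:
        return None  # no tap captured the training pattern correctly
    first, last = window
    return (first + last) // 2


def simulated_read_ok(tap: int) -> bool:
    # Hypothetical data eye: the training pattern reads back correctly for taps 10..30.
    return 10 <= tap <= 30


center = calibrate(simulated_read_ok)
print(center, center * TAP_PS)  # prints 20 1500 (tap index, delay in ps)
```

Choosing the midpoint of the widest run, rather than the first passing tap, leaves roughly equal setup and hold margin on either side. Rerunning the sweep at run time is what lets the capture point follow the data eye as the system drifts.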
The traditional method used by FPGA-, ASIC- and ASSP-based controller designs employs a phase-locked loop or delay-locked loop that guarantees a fixed phase shift or delay between the source clock and the clock used to capture data. The obvious drawback is that the delay is fixed to a single value, predetermined during the design phase. Hard-to-predict variations in the actual system, caused by different trace routings to different memory devices, variations between FPGAs, and process, voltage and temperature conditions, can easily create enough skew that the predetermined phase shift is no longer accurate.
New silicon features, along with hardware-verified reference designs from the leading FPGA vendors, help overcome those challenges. In addition, following a few basic do's and don'ts will shorten the design cycle.
- Use the latest FPGA silicon features to construct the interface. Doing so will reduce FPGA logic resource utilization, optimize power consumption and improve timing margins. I/O silicon features, such as adjustable input delay taps with a resolution of 75 picoseconds, enable precise clock-to-data centering.
- Use a dynamic calibration scheme to adjust the clock and strobe phase relationships and center the FPGA capture clock in the read-data valid window. This provides run-time adjustments that compensate for system variations that cannot be accounted for at design time.
- Use the hardware-verified reference designs provided by leading FPGA vendors. Reference designs can be the starting point of your own custom design, saving valuable time and resources.
- Verify compliance with the simultaneous-switching-output (SSO) limits for your pc board and FPGA design. Use newer FPGA packages with evenly distributed power pins, which reduce SSO noise by significantly improving the signal return-current path. This also allows wider data buses.
- Run IBIS simulations to verify signal quality. They will help you choose and adjust the termination for different signals. Run the simulations on the actual pc-board layout so the analysis captures the effects of crosstalk, decoupling, terminations and trace configuration.
- Don't rely on fixed phase-shift delays to center the clock or strobe in the data-valid window for read cycles. At high data rates, this can cut into your design margins because of system variations (process, voltage, temperature) that can't be accounted for at design time.
- Don't skip the functional and post-place-and-route simulation steps. The time invested in them is usually recovered several times over during hardware debug. Post-layout simulation is also a valuable tool for debugging the interface when the highest performance is required.
- Don't choose a random pinout; use experience and common sense in selecting one. Keeping the data bits grouped and within one or two clocking regions frequently produces good results. Also consider where the interface logic is placed on the FPGA die: keeping it close to the I/O pins that carry the interface reduces internal routing delays.
- Don't assume that the drivers have an impedance of 0 ohms. More loads on a bus mean tighter signal-integrity constraints. For deep interfaces, consider using several registered DIMMs to reach the desired memory depth (a registered DIMM presents only one load on each address net, compared with as many as 18 for an unbuffered DIMM).
- Don't create discontinuities or obstacles in the return path of the interface signals in the pc-board layout. A discontinuity forces the return current to take a longer path, which creates unwanted noise in the system.
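The DIMM loading rule above translates into a simple count of receivers on each address net. This sketch assumes every DIMM shares the same address nets and uses the worst case of 18 loads per unbuffered DIMM quoted in the text; actual loading depends on the module's organization.

```python
UNBUFFERED_LOADS_PER_DIMM = 18  # "as many as 18" per the text; worst case assumed
REGISTERED_LOADS_PER_DIMM = 1   # the on-module register presents a single load


def address_net_loads(num_dimms: int, registered: bool) -> int:
    """Worst-case number of receivers each shared address net must drive."""
    per_dimm = REGISTERED_LOADS_PER_DIMM if registered else UNBUFFERED_LOADS_PER_DIMM
    return num_dimms * per_dimm


for n in (1, 2, 4):
    print(f"{n} DIMMs: registered {address_net_loads(n, True)}, "
          f"unbuffered {address_net_loads(n, False)}")
```

At four modules the difference is 4 loads versus 72, which is why registered DIMMs make deep memory configurations practical on the address bus.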
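The 0-ohm driver assumption matters because a real driver's output resistance forms a voltage divider with the trace impedance, so only part of the swing is launched onto the line initially. This sketch applies the standard transmission-line relation V_launch = V_swing * Z0 / (R_driver + Z0); the 1.8 V swing, 40-ohm driver and 50-ohm trace are hypothetical values chosen for illustration.

```python
def launched_step_v(v_swing: float, r_driver_ohms: float, z0_ohms: float) -> float:
    """Initial voltage step launched into a line of impedance z0 by a resistive driver."""
    return v_swing * z0_ohms / (r_driver_ohms + z0_ohms)


# Hypothetical values: 1.8 V swing into a 50-ohm trace.
ideal = launched_step_v(1.8, 0.0, 50.0)   # idealized 0-ohm driver launches the full swing
real = launched_step_v(1.8, 40.0, 50.0)   # 40-ohm driver launches 50/90 of the swing
print(f"ideal {ideal:.2f} V, real {real:.2f} V")
```

The smaller launched step, and the reflections that follow any mismatch between driver and trace impedance, are what the termination choices from the IBIS simulations have to account for.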
Olivier Despaux (email@example.com), product applications engineer at Xilinx Inc. (San Jose)