Making Interconnects More Flexible
By Duncan Bees and Brian Holden, PMC-Sierra, CommsDesign.com
September 10, 2003 (6:57 a.m. EST)
With many system bus alternatives in telecom, storage networking, and datacom applications, chip designers face the prospect of having to support multiple interfaces to meet current and future interconnect requirements. This is particularly true for devices, such as high-speed microprocessors, which find application in all of the above domains.
Figure 1: Internetworking developed around an embedded microprocessor with parallel HT interface and parallel-to-serial bridge for connection to PCI-Express and RIO.
To solve this dilemma, designers must build next-generation chip architectures that deliver a flexible interconnect. And, to build flexibility into the interconnect, designers must identify common logical and physical attributes among these interconnects in order to promote portability and interworking.
In this article, we look at the attributes of several prominent bus interfaces, including PCI, PCI-X, PCI-Express, HyperTransport, parallel RapidIO, and serial RapidIO. Comparisons are made at both logical and physical levels. This article also includes a discussion of interworking scenarios applicable to the networking space.
Coping with Interoperability Demands
To deal with varying interconnect architectures, designers must keep interoperability in mind when building communication and networking chips. Several potential strategies exist to deliver that level of internetworking, including:
- Bridging, in which an external device performs protocol and physical translation.
- Flexible interfaces, in which a device is capable of being configured to support more than one interconnect or bus.
- Multiple versions of a device, in which the chip manufacturer releases chips tailored to support different interfaces.
One example of interworking is shown in Figure 1. In this case, a high-speed embedded microprocessor has a parallel HyperTransport (HT) interconnect optimized for low latency and high bandwidth. A parallel-to-serial bridge is used to connect the processor to a serial backplane, with either PCI-Express (PCI-Ex) or RapidIO (RIO) protocols.
Bridging and other interworking strategies are made simpler by identifying core functions among these interconnects and buses. In some cases, interface parameters can be chosen so as to minimize differences at both logical and physical layers, thus simplifying and lowering the cost to achieve interoperation between different interconnects.
The general characteristics of these system buses and interconnects are shown in Table 1. Of these interfaces, PCI is a well-established, medium-performance, general-purpose bus. From its genesis in the personal computer industry, it has gained a solid application niche in many control-plane and low-end data-plane applications in communications, as well as other embedded applications. PCI-X is an extension of PCI that follows the essential structure of the PCI bus but adds logical and electrical enhancements, which help to alleviate bandwidth and efficiency limitations. In higher-speed versions, however, PCI-X is restricted to a point-to-point bus because the electrical connectors defined for PCI do not support high-speed shared bus configurations. Although PCI-X is widely used in servers, it is not expected to play a major role in communications platforms, where full backward compatibility with PCI is not required.
Table 1: General Characteristics of Interconnects and Buses
In comparison, PCI-Ex, HT, and RIO comprise a new class of system interconnect of potential interest to telecom and datacom vendors. These interconnects are designed with flexibility, well-layered logical structures, scaleable bandwidth, and high-speed, pin-efficient differential I/O. These characteristics make them candidates for embedded applications like telecom and datacom, where requirements include high bandwidth, delivery of various data types with minimal latency, and configurations ranging from chip-to-chip to backplanes.
Figure 2 shows a block diagram that could represent the end-point implementation of one of these interconnects. The layerings of PCI-Ex, HT, and RIO are well defined and they follow a broadly similar structure; an implementation structured according to transaction, data link, and physical layers (PHY) follows.
Figure 2: Interconnect block diagram.
In Figure 2, a FIFO interface to the application layer transfers transaction layer packets between the end-point's application layer and the interconnect block. Within the interconnect block, transaction layer information such as header cyclic redundancy check (CRC) and sequence number, and data link flow control are applied. At the PHY layer, packets are converted to a byte or bit sequence to which encoding and other operations are applied.
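The layering described for Figure 2 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the 4-byte address header, the 16-bit sequence number, and the use of CRC-32 from zlib are hypothetical stand-ins and do not reproduce the packet formats of PCI-Ex, HT, or RIO.

```python
import struct
import zlib
from typing import List, Tuple


def transaction_layer(payload: bytes, address: int) -> bytes:
    """Build a toy transaction packet: hypothetical 4-byte address header + payload."""
    return struct.pack(">I", address) + payload


def data_link_layer(packet: bytes, seq: int) -> bytes:
    """Prepend a 16-bit sequence number and append a CRC-32 over both (stand-in CRC)."""
    framed = struct.pack(">H", seq & 0xFFFF) + packet
    return framed + struct.pack(">I", zlib.crc32(framed))


def phy_layer(frame: bytes) -> List[int]:
    """Convert the frame to the byte sequence that a line encoder would consume."""
    return list(frame)


def receive(symbols: List[int]) -> Tuple[int, int, bytes]:
    """Reverse the three steps and check the CRC; returns (seq, address, payload)."""
    frame = bytes(symbols)
    framed, crc = frame[:-4], struct.unpack(">I", frame[-4:])[0]
    if zlib.crc32(framed) != crc:
        raise ValueError("CRC mismatch")
    seq = struct.unpack(">H", framed[:2])[0]
    address = struct.unpack(">I", framed[2:6])[0]
    return seq, address, framed[6:]
```

The point of the sketch is only that the application sees `transaction_layer` and `receive`; the sequence number, CRC, and byte conversion stay inside the interconnect block.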
In general, the transaction layer semantics are fairly similar between HT, PCI-Ex, and RIO interconnects. It may be possible to hide the details of the interface from an application client by constraining logical parameters of its interface to the system interconnect. At the PHY layer, both PCI-Ex and serial RIO use serializer/deserializer (serdes)-based interfaces. HT and parallel RIO use a similar parallel data bus, separate clock signals, and differential electrical signals.
Comparing Transaction, Data Link Layers
We are now ready to compare the properties at the transaction and data link layers of these buses. While there are significant differences, there are also surprising similarities that can be exploited to achieve a flexible interface. To find these similarities, designers must evaluate the number of outstanding transactions, multiple priority support, multiple flows between source-destination pairs, ordering models, coherency support, and data link-layer properties. Let's look at each of these six characteristics in more detail.
1. Number of Outstanding Transactions
An important property of any interface is the number of outstanding transactions supported. The higher this number, the more system concurrency is available to the programmer (at the cost of increased buffering). The number of outstanding transactions varies by protocol. A reasonable design guideline is to support from 16 to 32 outstanding requests. This number is allowed by all of these buses and interconnects and provides good flexibility for programmers at reasonable buffer cost.
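One way to picture the buffering cost is a fixed pool of transaction tags, sketched below in Python. The class name `TagPool` and its methods are hypothetical, and the default of 32 is simply the upper end of the guideline above.

```python
class TagPool:
    """Tracks outstanding requests with a fixed number of tags (hypothetical helper)."""

    def __init__(self, size=32):
        self._free = list(range(size))   # tags not currently in flight
        self._pending = {}               # tag -> request awaiting its response

    def issue(self, request):
        """Return a tag for the request, or None if the requester must stall."""
        if not self._free:
            return None
        tag = self._free.pop()
        self._pending[tag] = request
        return tag

    def complete(self, tag):
        """Retire a response and make its tag available again."""
        request = self._pending.pop(tag)
        self._free.append(tag)
        return request

    def outstanding(self):
        return len(self._pending)
```

Each tag implies one buffer entry, so doubling the pool doubles the storage needed to match responses to requests.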
2. Multiple Priority Support
Another capability that the system interconnects share is the ability to support multiple priorities or classes of traffic within the interconnect fabric. While the specific mechanisms and their intents vary substantially, two priorities or traffic classes can be supported in all the interconnects. Also, switch fabric implementations may not support the full set of virtual channel capabilities. With these factors in mind, the designer of a flexible interconnect may wish to support a mode that presents only two levels (high and low) of priority to the application.
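A two-level mode could be presented to the application with a simple mapping such as the Python sketch below. The function name and the choice to split the native range in half are assumptions for illustration; a real design would pick the split to suit its traffic.

```python
def to_two_level(native_priority, native_levels):
    """Collapse a native priority (0 = lowest) into 'high' or 'low'.

    The upper half of the native range maps to 'high'. This split is an
    illustrative assumption, not taken from any of the specifications.
    """
    if not 0 <= native_priority < native_levels:
        raise ValueError("priority out of range")
    return "high" if native_priority * 2 >= native_levels else "low"
```

For an interconnect with four native levels, levels 0 and 1 would appear as low and levels 2 and 3 as high.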
3. Multiple Flows Between Given Source-Destination Pairs
PCI-X and HT directly support re-ordering of multiple flows between given source-destination pairs to increase the concurrency and reduce blocking between source-destination flows. (While RIO's FlowID concept is somewhat similar, its value is limited because the logical FlowID label is translated to one of only four priority levels.) It is advisable to use these mechanisms to increase performance, but also to provide a lower-performance mode that does not require them in the interest of designing a flexible internal bus protocol.
4. Ordering Models
The ordering model that a load-store bus supports is central to the way an application uses it. The producer-consumer model is described in Appendix E of PCI Conventional 2.3 and is the most widespread and accepted model. It provides a model where a producer and a consumer located anywhere in the system can communicate with each other with reliable results if they follow a certain set of rules.
PCI-Ex and HT support the full producer-consumer model. With RIO, the “flag” and “data” as described in the PCI specification may need to be co-located on the same side of the bridge, potentially limiting a system designer's flexibility.
A related topic is whether the specification supports a dedicated posted request channel. There are certain deadlock scenarios documented in PCI Conventional 2.3 Appendix E, rules 5 and 7 that only occur with a minimum of 3 concatenated bus bridges. Appendices C.5.1 and C.5.3 of HT 1.05 provide an alternative description. Of the five buses considered here, only RIO does not support a posted request channel. Consequently, RIO applications may be limited to less complex bridging scenarios.
We recommend that the internal bus interface designer support the PCI-X ordering model including the posted write channel and the transaction passing rules. This will allow considerable flexibility when connecting to and bridging between the various interfaces.
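The role of a posted channel and passing rules can be sketched in Python as follows. This is a deliberately simplified three-class approximation (posted, non-posted, completion) written for illustration; it is not a copy of the ordering table in any specification, and the names are hypothetical.

```python
POSTED, NON_POSTED, COMPLETION = "posted", "non_posted", "completion"

# MAY_PASS[(later, earlier)] is True if `later` is allowed to overtake `earlier`.
MAY_PASS = {
    (POSTED, POSTED): False,         # writes stay in order: "data" lands before "flag"
    (POSTED, NON_POSTED): True,      # posted writes are not held behind stalled reads
    (POSTED, COMPLETION): True,
    (NON_POSTED, POSTED): False,     # a read must not overtake an earlier write
    (NON_POSTED, NON_POSTED): True,
    (NON_POSTED, COMPLETION): True,
    (COMPLETION, POSTED): False,     # read data pushes earlier writes ahead of it
    (COMPLETION, NON_POSTED): True,
    (COMPLETION, COMPLETION): True,
}


def next_to_send(queue, blocked):
    """Pick the index of the oldest sendable transaction, or None.

    queue:   transaction classes, oldest first.
    blocked: set of classes that currently have no buffer space downstream.
    """
    for i, cls in enumerate(queue):
        if cls in blocked:
            continue
        if all(MAY_PASS[(cls, earlier)] for earlier in queue[:i]):
            return i
    return None
```

With a stalled read at the head of the queue, a posted write behind it can still make progress, which is the property that avoids the bridge deadlocks mentioned above; a completion behind a stalled posted write must wait.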
5. Coherency Support
Support for I/O coherency is important to simplify system software. A well-functioning I/O coherency mechanism frees the software designer from having to orchestrate the flow of data around the system.
The protocols considered here vary widely in their precise mechanisms, but each supports I/O coherency well. When designing a flexible internal bus interface, it is reasonable to assume that I/O coherency will exist and that it will be selectable on a per-transaction basis.
6. Data Link Layer Properties
Although there are many similarities, the data link layer varies considerably among PCI-Ex, HT, and RIO. Although the layered structure of the flexible interconnect hides many details, certain aspects of the data link layer may affect the design of a flexible internal interface between transaction layer and the application.
For example, the maximum packet size at the data link layer impacts the transaction layer as well. The designer of a flexible interface must allow the maximum packet size to vary, as this value is different in each specification. However, a reasonable supported range of 64 to 512 bytes will decrease buffering costs and still give good performance.
Error protection and link control mechanisms also vary per specification, but the result that reaches a flexible internal interface is an array of fatal or non-fatal errors that must be handled. That array includes things like an errorless response and an indication that the link has gone down.
Flow control mechanisms differ among these buses but generally do not affect the internal bus interface. For RIO in particular, we recommend avoiding the less-efficient Rx-controlled version of RIO flow control and using the Tx-controlled flow control option instead. This option is similar to the mechanisms in PCI-Ex and HT.
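The general idea of a transmitter that sends only when it knows the receiver has buffer space can be sketched as a credit counter in Python. This is a generic illustration with hypothetical names; it does not reproduce the exact flow control protocol of RIO, PCI-Ex, or HT.

```python
class CreditedTransmitter:
    """Transmitter that tracks the receiver's free buffers (generic sketch)."""

    def __init__(self, initial_credits):
        self.credits = initial_credits   # receive buffers known to be free
        self.sent = []

    def try_send(self, packet):
        """Send only if a receive buffer is known to be available."""
        if self.credits == 0:
            return False                 # hold the packet; nothing is dropped or retried
        self.credits -= 1
        self.sent.append(packet)
        return True

    def on_credit_return(self, count=1):
        """Called when the receiver reports that it has freed buffers."""
        self.credits += count
```

Because the transmitter never sends into a full receiver, no bandwidth is spent on packets that would have to be rejected and retransmitted.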
Physical Interface Comparison
At the PHY layer, there are enough similarities between the two serial interconnects (PCI-Ex and Serial RIO), and between the two parallel interconnects (HT and Parallel RIO), that a shared PHY design is feasible in either case, although some aspects particular to the interconnects will still be required.
Below, we'll look at the issues involved with developing a flexible PHY for both the serial and parallel interconnects. Let's start by looking at the serial PHY.
1. Flexible Serial PHY
The serial interconnects are designed for low pin count and flexible topologies capable of spanning backplanes as well as chip-to-chip. The key PHY layer attributes of PCI-Ex and RIO are shown in Table 7. For each attribute, a practical value or range is suggested. Note that the PCI-Ex PHY introduces several features targeted at the PC space, like power management and plug and play features. We argue that such features are not required for flexible serial PHYs that target telecom/datacom applications.
When looking at building the flexible serial PHY, however, the designer has to deal with some key design differences between the serial RIO and PCI-Ex specs. These include scrambling, the use of control characters, line speeds and widths, signal swing, pre-emphasis, clocking, jitter, power management, and plug-and-play support. Let's look at each in more detail.
Scrambling
For EMI reduction, PCI-Ex uses scrambling prior to the 8B/10B coding layer. RIO, by contrast, uses the XAUI-like randomization of the idle data pattern to achieve the same objective. Unfortunately, these techniques are completely different and will require independent implementation.
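As a rough illustration of scrambling ahead of the coding layer, the Python sketch below XORs data bytes with the output of a 16-bit linear feedback shift register. The polynomial x^16 + x^5 + x^4 + x^3 + 1 and the all-ones seed are used here as illustrative parameters; the bit ordering, reset rules, and treatment of control characters required by the PCI-Ex specification are not modeled.

```python
def keystream(n_bytes, seed=0xFFFF):
    """Generate n_bytes from a 16-bit Galois LFSR (x^16 + x^5 + x^4 + x^3 + 1)."""
    state = seed
    out = bytearray()
    for _ in range(n_bytes):
        byte = 0
        for bit in range(8):
            msb = (state >> 15) & 1
            byte |= msb << bit                 # collect output bits, LSB first
            state = (state << 1) & 0xFFFF
            if msb:
                state ^= 0x0039                # feedback taps for x^5, x^4, x^3, 1
        out.append(byte)
    return bytes(out)


def scramble(data, seed=0xFFFF):
    """XOR data with the keystream; applying it twice restores the input."""
    return bytes(d ^ k for d, k in zip(data, keystream(len(data), seed)))
```

Because the operation is a plain XOR with a reproducible sequence, the receiver descrambles by running the same function with the same seed.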
8B/10B Coding and Use of Control (K) Characters
PCI-Ex and RIO use 8B/10B coded links. The 8B/10B coding ensures sufficient bit transition density to recover the data clock; hence no separate clock signal is used.
In the serial 8B/10B interfaces, special “K characters” are used for link training and maintenance, bit and byte alignment, multi-lane deskew, clock compensation, packet delimiting, and other purposes. The particular characters used differ. Examples of different uses of the K characters are:
- The K28.5 character, which contains a “comma” sequence, is used by both PCI-Ex and RIO for bit and byte synchronization. However in PCI-Ex a K28.5 column is also used for lane alignment, whereas RIO uses K27.7 for lane alignment.
- The K28.0 character is used for SKP (clock compensation) purposes in PCI-Ex. RIO, however, uses the K29.7 character for similar purposes.
It becomes clear that a required capability of the flexible serial PHY is the programmable treatment of the K character set. Fortunately, this is already available in some programmable serdes chips.
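A programmable K character set can be pictured as a per-protocol lookup table, as in the Python sketch below. The assignments are the ones listed above; the function names, profile keys, and table layout are hypothetical. The byte value of Kx.y follows the usual 8B/10B naming, with x in the low five bits and y in the high three bits.

```python
def k_code(x, y):
    """Return the 8-bit value of control character Kx.y."""
    return (y << 5) | x


# Per-protocol roles, taken from the examples in the text above.
K_PROFILES = {
    "pci_ex": {
        "comma": k_code(28, 5),
        "lane_align": k_code(28, 5),
        "clock_comp": k_code(28, 0),
    },
    "serial_rio": {
        "comma": k_code(28, 5),
        "lane_align": k_code(27, 7),
        "clock_comp": k_code(29, 7),
    },
}


def classify(protocol, byte, is_k):
    """Return the sorted roles a received symbol plays under the selected profile."""
    if not is_k:
        return []                      # data characters have no control role
    profile = K_PROFILES[protocol]
    return sorted(name for name, code in profile.items() if code == byte)
```

Switching the PHY between protocols then amounts to selecting a different profile rather than changing the receive logic.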
Link Speeds and Widths
PCI-Ex, in its current version, uses a 2.5-Gbit/s signaling rate. Future versions will likely support additional higher speeds.
RIO currently supports signaling at 1.25, 2.5, and 3.125 Gbit/s, and will also likely embrace higher speed serdes in the future. The key electrical parameters of RIO at 3.125 Gbit/s are similar to XAUI (also 3.125 Gbit/s), while the lower RIO speeds are essentially baud-scaled versions of XAUI.
Rate-agile serdes technology spanning the 1.25- to 3.125-Gbit/s range, hence all current PCI-Ex and RIO speeds, is a key aspect of the flexible serial PHY.
A wide range of lane widths is available in PCI-Ex, while RIO constrains the lane widths to x1 and x4. An x1- and x4-capable PHY is likely to find wide application in the telecom/datacom space for both these interconnects.
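To relate these signaling rates and widths to usable bandwidth, recall that 8B/10B coding carries 8 data bits in every 10 line bits. The short Python sketch below applies that ratio; the function name is hypothetical, and packet headers, control symbols, and other protocol overhead are ignored.

```python
def payload_gbps(line_rate_gbps, lanes):
    """Raw data rate in Gbit/s per direction after 8B/10B coding (protocol overhead ignored)."""
    return line_rate_gbps * 8 / 10 * lanes


# Rates and widths named in the text: 1.25, 2.5, 3.125 Gbit/s at x1 and x4.
for rate in (1.25, 2.5, 3.125):
    for lanes in (1, 4):
        print(f"{rate} Gbit/s x{lanes}: {payload_gbps(rate, lanes)} Gbit/s of data")
```

For example, a 2.5-Gbit/s lane carries 2 Gbit/s of data per direction, and a x4 link at 3.125 Gbit/s carries 10 Gbit/s.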
Signal Swing and Pre-Emphasis
The signal swing levels of PCI-Ex and serial RIO are somewhat different. Both ranges, however, can be implemented with programmable serdes technology (typically implemented in current mode logic [CML]).
In its current version, PCI-Ex uses a form of pre-emphasis in which transition bits are given higher amplitude than following bits. Pre-emphasis is optional for RIO. However, in the backplane application of either interface, pre-emphasis is of clear benefit and its support is recommended. In chip-to-chip applications, pre-emphasis may not be beneficial. It is therefore recommended that the flexible PHY solution make the use, and degree, of pre-emphasis configurable.
Clocking and Jitter
The allowed transmit jitter and required receive jitter tolerance are quite similar between PCI-Ex and RIO. The overlapping range of 0.3UI (Tx) and 0.65UI (Rx) contributes to interoperation at the PHY layer.
The clock tolerance specified by PCI-Ex is +/- 300 ppm; the corresponding requirement in serial RIO is +/- 100 ppm. The PCI-Ex requirement aligns with the common use of spread spectrum clocking in the PC industry. For telecom/datacom applications, the +/-100 ppm range without spread spectrum clocking should suffice.
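The practical effect of the clock tolerance shows up in clock compensation. A back-of-envelope Python sketch follows, assuming the two ends of a link run from independent reference clocks sitting at opposite extremes of the tolerance; the function name is hypothetical.

```python
def symbols_per_slip(tolerance_ppm):
    """Symbols transmitted before the two clocks drift apart by one symbol.

    Assumes one end is at +tolerance and the other at -tolerance, so the
    worst-case frequency difference is twice the per-end tolerance.
    """
    return 1_000_000 / (2 * tolerance_ppm)


print(round(symbols_per_slip(300)))  # worst case for a +/- 300 ppm tolerance
print(round(symbols_per_slip(100)))  # worst case for a +/- 100 ppm tolerance
```

Under this assumption, a receiver must be given a chance to add or drop a compensation character roughly every 1,667 symbols at +/- 300 ppm, but only every 5,000 symbols at +/- 100 ppm, so the tighter tolerance reduces the compensation overhead.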
Power Management and Plug-and-Play Support
In communication applications, power management support in PCI-Ex forms a set of capabilities that allow the PHY to be operated in lower power standby modes when the link is idle. Furthermore, plug-and-play requires that a transmitter detect the presence of a powered receiver. These application requirements, peculiar to the PC industry, produce the following PHY requirements, which are not typically supported by communications-industry serdes:
- BEACON signal used by a transmitter to “wake” a “sleeping” receiver
- “Receiver detect” signal used by a transmitter to detect a powered receiver
- Requirement for receiver operation at a DC common mode voltage of 0V to facilitate receiver detect
- End-of-idle detection by the receiver to detect transition out of low power states
- Active state power management (ASPM) which automatically moves a link between active and power down states
While a fully PCI-Ex compliant flexible PHY solution would require all these capabilities, applications of PCI-Ex for embedded telecom/datacom wherein these capabilities are not needed may become common. While it is too early to state with certainty, we feel that an appropriate flexible serial PHY solution for the telecom/datacom industry does not require these features. Without such features, PCI-Ex could use industry standard serdes technology at 2.5 Gbit/s, just like serial RIO.
2. Flexible PHY for the Parallel Interconnects
The parallel interconnects, HT and parallel RIO, are optimized for low-latency, high bandwidth chip-to-chip interface applications. They both use a similar source-synchronous interface, in which scaleable data widths are accompanied by separate clock signals and a small number of out-of-band control signals. The main electrical attributes are shown below. From the electrical point of view, designed-in flexibility for both interfaces is quite feasible.
Electrically, both interfaces use a signaling scheme based upon LVDS. HT, however, uses a modified swing level and common mode voltage range. There is a wide enough overlap region between signal swings for the two interfaces, such that the standard LVDS swing (used in RIO) should work fine for both.
Finding a suitable overlap for the common mode is more difficult. HT uses the lower common mode voltage for optimal operation from a 1.2 V rail. It is not desirable to move to the higher rail required for RIO common mode for reasons of power dissipation and use of lower voltage devices. One potential solution is to run the RIO driver at the HT common mode driver voltage. This should work fine as the RIO (LVDS) receiver tolerates a wide common mode voltage range to accommodate large ground differences between transmitter and receiver. These ground differences are usually negligible in the chip-to-chip applications for which parallel RIO and HT are intended.
HT allows an auto-negotiated data path at the widths shown in Table 8 above. Typically, HT and parallel RIO are used as an interconnect to high speed embedded processors where an 8-bit or 16-bit data path is used. For these applications the other HT data widths may not be required. In both HT and parallel RIO, the 8-bit and 16-bit data-widths require 1 and 2 clock signals, respectively.
HT and RIO each require an out-of-band control signal to help distinguish between control and data sequences. A flexible PHY should support this signal; the logical control of the pin will need to be programmable. Finally, HT uses 4 single-ended pins for support of reset and power management functions, which should be supported.
This article has reviewed the characteristics of five interfaces: PCI, PCI-X, PCI-Express, HyperTransport, and RapidIO. These interfaces achieve much the same results for embedded applications in markedly different ways. By identifying common interface characteristics, we have provided specific recommendations for the design of a flexible bus interconnect that enables future proofing and eases interworking among these interfaces.
About the Authors
Duncan Bees is a technical advisor in PMC-Sierra's Product Research Department organization. He graduated with a bachelor of applied science in electrical engineering from the University of British Columbia and a master of engineering (Communications Signal Processing) from McGill University. Duncan can be reached at firstname.lastname@example.org.
Brian Holden is a principal engineer in PMC-Sierra's Microprocessor Products Division. Brian graduated with a Bachelor of Science in Electrical Engineering from the University of California, Davis and received a Master in Business Administration from Cornell University. He can be reached at email@example.com.