Too many specs confuse server design

By Tom Bradicich, CTO, and Bill Holland, senior technical staff member, xSeries, eServer Division, IBM Corp., Research Triangle Park, N.C., EE Times
October 29, 2002 (2:06 p.m. EST)
URL: http://www.eetimes.com/story/OEG20021024S0012

Today there seem to be more "industry-standard" technologies than ever for connecting server subsystems, but many of these technologies overlap. For managers responsible for purchasing and deploying systems, it is important to understand each standard's capability, cost, timeline, target market and industry backing to determine which ones deserve the company's investment.

Today's interconnects can be divided into three groups: chip-to-chip links, I/O expansion adapters and external server communications.

At the chip-to-chip level, it's worth noting that the integration of advanced functions, such as systems management or Ethernet, into the core chip sets is creating more functional differences among chip sets. Vendors need to develop more chip sets to deliver the right permutations of function, cost and performance. To ensure maximum flexibility in designing servers and to save money developing new chip sets, it would be valuable to have a standard chip set interconnect.

Such an interconnect is of minimal value to server designers unless a broad range of standards-based components is available from multiple vendors. However, if mix-and-match designs can produce better servers than just procuring a whole chip set, a standard could begin to displace proprietary bus solutions. The critical question is if and when chip set vendors will drop their own optimized bus designs in favor of a common standard.

Support base
PCI Express could be such a common standard. But unless other major chip set vendors jump on the bandwagon, the value of PCI Express as a chip-to-chip standard will be extremely limited.

At the slot level, it is vital to have a widely accepted industry standard to ensure the availability of the widest possible range of adapters. The key attributes of this I/O structure include performance, quantity of slots, breadth of available devices and, of course, return on investment.

Today, PCI is the unchallenged leader for connecting I/O adapters to servers. Its performance scales from 1 Gbit/second for 32-bit, 33-MHz PCI to 8 Gbits/s per bus segment with 64-bit, 133-MHz PCI-X. Essentially, PCI-X performance is more than enough for the highest-bandwidth adapters being deployed by customers today: 800 Mbytes/s peak on a dual 2-Gbit Fibre Channel card.
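The bus figures quoted above follow from simple arithmetic: bus width times clock rate. The short Python sketch below reproduces them. The function name is illustrative, and it uses the rounded clock figures given in the text and ignores protocol overhead, so the results are peak numbers only.

```python
def peak_bus_gbits(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth of a parallel bus in Gbit/s (one transfer per clock)."""
    bits_per_second = width_bits * clock_mhz * 1e6
    return bits_per_second / 1e9


# Figures taken from the article: 32-bit/33-MHz PCI and 64-bit/133-MHz PCI-X.
pci = peak_bus_gbits(32, 33)
pcix = peak_bus_gbits(64, 133)

print(f"32-bit, 33-MHz PCI:    {pci:.2f} Gbit/s")
print(f"64-bit, 133-MHz PCI-X: {pcix:.2f} Gbit/s")
```

The results land at roughly 1 Gbit/s and 8.5 Gbits/s, which the article rounds to 1 and 8.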

In desktop systems, PCI is indeed running out of gas. Advanced graphics long ago moved from the PCI bus to the Accelerated Graphics Port. AGP has scaled through 2x, 4x and now 8x speed. Express is being positioned as the next step for desktop systems graphics after AGP 8x. While not significant to servers as a graphics slot, such a slot could provide an entry point for other functions. But it is unknown if and when other adapters will appear to fill this slot.

Speedup needed
With 10-Gbit/s 4x Infiniband and early 10-Gbit Ethernet just coming to market, we now have individual adapters that could demand more bandwidth than PCI-X's 8 Gbits/s can deliver. PCI-X needs a 2x speedup to reasonably handle the full-duplex traffic capability of either 10-Gbit technology.

PCI-X 2.0 is set to deliver this needed 2x speed boost with a double-data-rate update to today's PCI-X chips. A second speed doubling, quad data rate, is also defined. The backward and forward compatibility among PCI, PCI-X and now PCI-X 2.0 is the key to migrating to higher-performance I/O. Vendors can seamlessly enhance their products with PCI-X 2.0 capability, while retaining support for PCI bus systems. Customers retain the flexibility to mix and match adapters within their servers.
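To see why a 2x boost matters, it helps to compare each PCI-X generation against the traffic a full-duplex 10-Gbit adapter could generate. The sketch below is a rough comparison only. It assumes that full-duplex demand is simply 10 Gbits/s in each direction sharing one bus, that double and quad data rate mean two and four transfers per clock, and that overhead can be ignored; the names are illustrative.

```python
# Peak rate of today's 64-bit, 133-MHz PCI-X, in Gbit/s (rounded clock figure).
PCIX_BASE_GBITS = 64 * 133e6 / 1e9

# Assumption: a full-duplex 10-Gbit link can ask for 10 Gbit/s each way.
FULL_DUPLEX_10G_GBITS = 2 * 10.0


def pcix_gbits(transfers_per_clock: int) -> float:
    """Peak PCI-X rate when data moves 1, 2 or 4 times per clock."""
    return PCIX_BASE_GBITS * transfers_per_clock


for name, transfers in [("PCI-X", 1),
                        ("PCI-X 2.0, double data rate", 2),
                        ("PCI-X 2.0, quad data rate", 4)]:
    rate = pcix_gbits(transfers)
    share = rate / FULL_DUPLEX_10G_GBITS
    print(f"{name}: {rate:.1f} Gbit/s, {share:.0%} of full-duplex 10-Gbit demand")
```

Under these assumptions, today's PCI-X covers well under half of the demand, the double-data-rate step covers most of it, and the quad-data-rate step exceeds it.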

This seamless migration to PCI-X 2.0 is in sharp contrast to the discontinuity that would occur in a move to Express. Without backward compatibility of the slot/adapter, Express will not easily replace PCI slots in servers. Adapter vendors would need to provide two separate product lines during the transition, and server vendors would have to provide multiple servers with different mixes of PCI and Express slots to satisfy customers in various stages of transition. Customers would, for the first time in 10 years, have to manage deployment of incompatible adapter types among their servers.

As for connectivity between systems, Infiniband and 10-Gbit Ethernet are two technologies promising much higher-speed communications. Infiniband has the functionality right now to deliver a cost-effective boost in performance, but 10-Gbit Ethernet will need to wait for Moore's Law and emerging TCP offload engines with remote direct memory access (RDMA) before it is in a position to benefit customers.

Infiniband host-channel adapters being released over the next few months deliver a low-latency interconnect for both database clusters and high-performance computing clusters by virtue of three protocols. Database applications already exploit the Virtual Interface Architecture protocol available on Infiniband. Many high-performance applications already use the Message Passing Interface that Infiniband also provides. Sockets Direct Protocol is a new protocol that provides a sockets-level software interface. Thus, it is able to support existing socket-based applications without the performance overhead of TCP/IP, and without having to recode applications to use one of the more exotic low-latency protocols.

In addition to low latency, Infiniband delivers high bandwidth up to the limits of the 8 Gbits/s of the PCI-X bus. The full 10-Gbit/s bandwidth potential of Infiniband will be realized when PCI-X 2.0-based systems emerge next year. Target channel adapters for bridging existing Ethernet and Fibre Channel networks to the Infiniband fabric will further expand Infiniband's value next year. But exploitation of Infiniband for I/O will grow more slowly than the use of Infiniband for clustering, because of the added complexity of managing multiprotocol networks.

Deployment impediments
Cost, backward compatibility and performance issues are impeding deployment of 10-Gbit Ethernet adapters. One factor adding cost is that 10-Gbit Ethernet requires optical interfaces. Gigabit Ethernet had similar cost factors limiting its acceptance, but the subsequent introduction of Gigabit Ethernet over copper wires greatly accelerated cost reductions. Just as important, gigabit copper interconnects delivered backward compatibility with existing 10- and 100-Mbit Ethernet products. At the moment, 10-Gbit Ethernet has neither a road map to accelerate cost reductions nor one to deliver a backward-compatible interface.

These issues would not hinder a customer desperate for higher bandwidth if 10-Gbit Ethernet were their only choice. Customers today could use multiple Gigabit Ethernet adapters on multiple PCI-X bus segments to deliver additional bandwidth, if the server could drive it. The bottleneck isn't wire speed, but rather the CPU overhead of executing the TCP/IP protocol stack and the memory-bandwidth overhead of copying data blocks from buffer to buffer. Today's servers typically get CPU-bound processing TCP/IP above 300 Mbits/s per CPU.
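The 300-Mbit/s figure makes the scale of the problem easy to estimate. The sketch below asks how many CPUs would be consumed by protocol processing alone to keep a link full. It assumes that throughput scales linearly with CPU count, which is optimistic, and the function name is illustrative.

```python
import math

# From the article: a typical server becomes CPU-bound on TCP/IP
# at about 300 Mbit/s per CPU.
PER_CPU_MBITS = 300


def cpus_to_saturate(link_mbits: float, per_cpu_mbits: float = PER_CPU_MBITS) -> int:
    """CPUs needed for TCP/IP processing alone, assuming linear scaling."""
    return math.ceil(link_mbits / per_cpu_mbits)


for name, mbits in [("Gigabit Ethernet", 1_000), ("10-Gbit Ethernet", 10_000)]:
    print(f"{name}: about {cpus_to_saturate(mbits)} CPUs just for TCP/IP")
```

Even under that generous scaling assumption, a single 10-Gbit link would absorb far more CPUs than most servers have, which is why the offload approach matters more than raw wire speed.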

Networking vendors are implementing TCP offload engine (TOE) adapters that reduce the burden on the server's CPUs by shifting the protocol and packet processing to the adapter. Following this first wave of TOE adapters will be RDMA-enabled TOE adapters. RDMA avoids the buffer-to-buffer copy operations inherent in today's TCP/IP protocol processing by providing the application's destination buffer address along with the data itself.
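The difference between the two receive paths can be illustrated with a toy model. The Python sketch below is not a real RDMA or sockets API. It simply counts how many times a payload is copied when it passes through an intermediate buffer, compared with being placed directly at a destination offset that travels with the data. All names are hypothetical, and real protocol stacks may make more or fewer copies than shown here.

```python
def receive_with_copies(payload: bytes, app_buffer: bytearray) -> int:
    """Toy model of a conventional path: adapter -> kernel buffer -> application."""
    copies = 0
    kernel_buffer = bytearray(payload)               # copy 1: into a kernel buffer
    copies += 1
    app_buffer[:len(kernel_buffer)] = kernel_buffer  # copy 2: into the application
    copies += 1
    return copies


def receive_direct(payload: bytes, app_buffer: bytearray, dest_offset: int) -> int:
    """Toy model of RDMA-style placement: the message names its destination."""
    app_buffer[dest_offset:dest_offset + len(payload)] = payload  # single placement
    return 1


data = b"block of application data"
buf_a = bytearray(len(data))
buf_b = bytearray(len(data))

copies_conventional = receive_with_copies(data, buf_a)
copies_direct = receive_direct(data, buf_b, dest_offset=0)

print(f"conventional path: {copies_conventional} copies")
print(f"direct placement:  {copies_direct} copy")
```

Both buffers end up holding the same data. The saving is in the memory bandwidth spent getting it there.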

Multiple PCI segments
So where does that leave us? Today, multiple PCI-X segments provide scalable I/O bandwidth, and Infiniband provides multivendor low-latency interconnect. PCI-X 2.0 is expected to double available bandwidth per adapter in 2003, and Infiniband will deliver 10 Gbits/s cost-effectively on PCI-X 2.0 slots in 2004.

First-generation 10-Gbit Ethernet network interface cards are not likely to be cost/performance competitive, but 10-Gbit Ethernet-plus-TOE cards could be of value in 2004 and 2005. Cards for 10-Gbit Ethernet with TOE and RDMA will be appealing as costs drop in 2006.

Looking further out, a 40-Gbit/s bus will be needed after 2006 for the next wave of I/O. Infiniband, at 12x (30-Gbit/s) rates, will emerge in 2006.
