Advanced switching boosts PCI Express
By Pranav Mehta, Principal Engineer, Intel Communications Group, Phoenix, EE Times
October 28, 2002 (10:11 a.m. EST)
Data communications and telecommunications systems are converging, placing new requirements on the system building blocks that are at the heart of the communications infrastructure. This has placed demands on the technologies that interconnect those building blocks as well. The Advanced Switching specification for PCI Express aims to satisfy the interconnect requirements of those communications and embedded systems.
Interconnects must now support increased bandwidth at lower cost, constructs to enable redundancy and failover for high-availability systems, peer-to-peer data transfer capability, scalability in performance and number of nodes, and quality of service. The downturn has mandated another requisite: the ability to meet those requirements using an open-standards, modular system architecture to take advantage of economies of scale and to reduce cost.
Essentially, PCI Express emulates a virtual PCI bus within a switched topology by making the switches appear as if they are PCI-to-PCI bridges to the configuration and enumeration software. The specification also virtualizes PCI physical interrupts to the host by using in-band messaging within the context of the serial protocol. This enables boot device support without complex BIOS-level interrupt configuration. As a result, silicon makers are able to replace PCI bus front-end logic with a PCI Express front end, offering designers higher bandwidths without the need to change any software.
The specification also addresses data integrity issues by protecting packets with a 32-bit cyclic redundancy check at every node and providing robust error-handling mechanisms.
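To make the per-node integrity check concrete, the following is a minimal sketch in Python of appending a 32-bit CRC to a packet and verifying it on receipt. It uses the standard library's zlib.crc32 as a stand-in; the function names protect and check are hypothetical, and the exact CRC parameters and framing that PCI Express uses are not described in this article and may differ.

```python
import struct
import zlib


def protect(payload: bytes) -> bytes:
    """Append a 32-bit CRC (big-endian) to the payload."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def check(frame: bytes) -> bytes:
    """Return the payload if the trailing CRC matches; raise ValueError otherwise."""
    if len(frame) < 4:
        raise ValueError("frame too short to hold a CRC")
    payload = frame[:-4]
    (received,) = struct.unpack(">I", frame[-4:])
    if zlib.crc32(payload) != received:
        raise ValueError("CRC mismatch")
    return payload
```

In this sketch, a node that receives a frame would call check before forwarding it, so a corrupted packet is caught at the first hop after the error rather than at the final destination.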
While the PCI Express base architecture resolves some of the issues communications systems vendors are facing, it still adheres to the host-centric architecture of PCI. Although very important in providing a smoother transition from PCI-based systems, the PCI tree topology is not conducive to the distributed-computing architecture required by communications systems. The trend in those systems is toward complete disaggregation of processing elements and flat topologies to achieve higher performance and high availability through redundancy.
The Advanced Switching architecture introduces a network layer that builds on the tree-centric transaction layer of the base architecture. This networking or routing layer enables creation of an Advanced Switching fabric that provides true peer-to-peer communication among two or more nodes within the fabric.
The packet definition includes a routing header that contains all the information a switch fabric requires to route the packet. This includes the path-routing information, traffic class identification, congestion and deadlock avoidance information, packet size and protocol encapsulation information that indicates the protocol being tunneled through the encapsulation.
The layer can encapsulate a packet of any protocol. It prepends a routing header to the packet and efficiently routes the packet through the fabric. As the packet exits the fabric boundary, the header is removed and the receiving protocol stack can continue processing the packet in its native protocol format.
Among other components of the Advanced Switching architecture is a protocol encapsulation interface (PEI). The PEI is the key field in the routing header that indicates what kind of payload is encapsulated within the Advanced Switching packet. The architecture specifies several mandatory PEIs, defines some optional PEIs, such as one for the PCI Express base protocol, and leaves room for future requirements.
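The encapsulation step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the layout defined by the specification: the class names, field names and the encapsulate and decapsulate helpers are hypothetical, the header carries only a subset of the fields listed earlier (the congestion and deadlock avoidance information is omitted), and any PEI value passed in is made up for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RoutingHeader:
    path: Tuple[int, ...]  # path-routing information, one entry per switch
    traffic_class: int     # traffic class identification
    pei: int               # tells the receiver which protocol is encapsulated
    payload_size: int      # size of the encapsulated packet in bytes


@dataclass(frozen=True)
class FabricPacket:
    header: RoutingHeader
    payload: bytes         # the native-protocol packet, carried unchanged


def encapsulate(payload: bytes, path, traffic_class: int, pei: int) -> FabricPacket:
    """Wrap a native packet with a routing header at the fabric ingress."""
    header = RoutingHeader(tuple(path), traffic_class, pei, len(payload))
    return FabricPacket(header, payload)


def decapsulate(packet: FabricPacket) -> Tuple[int, bytes]:
    """Strip the routing header at the fabric egress.

    Returns the PEI and the untouched native packet so the receiving
    protocol stack can be selected and handed the original bytes.
    """
    if packet.header.payload_size != len(packet.payload):
        raise ValueError("payload size does not match header")
    return packet.header.pei, packet.payload
```

The point of the sketch is that the payload is opaque to the fabric: switches would look only at the header, and the PEI is consulted once, at the egress, to decide which protocol stack receives the payload.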
Path routing relies on the geographical locations of nodes within the fabric. The node creating a request specifies the path a packet will traverse through the fabric before reaching the destination node. Each switching element in the packet's path is instructed to route the packet through the egress port that is physically at a certain offset from the ingress port. At the destination node, a simple path transformation operation creates the return path through the fabric for the response packet to get back to the requesting node. This procedure eliminates the need to have routing tables in the switches.
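A small Python sketch may help show why no routing tables are needed. It assumes a simplified model that is not taken from the specification: each hop's instruction is a signed port offset (called a turn here), a switch computes the egress port as the ingress port plus the turn, modulo its port count, and the return path is obtained by reversing the turn list and negating each turn. All function names are hypothetical.

```python
from typing import List, Sequence


def egress_port(ingress: int, turn: int, num_ports: int) -> int:
    """Pick the egress port at a relative offset from the ingress port.

    The switch needs only its own port count, not a routing table.
    """
    return (ingress + turn) % num_ports


def return_path(path: Sequence[int]) -> List[int]:
    """Transform a forward path into the path for the response packet.

    Hops are visited in reverse order, and each turn is negated so the
    packet leaves through the port it originally arrived on.
    """
    return [-turn for turn in reversed(path)]


def trace(path: Sequence[int], ingress_ports: Sequence[int],
          port_counts: Sequence[int]) -> List[int]:
    """Return the egress port chosen at each switch along the path."""
    return [egress_port(i, t, n)
            for i, t, n in zip(ingress_ports, path, port_counts)]
```

For example, with a forward path of [3, 1, 6] through switches of 8, 4 and 8 ports, entered on ports 2, 0 and 7, trace yields egress ports [5, 1, 5]. Feeding those egress ports back in reverse order with return_path([3, 1, 6]), which is [-6, -1, -3], yields [7, 0, 2], the original ingress ports in reverse.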
The multicasting feature allows a single packet generated by a source node to be sent to multiple destination nodes. The originating node specifies a multicast group identification code as part of the routing header, which is used to look up the multicast table implemented in switches. The multicast table lists input and output ports participating in a multicast group. Based on the table entry, the switch replicates the packet on the corresponding egress ports, which enables flexible packet transmission.
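The table lookup can be sketched in a few lines of Python. The table layout (a mapping from group identification code to a set of member ports) and the replicate function are hypothetical, and skipping the port the packet arrived on is an assumption of this sketch rather than a rule stated in the article.

```python
from typing import Dict, FrozenSet, List

# Hypothetical layout: multicast group ID -> ports participating in the group.
MulticastTable = Dict[int, FrozenSet[int]]


def replicate(table: MulticastTable, group_id: int, ingress_port: int) -> List[int]:
    """Return the egress ports that should each receive a copy of the packet.

    An unknown group ID yields no copies. The ingress port is skipped so
    the packet is not sent back toward its source (an assumption here).
    """
    members = table.get(group_id, frozenset())
    return sorted(port for port in members if port != ingress_port)
```

With a table of {7: frozenset({0, 2, 5})}, a packet for group 7 arriving on port 2 would be copied to ports 0 and 5.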
The architecture accommodates switches with various port counts and link widths. Fabrics, which can be single- or multistage, can range from two to thousands of ports. Given its encapsulation-based architecture, it uses a lower-cost virtual-channel mechanism that requires fewer queue pairs per direction than are present in the base PCI Express spec.