Tunneling SPI-4.2 Through an Advanced Switching Fabric
Gary Lee, Vitesse Semiconductor Corp.
Sep 09, 2004 (6:00 AM)
Figure 1: Diagram showing the SPI-4.2 frame format.
The SPI-4.2 interface has emerged as one of the most popular physical interfaces for networking designs. Today it can be found on products such as network processors, traffic managers, framers, media access control (MAC) devices, and switch fabrics.
Just as SPI-4.2 has become a standard chip-to-chip interface, the Advanced Switching Interconnect (ASI) is emerging as a standards-based switch fabric for the compute and communication industries. To help ASI gain early success in communication applications, there are plans to add SPI-4.2 as one of the protocols that can be tunneled through an ASI fabric.
In this article, we'll discuss several issues and approaches to consider when tunneling SPI-4.2 through an ASI fabric, along with the performance trade-offs involved in packet segmentation, packet routing, and flow-control latency.
Before looking at how to tunnel SPI-4.2 through ASI, let's quickly review the SPI-4.2 and ASI specifications.
Developed by the Optical Internetworking Forum, SPI-4.2 is a standard protocol that is well known in the communications industry. It is a 16-bit-wide LVDS interface with an additional status channel for flow control. The interface bandwidth can range from 10 Gbit/s for low-overhead Sonet applications to 20 Gbit/s for applications, such as switch fabrics, that need bandwidth speedup to carry cell overhead information.
Packets are sent through the SPI-4.2 interface as one or more data bursts, with a payload control word header delineating the bursts as shown in Figure 1. The start of packet bit (S) and the end of packet status bits (EOPS) are used to identify a complete packet that may be made up of multiple bursts. The payload control word also identifies the SPI-4.2 channel associated with the data burst using the ADR[7:0] bits. These bits are typically used to define a sub-channel within a device, such as a framer channel or a MAC physical port.
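The control-word fields above can be pulled apart with a few shifts and masks. The sketch below assumes the bit layout as we read it in OIF-SPI4-02.1 (type in bit 15, EOPS in bits 14:13, SOP in bit 12, ADR[7:0] in bits 11:4, DIP-4 in bits 3:0) and does not check DIP-4 parity; confirm the layout and the EOPS codes against the implementation agreement before relying on them. The class and function names are illustrative.

```python
from dataclasses import dataclass

# Assumed EOPS encoding (verify against OIF-SPI4-02.1).
EOPS_MEANING = {
    0b00: "not end of packet",
    0b01: "end of packet, abort",
    0b10: "end of packet, 2 bytes valid in last word",
    0b11: "end of packet, 1 byte valid in last word",
}


@dataclass
class PayloadControlWord:
    is_payload: bool  # type bit: 1 = payload control word (a burst follows)
    eops: int         # 2-bit end-of-packet status
    sop: bool         # start of packet
    adr: int          # SPI-4.2 channel, 0..255
    dip4: int         # parity nibble, returned but not checked here


def decode_pcw(word: int) -> PayloadControlWord:
    """Split a 16-bit control word into its fields (assumed layout)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("control word must fit in 16 bits")
    return PayloadControlWord(
        is_payload=bool((word >> 15) & 0x1),
        eops=(word >> 13) & 0x3,
        sop=bool((word >> 12) & 0x1),
        adr=(word >> 4) & 0xFF,
        dip4=word & 0xF,
    )


pcw = decode_pcw(0x9050)
print(pcw.adr, pcw.sop, EOPS_MEANING[pcw.eops])  # 5 True not end of packet
```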
While SPI-4.2 is designed to interface chips on a linecard, ASI is a switch-fabric technology built on top of the PCI-Express physical and data-link layers. In ASI, virtual channels are used to separate traffic classes as they pass through the fabric. There are virtual channels identified for control traffic (BVCs), unicast traffic (OVCs) and multicast traffic (MVCs).
The ASI maximum packet size applies across the entire fabric and is typically smaller than the SPI-4.2 burst size. This may require the burst to be segmented at the fabric ingress and reassembled at the fabric egress.
Unicast traffic is routed through the fabric using path information contained in the ASI header. The header contains a turn pool and turn pointer, which are used to route the ASI packet as shown in Figure 2. For each switch stage in the fabric, the turn pointer identifies the turn pool bit field associated with the current switch. The value in the bit field specifies the port number difference between the ingress and egress switch element ports.
Figure 2: Diagram showing ASI turn pool routing.
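To make the turn-pool mechanism concrete, here is a sketch that follows the description above: each switch reads its bit field at the turn pointer and adds the field value to its ingress port number, modulo the port count. The field ordering, the direction in which the pointer moves, and the power-of-two port counts are assumptions of this sketch, and the ASI core specification may apply an additional fixed offset to the turn value; the function names are hypothetical.

```python
def field_width(num_ports: int) -> int:
    """Bits of turn pool consumed by a switch with num_ports ports."""
    width = num_ports.bit_length() - 1
    if num_ports < 2 or (1 << width) != num_ports:
        raise ValueError("sketch assumes a power-of-two port count")
    return width


def build_turn_pool(turns, port_counts):
    """Pack one turn value per stage; the first stage ends up most significant."""
    pool, pointer = 0, 0
    for turn, ports in zip(turns, port_counts):
        width = field_width(ports)
        if not 0 <= turn < ports:
            raise ValueError("turn value out of range for this switch")
        pool = (pool << width) | turn
        pointer += width
    return pool, pointer


def switch_hop(pool: int, pointer: int, ingress_port: int, num_ports: int):
    """Return (egress_port, updated_pointer) for one switch stage."""
    width = field_width(num_ports)
    if pointer < width:
        raise ValueError("turn pool exhausted")
    turn = (pool >> (pointer - width)) & ((1 << width) - 1)
    egress_port = (ingress_port + turn) % num_ports
    return egress_port, pointer - width


def turn_for(ingress_port: int, egress_port: int, num_ports: int) -> int:
    """Turn value that steers a packet from ingress_port to egress_port."""
    return (egress_port - ingress_port) % num_ports


# Two 8-port stages: enter stage 1 on port 1 and leave on port 6,
# then enter stage 2 on port 0 and leave on port 2.
turns = [turn_for(1, 6, 8), turn_for(0, 2, 8)]
pool, ptr = build_turn_pool(turns, [8, 8])
port1, ptr = switch_hop(pool, ptr, ingress_port=1, num_ports=8)
port2, ptr = switch_hop(pool, ptr, ingress_port=0, num_ports=8)
print(port1, port2)  # 6 2
```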
In ASI, multicast traffic is routed through the fabric using multicast look-up tables. In each switch element, a multicast index value in the ASI header selects a multicast mask that identifies the egress ports.
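A multicast lookup at one switch element can be sketched as a table from multicast index to port mask. This is a hypothetical model, not the ASI table format; in particular, dropping the ingress port from the replication set (so a packet is never reflected back where it came from) and treating an unconfigured index as "replicate nowhere" are assumptions.

```python
from typing import Dict, Iterable, List


class MulticastTable:
    """Hypothetical per-switch multicast table: index -> egress-port bit mask."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self.masks: Dict[int, int] = {}

    def set_group(self, index: int, ports: Iterable[int]) -> None:
        mask = 0
        for port in ports:
            if not 0 <= port < self.num_ports:
                raise ValueError(f"port {port} does not exist on this switch")
            mask |= 1 << port
        self.masks[index] = mask

    def egress_ports(self, index: int, ingress_port: int) -> List[int]:
        """Ports to replicate to; an unknown index replicates nowhere."""
        mask = self.masks.get(index, 0)
        return [p for p in range(self.num_ports)
                if mask & (1 << p) and p != ingress_port]


table = MulticastTable(num_ports=8)
table.set_group(3, [0, 2, 5])
print(table.egress_ports(3, ingress_port=2))  # [0, 5]
```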
Credit-based flow control is required at each device interface throughout the fabric. Credits are exchanged for each virtual channel number and type that exists on each side of the interface. If a virtual channel is congested on the receive side, it will stop granting credits to the transmit side until the congestion has subsided.
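The transmit-side bookkeeping can be sketched as one credit counter per VC type and number. The credit unit, the names, and the choice to key VCs by a (type, number) tuple are assumptions; the sketch shows only that a congested VC stalls on its own while other VCs keep sending.

```python
from collections import defaultdict
from typing import Dict, Tuple

VcKey = Tuple[str, int]  # (VC type such as "OVC", VC number)


class CreditedLink:
    """Transmit side of one link; the credit unit (e.g. buffer blocks) is assumed."""

    def __init__(self) -> None:
        self.credits: Dict[VcKey, int] = defaultdict(int)

    def grant(self, vc: VcKey, amount: int) -> None:
        # Called when the receiver frees buffer space for this VC.
        self.credits[vc] += amount

    def try_send(self, vc: VcKey, cost: int) -> bool:
        # Send only if the receiver has advertised enough space for this VC.
        if self.credits[vc] < cost:
            return False
        self.credits[vc] -= cost
        return True


link = CreditedLink()
link.grant(("OVC", 0), 4)
print(link.try_send(("OVC", 0), 3))  # True: 1 credit left
print(link.try_send(("OVC", 0), 2))  # False: waits for more credits
print(link.try_send(("MVC", 0), 1))  # False: no credits granted for this VC
```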
Status-based flow control is an optional ASI feature that can eliminate blocking in single-stage fabrics. It does this by reporting the fill status of the egress queues in a switch element to the fabric stage upstream of that switch element.
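The effect of status-based flow control can be illustrated with an upstream scheduler that skips any queue whose downstream egress queue is reported full, so traffic to other ports is not stuck behind it. The threshold, the per-port queue model, and the fixed selection order are simplifying assumptions, not the ASI status message format.

```python
from collections import deque
from typing import Deque, Dict, Optional, Set, Tuple


class StatusAwareScheduler:
    """Upstream stage that holds traffic for egress queues reported as congested."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # hypothetical fill level meaning "congested"
        self.queues: Dict[int, Deque[str]] = {}
        self.congested: Set[int] = set()

    def enqueue(self, egress_port: int, packet: str) -> None:
        self.queues.setdefault(egress_port, deque()).append(packet)

    def report_fill(self, egress_port: int, fill: int) -> None:
        # Status message from the downstream switch element.
        if fill >= self.threshold:
            self.congested.add(egress_port)
        else:
            self.congested.discard(egress_port)

    def next_packet(self) -> Optional[Tuple[int, str]]:
        # Lowest-numbered eligible port first, for brevity; a real scheduler would rotate.
        for port in sorted(self.queues):
            if port not in self.congested and self.queues[port]:
                return port, self.queues[port].popleft()
        return None
```

With port 1 reported congested, a packet for port 2 is still served, and the port 1 packet leaves once a lower fill level is reported.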
Because ASI typically supports internal payload sizes that are smaller than the minimum burst size used by an attached SPI-4.2 device, bursts must be segmented at the fabric ingress and reassembled at the fabric egress. To support this, ASI provides protocol interface 2 (PI-2), which is designed for segmentation and reassembly.
PI-2 defines a header, carried in addition to the standard ASI header, that holds reassembly context information: flow identifiers, sequence numbers, and burst delineation information.
The ASI fabric endpoint devices that contain SPI-4.2 interfaces perform this segmentation and reassembly (SAR) function. At the fabric ingress, the endpoint examines the payload control word address field (ADR[7:0]) and uses the SPI-4.2 channel number to identify the flow.
Each flow can be assigned a type (unicast or multicast) and a traffic class, as described in the next section. To support reassembly at the egress, the ingress must also add a sequence number to each segment before injecting it into the fabric, so that the egress can restore the original order if segments arrive out of order.
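Ingress segmentation with per-flow sequence numbers might look like the following. The field names loosely mirror the information PI-2 carries, but they are not the PI-2 header layout, and the 8-bit sequence space and the 64-byte maximum payload in the usage line are hypothetical values.

```python
from dataclasses import dataclass
from typing import Dict, List

SEQ_MODULO = 256  # hypothetical sequence-number width (8 bits)


@dataclass
class Segment:
    flow_id: int          # SPI-4.2 channel (ADR) at the ingress
    seq: int              # per-flow sequence number
    start_of_burst: bool
    end_of_burst: bool
    payload: bytes


class Segmenter:
    def __init__(self, max_payload: int) -> None:
        self.max_payload = max_payload
        self.next_seq: Dict[int, int] = {}  # next sequence number per flow

    def segment(self, flow_id: int, burst: bytes) -> List[Segment]:
        if not burst:
            raise ValueError("empty burst")
        chunks = [burst[i:i + self.max_payload]
                  for i in range(0, len(burst), self.max_payload)]
        segments = []
        for i, chunk in enumerate(chunks):
            seq = self.next_seq.get(flow_id, 0)
            self.next_seq[flow_id] = (seq + 1) % SEQ_MODULO
            segments.append(Segment(flow_id, seq, i == 0,
                                    i == len(chunks) - 1, chunk))
        return segments


segs = Segmenter(max_payload=64).segment(7, bytes(150))
print([len(s.payload) for s in segs], [s.seq for s in segs])  # [64, 64, 22] [0, 1, 2]
```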
At the egress, each flow is assigned a different reassembly context based on its source, type, and class. The egress must reassemble each context in a separate queue before sending a burst out of the SPI-4.2 interface. The ASI definition of the PI-2 header provides a large number of reassembly contexts that can be used to individually identify each flow for proper reassembly.
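A per-context reassembly buffer keyed by source, type, and class, as described above, can be sketched like this. Sequence-number wraparound, timeouts for lost segments, and buffer limits are deliberately left out; the key layout and names are hypothetical.

```python
from typing import Dict, Optional, Tuple

ContextKey = Tuple[int, str, int]  # (source endpoint, "unicast"/"multicast", traffic class)
Entry = Tuple[bytes, bool, bool]   # (payload, start_of_burst, end_of_burst)


class Reassembler:
    def __init__(self) -> None:
        self.contexts: Dict[ContextKey, Dict[int, Entry]] = {}

    def add(self, key: ContextKey, seq: int, sob: bool, eob: bool,
            payload: bytes) -> Optional[bytes]:
        """Store one segment; return the whole burst once it is complete."""
        ctx = self.contexts.setdefault(key, {})
        ctx[seq] = (payload, sob, eob)
        # Try every buffered start-of-burst segment and walk forward in sequence.
        for start in sorted(s for s, (_, first, _) in ctx.items() if first):
            parts, seq_i = [], start
            while seq_i in ctx:
                part, _, last = ctx[seq_i]
                parts.append(part)
                if last:
                    for s in range(start, seq_i + 1):
                        del ctx[s]
                    return b"".join(parts)
                seq_i += 1
        return None
```

Segments from different contexts never mix, and a burst is released only when every sequence number from its start to its end segment has arrived.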
In many designs, memory limitations force a trade-off between the maximum SPI-4.2 burst size supported and the total number of reassembly contexts. A limit on reassembly contexts translates into a limit on the number of traffic sources, traffic classes, or traffic types.
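The trade-off can be put in numbers with a worst-case sizing rule: if every reassembly context may be holding a maximum-size burst at once, the buffer must hold contexts × maximum burst bytes. That rule and the figures below are illustrative assumptions, not values from either specification.

```python
def reassembly_buffer_bytes(sources: int, types: int, classes: int,
                            max_burst_bytes: int) -> int:
    """Worst case: every context holds one maximum-size burst at the same time."""
    contexts = sources * types * classes
    return contexts * max_burst_bytes


def contexts_that_fit(buffer_bytes: int, max_burst_bytes: int) -> int:
    """How many contexts a fixed buffer supports at a given maximum burst size."""
    return buffer_bytes // max_burst_bytes


# Hypothetical figures: 16 sources, unicast + multicast, 8 or 4 traffic classes.
print(reassembly_buffer_bytes(16, 2, 8, 512))   # 131072 (128 KiB)
print(reassembly_buffer_bytes(16, 2, 4, 1024))  # 131072: half the classes, double the burst
print(contexts_that_fit(128 * 1024, 2048))      # 64
```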
Routing Using the PCW Address Field
The SPI-4.2 payload control word contains an 8-bit address field that is used to identify the SPI-4.2 channel associated with the data burst. For example, an Ethernet application may include a multi-ported MAC. In this case, the address field may identify the MAC physical port.
In networking designs, the network processor may be able to dynamically change the value in the address field in order to pass routing information to the attached ASI fabric. In both cases, the 8-bit address field must act as a flow identifier. Because arriving data bursts from different flows may be interleaved, each flow must use a unique address.
Although ASI does not yet have a completed protocol interface specification for SPI-4.2, the PI-8 specification, which defines how to tunnel PCI-Express packets through an ASI fabric, sets a precedent for routing address-based data through a path-based fabric. PI-8 uses a set of binding registers associated with different PCI address ranges. If a PCI-Express transaction arrives within a valid address range, it points to an ASI routing path in the binding registers, which is used to create an ASI header and route the transaction through the ASI fabric.
For SPI-4.2, the 8-bit address field could be used as an index into a 256-entry binding register table. The entries would be configured in-band using the ASI PI-4 fabric management packets. Each entry could include a routing path, a 3-bit traffic class field, a multicast bit and, if used, a multicast routing index. This allows each incoming SPI-4.2 channel to be routed to a different destination using a different VC type and priority.
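The proposed 256-entry binding table can be sketched as an array indexed directly by ADR[7:0]. The entry fields follow the list above; their widths (other than the 3-bit traffic class), the valid bit, and the `configure` method standing in for a PI-4 management write are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BindingEntry:
    valid: bool = False
    multicast: bool = False
    turn_pool: int = 0       # unicast routing path
    turn_pointer: int = 0
    mcast_index: int = 0     # used only when multicast is set
    traffic_class: int = 0   # 3-bit field, 0..7


class BindingTable:
    SIZE = 256  # one entry per ADR[7:0] value

    def __init__(self) -> None:
        self.entries: List[BindingEntry] = [BindingEntry() for _ in range(self.SIZE)]

    def _check(self, adr: int) -> None:
        if not 0 <= adr < self.SIZE:
            raise ValueError("ADR must be 0..255")

    def configure(self, adr: int, entry: BindingEntry) -> None:
        # Stands in for an in-band PI-4 fabric management write.
        self._check(adr)
        if not 0 <= entry.traffic_class <= 7:
            raise ValueError("traffic class is a 3-bit field")
        self.entries[adr] = entry

    def lookup(self, adr: int) -> Optional[BindingEntry]:
        self._check(adr)
        entry = self.entries[adr]
        return entry if entry.valid else None
```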
Defining a Header
The ASI specifications define special PI headers for each protocol interface. The SPI-4.2 PI header can be very simple, with just a few defined fields. One field contains the start-of-packet and end-of-packet indicators, which differ from the start-of-burst and end-of-burst indicators used in the PI-2 header. Another field contains the destination SPI-4.2 channel identifier.
Because the ASI routing header directs the packet only as far as the destination SPI-4.2 interface, this header field identifies the specific destination SPI-4.2 channel within that interface. This value is another field that can be included in the binding registers.
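Since no SPI-4.2 PI header has been defined yet, any layout is speculative. The sketch below packs the fields just described into two bytes: a flags byte holding SOP and a copy of the ingress EOPS code, and a byte holding the destination channel. Carrying the full 2-bit EOPS, rather than a single end-of-packet bit, is our assumption, made so the egress can reproduce abort and byte-valid information.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Spi4PiHeader:
    """Hypothetical 2-byte SPI-4.2 PI header (not a defined ASI format)."""
    sop: bool          # start of packet (not start of burst, which PI-2 carries)
    eops: int          # 2-bit end-of-packet status copied from the ingress PCW
    dest_channel: int  # egress SPI-4.2 channel (ADR)

    def pack(self) -> bytes:
        if not 0 <= self.eops <= 3 or not 0 <= self.dest_channel <= 0xFF:
            raise ValueError("field out of range")
        flags = (int(self.sop) << 2) | self.eops
        return struct.pack(">BB", flags, self.dest_channel)

    @classmethod
    def unpack(cls, data: bytes) -> "Spi4PiHeader":
        flags, channel = struct.unpack(">BB", data[:2])
        return cls(sop=bool(flags & 0x4), eops=flags & 0x3, dest_channel=channel)


hdr = Spi4PiHeader(sop=True, eops=2, dest_channel=17)
print(hdr.pack().hex())  # 0611
```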
The full SPI-4.2 tunneling process could be defined as follows. An ingress burst arrives at an SPI-4.2 endpoint device. The payload control word address field is used as an index into a set of binding registers. The binding register contains information such as unicast versus multicast routing, a turn pool and turn pointer for unicast or an index for multicast, an 8-bit SPI-4.2 channel number for the egress interface, and a traffic class. This information is used to construct a unicast or multicast ASI routing header. If the burst size is larger than the ASI maximum payload size, the burst is segmented and an ASI PI-2 header is added to each segment. Finally, the SPI-4.2 PI header is added.
After the packet is routed through the ASI fabric, the egress SPI-4.2 endpoint extracts the SPI-4.2 PI header and uses it to construct the payload control word for the egress SPI-4.2 data burst. These values include the egress SPI-4.2 channel number and the start-of-packet and end-of-packet indicators.
If reassembly is required, the PI-2 header is extracted and used to reassemble the segments into complete data bursts in an egress queuing system. Each reassembly queue is based on source, type, and class information.
Finally, the bursts are scheduled and transferred out through the SPI-4.2 interface. If a large amount of traffic arrives from several sources for the same egress interface, the receiving SPI-4.2 device may not be able to keep up with the bandwidth demand. This is where SPI-4.2 flow control comes into play.
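The ingress steps above can be strung together in a short sketch. Headers are modeled as dictionaries rather than packed bits, an unbound channel simply raises KeyError, sequence numbers restart at zero for every burst, and every segment carries the same SPI-4.2 PI header; all of these, along with the names, are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Binding:
    multicast: bool
    turn_pool: int
    turn_pointer: int
    mcast_index: int
    egress_channel: int   # ADR to use on the egress SPI-4.2 interface
    traffic_class: int


@dataclass
class TunneledPacket:
    route: dict           # stand-in for the ASI unicast or multicast routing header
    pi2: Optional[dict]   # stand-in for the PI-2 header; None if not segmented
    spi4_pi: dict         # stand-in for the SPI-4.2 PI header
    payload: bytes


def tunnel_burst(adr: int, sop: bool, eops: int, burst: bytes,
                 bindings: Dict[int, Binding], max_payload: int) -> List[TunneledPacket]:
    b = bindings[adr]  # the PCW address field indexes the binding registers
    if b.multicast:
        route = {"type": "multicast", "index": b.mcast_index, "tc": b.traffic_class}
    else:
        route = {"type": "unicast", "turn_pool": b.turn_pool,
                 "turn_pointer": b.turn_pointer, "tc": b.traffic_class}
    spi4_pi = {"sop": sop, "eops": eops, "channel": b.egress_channel}
    if len(burst) <= max_payload:
        return [TunneledPacket(route, None, spi4_pi, burst)]  # no PI-2 needed
    chunks = [burst[i:i + max_payload] for i in range(0, len(burst), max_payload)]
    last = len(chunks) - 1
    return [TunneledPacket(route,
                           {"flow": adr, "seq": i, "sob": i == 0, "eob": i == last},
                           spi4_pi, chunk)
            for i, chunk in enumerate(chunks)]
```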
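On the egress side, the PI header fields map back onto a payload control word. This sketch assumes the same bit layout as we read it in OIF-SPI4-02.1 (type in bit 15, EOPS in bits 14:13, SOP in bit 12, ADR in bits 11:4, DIP-4 in bits 3:0) and leaves DIP-4 uncomputed; the function name is hypothetical.

```python
def build_egress_pcw(sop: bool, eops: int, channel: int) -> int:
    """Assemble a 16-bit payload control word from SPI-4.2 PI header fields."""
    if not 0 <= eops <= 3:
        raise ValueError("EOPS is a 2-bit code")
    if not 0 <= channel <= 0xFF:
        raise ValueError("channel must fit in ADR[7:0]")
    word = 1 << 15            # type bit: payload control word
    word |= eops << 13        # end-of-packet status
    word |= int(sop) << 12    # start of packet
    word |= channel << 4      # ADR[7:0]
    # Bits 3:0 (DIP-4) are left at zero here; a real transmitter must fill
    # them in as the implementation agreement specifies.
    return word


print(hex(build_egress_pcw(sop=True, eops=0, channel=5)))  # 0x9050
```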
The SPI-4.2 specification defines a flow-control calendar sequence that is sent on the status channel. The sequence can be up to 256 states long, and each state refers to one of the 256 possible SPI-4.2 channels. Each state carries a 2-bit value indicating a Satisfied, Hungry, or Starving condition.
In contrast to SPI-4.2, ASI defines flow control per VC number and type, with a single threshold for each. This means that each VC type and VC number must be reverse-mapped, using the binding registers, to a particular calendar state. This can be done in a number of ways and will most likely be left to the implementer.
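A sketch of the status side, assuming the 2-bit codes as we read them in the implementation agreement (00 Starving, 01 Hungry, 10 Satisfied, with 11 reserved). The free-space thresholds are named after the MaxBurst1 and MaxBurst2 parameters but are simply inputs here, and framing and DIP-2 parity on the status channel are omitted. The sketch lets a channel appear more than once in the calendar, which gives it more frequent updates.

```python
from enum import IntEnum
from typing import Dict, List


class FifoStatus(IntEnum):
    STARVING = 0b00
    HUNGRY = 0b01
    SATISFIED = 0b10
    # 0b11 is not a FIFO status; it is reserved (used for status-frame framing).


def status_from_free_space(free_blocks: int, max_burst1: int,
                           max_burst2: int) -> FifoStatus:
    """Map receive-FIFO free space to a status code (thresholds are inputs)."""
    if free_blocks >= max_burst1:
        return FifoStatus.STARVING
    if free_blocks >= max_burst2:
        return FifoStatus.HUNGRY
    return FifoStatus.SATISFIED


def calendar_pass(calendar: List[int], status: Dict[int, FifoStatus]) -> List[int]:
    """One pass over the calendar: the 2-bit code for each listed channel, in order."""
    if not 1 <= len(calendar) <= 256:
        raise ValueError("calendar length must be 1..256")
    return [int(status[ch]) for ch in calendar]


st = {0: FifoStatus.STARVING, 1: FifoStatus.SATISFIED, 2: FifoStatus.HUNGRY}
print(calendar_pass([0, 1, 2, 1], st))  # [0, 2, 1, 2]
```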
For example, if the fabric has become congested, an ingress fabric queue to a given destination serving a given traffic class may become full. This ingress queue is associated with a particular SPI-4.2 flow identifier (channel number) through the ingress routing table. This information can be used to set the correct flow-control bit in the calendar sequence.
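The reverse mapping in this example can be sketched by inverting the binding table so each ingress fabric queue knows which SPI-4.2 channels feed it. The queue key, the two-state output, and the names are hypothetical; because several channels can be bound to the same queue, a full queue throttles all of them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical ingress-queue key: (VC type, VC number, destination port or multicast index)
QueueKey = Tuple[str, int, int]

SATISFIED, STARVING = 0b10, 0b00  # SPI-4.2 2-bit status codes used here


def build_reverse_map(channel_to_queue: Dict[int, QueueKey]) -> Dict[QueueKey, List[int]]:
    """Invert the ingress routing (binding) table: queue -> SPI-4.2 channels."""
    reverse: Dict[QueueKey, List[int]] = defaultdict(list)
    for adr, queue in sorted(channel_to_queue.items()):
        reverse[queue].append(adr)
    return dict(reverse)


def throttled_channels(full_queues: Iterable[QueueKey],
                       reverse: Dict[QueueKey, List[int]]) -> Set[int]:
    channels: Set[int] = set()
    for queue in full_queues:
        channels.update(reverse.get(queue, []))
    return channels


def calendar_codes(calendar: List[int], throttled: Set[int]) -> List[int]:
    # Simplification: Satisfied for throttled channels, Starving for all others.
    return [SATISFIED if ch in throttled else STARVING for ch in calendar]
```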
Flow-control latency can become a critical factor, depending on the depth and number of queues on each side of the SPI-4.2 interface. In the worst case, a flow-control update for a given channel may arrive only once every 256 states, which adds latency to the flow-control response.
In ASI, there can be a maximum of 8 unicast VC queues to each destination port and a maximum of 4 multicast queues. For example, a 16-port switch would then have 16 × 8 + 4 = 132 possible ingress queues associated with an SPI-4.2 interface, so the calendar sequence can be reduced to 132 states. Typical ASI switch elements may support only up to four unicast queues, which reduces this example to 16 × 4 + 4 = 68 calendar states. The system designer can therefore trade the number of ingress flows against flow-control latency.
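The queue counts above, and their effect on the worst-case update interval, can be checked with a few lines. The 10 ns per calendar state is a hypothetical figure chosen only to show the scaling; the real interval depends on the status-channel clock and on framing overhead, which this sketch ignores.

```python
def ingress_queue_count(dest_ports: int, unicast_queues_per_port: int,
                        multicast_queues: int) -> int:
    return dest_ports * unicast_queues_per_port + multicast_queues


def worst_case_update_ns(calendar_states: int, ns_per_state: float) -> float:
    """Time for one full calendar pass, ignoring framing and parity overhead."""
    return calendar_states * ns_per_state


full = ingress_queue_count(16, 8, 4)     # 132
reduced = ingress_queue_count(16, 4, 4)  # 68
for states in (256, full, reduced):
    print(states, worst_case_update_ns(states, ns_per_state=10.0))
```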
Example IP Appliance Application
An IP appliance provides a good example application for SPI-4.2 tunneling across an ASI fabric. IP appliances typically consist of I/O cards and service cards as shown in Figure 3.
Figure 3: Block diagram of an example IP appliance.
In the appliance shown in Figure 3, the ASI switch elements are connected in a mesh, a configuration that may be typical of smaller systems. The I/O devices on the I/O card represent 10-Gbit framers or MACs that contain SPI-4.2 interfaces. Many 10-Gbit network processors also contain SPI-4.2 interfaces, but they are likely to be among the first devices to adopt direct ASI ports.
The CPU subsystems will typically have PCI-Express or ASI connections to the ASI fabric. Disaggregating the I/O functions from the service functions allows flexibility and allows trade-offs between I/O bandwidth and processing power.
In the example appliance, a typical packet may come in through a 10-Gbit Sonet framer on an I/O card. Based on the framer's SPI-4.2 channel number, the packet is encapsulated with a SPI-4.2 PI header and tunneled to one of several network processors for Layer 3 processing. The network processors identify packets that need further, higher-layer processing and send them to the CPUs. Other packets are sent to the Ethernet I/O cards using the SPI-4.2 PI.
The CPUs and NPUs can exchange data directly between their memory spaces using the ASI simple load/store (SLS) protocol, which provides a low-overhead remote direct memory access (RDMA) capability. After the packet transfer is complete, the CPUs typically perform Layer 4 to 7 processing, which includes functions such as virus scanning or spam filtering.
When the CPUs complete these functions, they can transfer the packet back to the network processors using SLS or form an SPI-4.2 PI header and send it to an Ethernet I/O card. This header also includes the egress physical port information. The processed packet is then forwarded to a destination in the enterprise.
Many communication devices containing SPI-4.2 interfaces will be available over the next several years, and the ASI-SIG will soon define a protocol interface for tunneling SPI-4.2 bursts through an ASI fabric. This article discussed several of the factors that must be considered when developing this protocol and proposed one way to implement it within the constraints of the ASI specification. In applications such as IP appliances, this approach allows the disaggregation of I/O and processing power, saving system cost while adding flexibility.
- Advanced Switching Core Architecture Specification version 1.0. http://www.asi-sig.org.
- SPI-4.2 Implementation Agreement OIF-SPI4-02.1. http://www.oiforum.com/public/documents/OIF-SPI4-2.01.pdf.
About the Author
Gary Lee is director of intelligent switch fabric product management at Vitesse. Gary has a BSEE and MSEE from the University of Minnesota and can be reached at email@example.com.
Copyright © 2003 CMP Media, LLC