Combining hard-coded speed with software flexibility
By Balakumar Velmurugan, Technical Marketing Manager, Azanda Network Devices Inc., Sunnyvale, Calif., EE Times
November 5, 2002 (10:39 a.m. EST)
In designing our traffic management coprocessor, which applies OC-192 techniques to the lower-performance but multiservice OC-48 market, one of our key challenges was providing both the performance and the flexibility that router and switch makers in that market require.
The effort needed to program a device is proportional to the complexity of the device itself. The functionality of today's routers and switches extends far beyond simply forwarding packets.
They offer a variety of sophisticated services and execute complex protocols. These emerging network services and internetworking protocols cannot be implemented solely in hardware or software. OEMs need a complete solution that provides both the performance advantages of hardware and the flexibility of the software-based network processor unit (NPU) solutions.
Lopsided balancing of these complex tasks between software and hardware is not only inefficient, but often adds time and cost to the design cycle. These network devices, whether ASSPs (application-specific standard products), network processors, or custom ASICs (application-specific integrated circuits), should provide a programming model that maximizes performance and offers flexibility, while reducing development time and cost.
Different vendors have taken different approaches to providing a complete solution. One approach, taken by generalized network processor vendors, pushes all processing resource management complexities onto the software. This approach offers "infinite" flexibility at the cost of performance and programming complexity.
Another approach, taken by ASSP vendors, offers deterministic performance. Any predefined feature or set of features (implemented in hardware) may be selected and configured. Modifications to the feature definitions are not allowed, however, which restricts flexibility.
The programming models in the two cases are very different, and each demands a very different level of development effort. For example, implementing a complex algorithm such as a traffic policer on an NPU could take significant development time to design the microcode and model the performance, after successfully allocating the processing resources through a lengthy validation effort. Configuring the traffic-policing feature in an ASSP, on the other hand, could simply entail programming a few registers. The first approach provides the flexibility to make in-field or developmental modifications; the second does not. Conversely, the ASSP-based solution is easier to configure and provides deterministic performance guarantees that the NPU-based software solution cannot.
We decided to take an alternative approach while designing the traffic management features in the development of our chip. Instead of providing the programming flexibility for undefined features, as is the case with network processors, or providing a simple set of configurable options to control a hard-coded feature, as in the case of ASICs, we tied our programmability to loosely defined features implemented in hardware through dedicated state machines. This approach provides the control and the flexibility OEMs need to customize a feature without compromising performance or lengthening the time-to-market cycle.
A good example of the dedicated state machine approach can be illustrated by our implementation of the traffic-policing feature common to all traffic management devices. Businesses typically contract with service providers for a specified bandwidth. Policing ensures that customers receive the bandwidth they are paying for, but no more. For example, if a customer purchases a 10-Mbit/second service from a service provider, and the customer sends 15 Mbits/s of traffic to the service provider, the service provider may discard 5 Mbits/s of traffic as non-compliant.
One mechanism for enforcing this is through a Dual Leaky Bucket algorithm, which provides conformance checking. The algorithm checks data packet flows against the set of parameters specified in the traffic contract, or service level agreement (SLA). To understand how this algorithm works, visualize a bucket with a hole in the bottom. Water can pour into the bucket at any rate, but can only exit through the hole at a constant rate. If water entering exceeds water exiting, the excess overflows the bucket.
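The bucket analogy above maps directly to a small piece of state per flow. The following C sketch (invented names; the article does not publish the chip's internals) shows one common way to implement a single leaky bucket as a counter that drains at a constant rate and overflows when arrivals exceed its depth:

```c
#include <stdint.h>

/* Minimal single leaky-bucket policer sketch (hypothetical types/names).
 * The bucket "drains" at a constant rate; each arriving cell pours in a
 * fixed increment. An arrival that would overflow the bucket depth is
 * non-compliant, mirroring the water-in-a-bucket analogy. */
typedef struct {
    uint32_t level;      /* current fill level                  */
    uint32_t depth;      /* bucket capacity (burst tolerance)   */
    uint32_t increment;  /* fill added per cell (inverse rate)  */
    uint32_t drain;      /* fill removed per elapsed time tick  */
} leaky_bucket;

/* Drain the bucket for 'ticks' elapsed time units. */
static void lb_drain(leaky_bucket *b, uint32_t ticks)
{
    uint32_t out = ticks * b->drain;
    b->level = (out >= b->level) ? 0 : b->level - out;
}

/* Returns 1 if the cell conforms (and commits its fill), 0 on overflow. */
static int lb_conforms(leaky_bucket *b)
{
    if (b->level + b->increment > b->depth)
        return 0;               /* overflow: non-compliant */
    b->level += b->increment;   /* pour the cell in        */
    return 1;
}
```

In a hardware implementation the drain step is typically folded into the conformance check by timestamp arithmetic rather than run on a timer, but the decision logic is the same.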
In its simplest form, a Leaky Bucket algorithm monitors the incoming cell flow and discards the overflow cells. More sophisticated algorithms go several steps further to ensure that higher priority packets are less likely to be dropped than lower priority data. In a typical ATM Forum-defined Dual Leaky Bucket implementation (VBR.1), the compliant cells from the first bucket may "drop" into a second leaky bucket, where a subsequent decision is made whether to pass or drop cells.
Variations of this algorithm may pass all overflow cells from the first bucket into the second, or may mark individual overflow cells from the first bucket and determine whether to drop or pass them on to the second bucket based upon some priority ranking. Examples of these include Single or Two-Rate Three-Color Marker implementations as defined by RFC-2697 and RFC-2698.
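To make the marker variants concrete, here is a sketch of the color-blind marking decision from the RFC 2697 Single Rate Three Color Marker. The two buckets are expressed as token counters (a token bucket is the mirror image of a leaky bucket); the struct and function names are ours, not the RFC's:

```c
#include <stdint.h>

typedef enum { GREEN, YELLOW, RED } color;

/* RFC 2697 srTCM state: a committed bucket (filled up to CBS) and an
 * excess bucket (filled up to EBS), both refilled at the committed
 * information rate. Refill is assumed to happen elsewhere. */
typedef struct {
    uint64_t tc;  /* committed tokens available, in bytes */
    uint64_t te;  /* excess tokens available, in bytes    */
} srtcm;

/* Color-blind marking for a packet of 'len' bytes, in the bucket-check
 * order RFC 2697 specifies: committed first, then excess, else red. */
static color srtcm_mark(srtcm *m, uint64_t len)
{
    if (m->tc >= len) { m->tc -= len; return GREEN;  }
    if (m->te >= len) { m->te -= len; return YELLOW; }
    return RED;
}
```

The RFC 2698 two-rate variant replaces the excess bucket with one refilled at an independent peak rate and checks the peak bucket first; the per-packet logic is otherwise just as small.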
Designers implementing this algorithm in hardware follow the specifications laid out by standards bodies, such as the ATM Forum. There are, however, variations of the policing functions for different networking technologies like ATM, Frame Relay, and Internet Protocol (IP), which are defined by other standards bodies such as the IETF and Frame Relay Forum or by the OEMs to support a newly defined SLA model.
Router and switch vendors may perceive advantage in choosing one standard or another, necessitating a new chip design to support every emerging standard. Some may even wish to tweak the standard algorithms. The problem is that once an algorithm is implemented in hardware, it cannot be modified. The alternative, a software implementation, may provide the desired flexibility, but at the expense of performance and extra development and validation effort.
In our approach, we modeled the exact behavior of the generalized leaky bucket in hardware in a way that allowed us to support both fixed-size cells and variable-sized packets, while providing the flexibility to program it to support various standard algorithms and technologies. Dedicated, programmable state machines determine how the dual buckets are connected to each other and what action should be taken when a packet becomes compliant or non-compliant.
For example, the standard ATM Forum and Internet Engineering Task Force (IETF) policing algorithms can both be supported if the leaky buckets are sufficiently generic in their implementation to support various protocol data units and if adequate programmability exists to control the connectivity between the buckets and the actions to be taken on violating and conforming packets.
"Programming" our device to support a standard policing algorithm or customizing an existing standard algorithm is as simple as calling a well-defined API function. Our programming interface takes any policing behavior that can be represented in the truth table information whether cell-based, packet-based, or some customized variation and programs the state machine.
This not only supports both standards-based and proprietary variations of the policing algorithms through an easy, intuitive programming interface; it also delivers deterministic performance. The API, written in the popular C language, can be easily integrated into any operating system.
Once the state machine is programmed to support a policing algorithm, the control plane software creates a policy instance by mapping a state machine instance with the traffic type and a set of traffic parameters. Incoming packets that reference a policing instance determine, at wire speed, whether they should be thrown away, marked, or passed on to the next bucket as defined by the state machine. Because the software programming of the state machine and the leaky buckets occurs in the control plane, packet-forwarding performance is not adversely impacted.