by Doug Chisholm, Cadence Design Foundry
This paper is a design case study of an IPsec (or Ethernet Security) application developed and tested using a platform-based approach. Bump-in-the-Wire is the name commonly given to a box that can be inserted between a user and the Internet to provide secure communications using IPsec.
IPsec is an extension of the TCP/IP protocol that involves cryptographic functions such as DES encryption and SHA-1 hashing. The other features of IPsec such as key exchange (i.e. Rivest-Shamir-Adleman or Diffie-Hellman), tunnel and transport modes, Encapsulating Security Payload (ESP) or Authentication Header (AH) modes are briefly described as an introduction to the topic.
The computational complexity of DES and SHA-1 algorithms and the partitioning between software and hardware are discussed.
System design issues such as packet latency, bandwidth and packet buffering are explored and an explanation of the architectural choices is provided. Finally, the benefits of using a platform-based approach for IP verification and also hardware and software co-design are reported.
Figure 1. Virtual Private Network
The key components of the platform are the embedded application software, the embedded ARM processor and peripherals, a proprietary IPsec co-processor and Ethernet 802.3 Media Access Controllers. The platform demonstrates proof-of-concept and inter-operability as well as facilitating easy migration to a SoC solution.
Virtual Private Networks (VPN)
The growth of the Internet has led to many new opportunities for e-commerce and telecommuting but also considerable security challenges. Users and applications need to communicate over the Internet with the same security as if they were connected on their own private LAN. This requirement has led to development of Virtual Private Networks (VPN) as shown in Figure 1 below.
A VPN must be able to authenticate the identity of users, ensure data integrity (or recognise that it has been tampered with) and provide confidentiality by means of encryption (such that other users cannot read the data even though due to the nature of IP networks there will be many recipients of the data).
A VPN is implemented using Internet Protocol Security (IPsec), which is an open standard extension to the TCP/IP stack specified by the Internet Engineering Task Force (IETF) in various Requests for Comments (RFC) [Reference 1]. IPsec specifies how cryptography is used to authenticate users, encrypt their data and guarantee data integrity. These cryptographic operations are computationally intensive but clearly the users of a VPN do not want their communications to be compromised in terms of latency or bandwidth. Therefore IPsec cryptography is ideally suited for hardware acceleration and consequently there is a growing market of sophisticated cryptographic gateways, firewalls and network interface cards. This paper describes an Ethernet based IPsec platform targeted at that Internet security design space.
Public Key Cryptography
The authentication of corresponding peers is based upon Diffie-Hellman (DH) or Rivest-Shamir-Adleman (RSA) public key (a.k.a. asymmetric) cryptography. In these schemes every user in the network can be considered as having a private and a public key. The public and private keys are mathematically related and a message enciphered using a public key can only be deciphered using the associated private key.
The principles of public key cryptography are illustrated below.
For example Bob can send a message to Alice by encrypting the message using Alice's public key. Only Alice can decrypt this message even if other users receive it.
Figure 2. Public Key Cryptography
Authentication allows Alice to be certain that Bob sent the message. To authenticate the message Bob must hash his plain text message and then encrypt this hash using his private key. Bob then sends his privately encrypted hash and publicly encrypted message to Alice.
On receipt, Alice can then decrypt the message using her private key and regenerate the message hash. She can also decrypt the encrypted hash using Bob's public key then compare the two hash values thus authenticating that Bob sent the message and confirming the integrity of the message.
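The exchange between Bob and Alice described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it uses textbook RSA with tiny hard-coded primes, no padding, and a SHA-1 digest reduced modulo the key so that it fits. None of these parameters or helper names come from the BITW system, and nothing here is secure.

```python
import hashlib

def make_keypair(p: int, q: int, e: int):
    """Build a textbook RSA key pair from two small primes (toy sizes only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)            # private exponent: inverse of e modulo phi
    return (e, n), (d, n)          # (public key, private key)

def digest_as_int(message: bytes, n: int) -> int:
    """SHA-1 digest reduced modulo n so that it fits the toy modulus."""
    return int.from_bytes(hashlib.sha1(message).digest(), "big") % n

def encrypt(message: bytes, public_key) -> list:
    e, n = public_key
    return [pow(byte, e, n) for byte in message]     # one byte per RSA block

def decrypt(blocks, private_key) -> bytes:
    d, n = private_key
    return bytes(pow(block, d, n) for block in blocks)

def sign(message: bytes, private_key) -> int:
    """Encrypt the message hash with the sender's private key."""
    d, n = private_key
    return pow(digest_as_int(message, n), d, n)

def verify(message: bytes, signature: int, public_key) -> bool:
    """Recover the hash with the sender's public key and compare."""
    e, n = public_key
    return pow(signature, e, n) == digest_as_int(message, n)

# Hypothetical toy keys for the two peers.
alice_public, alice_private = make_keypair(61, 53, 17)
bob_public, bob_private = make_keypair(89, 97, 5)

# Bob's side: encrypt for Alice, sign with his own private key.
message = b"hello Alice"
ciphertext = encrypt(message, alice_public)
signature = sign(message, bob_private)

# Alice's side: decrypt with her private key, check the hash with Bob's public key.
received = decrypt(ciphertext, alice_private)
authentic = verify(received, signature, bob_public)
print(received, authentic)
```

Only the holder of Alice's private key can recover the message, and only the holder of Bob's private key could have produced a signature that verifies against his public key.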
Note that authentication is only performed when a session is established and hence is an infrequently executed task. However, authentication must be performed rapidly so that the user does not experience any undue latency. In fact in the BITW system described below authentication was not implemented and secure session keys were hard wired so as to minimise the software development.
Figure 3. BITW Demonstration System
The principal components of the Bump-in-the-Wire (BITW) demonstration system shown above are two PCs each running the BITW TCP/IP client/server application, a PC running a protocol analyser (representing a snooper), an Ethernet hub (representing the entire Internet) and two BITW platforms each with two Ethernet ports.
A BITW platform can receive 10/100 Ethernet frames from the user port and then forward the encrypted frames to the network port. Similarly it can receive encrypted frames from the network port and forward the decrypted frames to the user port. The BITW is a store-and-forward architecture where the system latency depends on how long it takes to perform the data processing (i.e. cryptography) on the data traffic.
IPsec can operate in several modes, all of which require that the header and trailer of TCP/IP packets are modified. By default the BITW platform uses Encapsulating Security Payload (ESP) and transport mode. In these modes the payload, TCP header and padding are encrypted and a Hashed Message Authentication Code (HMAC) is generated using the SHA-1 or MD5 algorithm. Authentication Header (AH) mode is similar but the payload is not encrypted. In tunnel mode the entire original packet is encrypted and authenticated and a new IP packet is generated.
This protocol management function is best realised in software since the task need only be undertaken when a packet is received which is at a relatively low rate compared with the data rate.
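The HMAC step used for ESP authentication can be illustrated with Python's standard `hmac` module. This is a minimal sketch rather than the BITW driver code: the key, the sample bytes and the function names are hypothetical, and the 96-bit truncation follows the HMAC-SHA-1-96 convention commonly used with ESP and AH, which is assumed here rather than taken from the platform.

```python
import hashlib
import hmac

ICV_BYTES = 12   # 96-bit integrity check value (HMAC-SHA-1-96), an assumption

def compute_icv(auth_key: bytes, protected: bytes) -> bytes:
    """HMAC-SHA-1 over the protected bytes, truncated to 96 bits."""
    return hmac.new(auth_key, protected, hashlib.sha1).digest()[:ICV_BYTES]

def check_icv(auth_key: bytes, protected: bytes, icv: bytes) -> bool:
    """Recompute the ICV and compare in constant time."""
    return hmac.compare_digest(compute_icv(auth_key, protected), icv)

# Hypothetical session key and already-encrypted ESP payload.
auth_key = b"hard-wired-demo-key"
esp_bytes = b"ESP header + encrypted payload + padding"

icv = compute_icv(auth_key, esp_bytes)
print(len(icv), check_icv(auth_key, esp_bytes, icv))
```

The receiver repeats the same computation over the bytes it received; any modification in transit changes the recomputed value and the packet is rejected.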
The packet rate can be estimated by considering that 95% of all Internet traffic is TCP packets. Further, 40% of all TCP packets are only 40 bytes long as they are only ACK, RST or FIN packets. Ethernet frames carry up to approximately 1500 bytes and the typical packet length can be assumed to be 512 bytes [Reference 2]. This implies that the maximum theoretical packet rate at 100Mbps is about 25,000 packets per second. The actual figure would be much less in a real system. Also, due to the nature of IP, packets can be discarded if buffers overflow or the system becomes overloaded. However, this cursory analysis illustrates that packet and network processing is a hot design topic. The target packet rate defines the maximum latency that is available in order to ingress and egress a single packet (i.e. about 40 microseconds, the reciprocal of 25,000 packets per second).
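The packet-rate estimate above is simple arithmetic, reproduced here as a short check. The constant names are illustrative; the values are the ones quoted in the text, and framing overhead such as preamble and inter-frame gap is ignored.

```python
LINE_RATE_BPS = 100_000_000      # 100Mbps Ethernet
TYPICAL_PACKET_BYTES = 512       # assumed typical packet length

# Packets per second if the link carried nothing but typical packets.
packets_per_second = LINE_RATE_BPS / (TYPICAL_PACKET_BYTES * 8)

# Time available to ingress and egress one packet at that rate.
latency_budget_us = 1_000_000 / packets_per_second

print(f"{packets_per_second:.0f} packets/s, {latency_budget_us:.2f} us per packet")
```

The result is a little under 25,000 packets per second and a little over 40 microseconds per packet, which matches the rounded figures used in the text.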
As well as the protocol packet management, in the common modes of operation the data payload must be encrypted using DES and hashed using SHA-1. These algorithms should also be executed at line speed (i.e. 100Mbps) in order to avoid excessive buffering and latencies. The computational complexity of DES is discussed later in this paper.
In conclusion an IPsec system requires both hardware and embedded software and so lends itself to a platform-based approach.
The BITW platform is based upon an ARM Integrator board. The FPGA on the associated Logic Module was used to extend the basic Integrator AMBA bus micro-architecture such that it contained three AHB masters, namely two instances of the Ethernet 10/100 MAC (with additional RMII blocks) and one IPsec co-processor. The MEXP and RNG IP blocks are optional for future expansion to support accelerated authentication. Note that all of the blocks are Cadence AMBA compliant IP.
The first major benefit of this platform-based approach was that minimal RTL design was required and none of the AMBA compatible IP required any modification.
A proprietary RMII PHY daughter board was used to interface to the Ethernet RJ45 connections and hence the platform demonstrated the physical interoperability between the MAC and PHY layers. This is a fundamental advantage over co-simulation design methodology and is the second benefit of adopting a platform-based approach. A block diagram of the BITW platform is shown in Figure 4 below.
Figure 4. BITW Platform Micro-Architecture
Data Encryption Standard (DES)
As mentioned earlier IPsec uses the Data Encryption Standard (DES) to encrypt and decrypt the actual data payload. DES is known as a symmetric algorithm because it uses the same key to encrypt and decrypt the data. That key is calculated during the authentication process.
The DES algorithm requires 16 rounds of computation to convert a plain text block of 64-bits into a cipher text block of 64-bits or vice-versa. To encrypt or decrypt a larger amount of data DES is configured to operate in a mode such as Cipher Block Chaining (CBC) where the cipher block is fed back and combined with the next input block. For increased security the DES can be applied three times and is known as triple DES (TDES).
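The structure described above, a 16-round cipher on 64-bit blocks wrapped in Cipher Block Chaining, can be sketched as follows. The block cipher here is deliberately a stand-in: a generic Feistel network whose round function is built from SHA-1, not the real DES round function, key schedule or permutations. All names, the key and the IV are hypothetical, and the sketch only shows how the rounds and the chaining fit together.

```python
import hashlib

BLOCK_BYTES = 8            # 64-bit block, as in DES
HALF = BLOCK_BYTES // 2
ROUNDS = 16                # same round count as DES

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def round_function(half: bytes, key: bytes, round_index: int) -> bytes:
    # Placeholder round function: NOT the DES f-function or key schedule.
    return hashlib.sha1(key + bytes([round_index]) + half).digest()[:HALF]

def feistel_block(block: bytes, key: bytes, round_order) -> bytes:
    """Run the Feistel rounds in the given order, then swap the halves."""
    left, right = block[:HALF], block[HALF:]
    for round_index in round_order:
        left, right = right, xor_bytes(left, round_function(right, key, round_index))
    return right + left

def encrypt_block(block: bytes, key: bytes) -> bytes:
    return feistel_block(block, key, range(ROUNDS))

def decrypt_block(block: bytes, key: bytes) -> bytes:
    # A Feistel network is inverted by applying the rounds in reverse order.
    return feistel_block(block, key, reversed(range(ROUNDS)))

def cbc_encrypt(plaintext: bytes, key: bytes, iv: bytes) -> bytes:
    if len(plaintext) % BLOCK_BYTES:
        raise ValueError("plaintext must be padded to a multiple of 8 bytes")
    previous, out = iv, bytearray()
    for offset in range(0, len(plaintext), BLOCK_BYTES):
        block = plaintext[offset:offset + BLOCK_BYTES]
        # Combine with the previous cipher block before encrypting.
        previous = encrypt_block(xor_bytes(block, previous), key)
        out += previous
    return bytes(out)

def cbc_decrypt(ciphertext: bytes, key: bytes, iv: bytes) -> bytes:
    previous, out = iv, bytearray()
    for offset in range(0, len(ciphertext), BLOCK_BYTES):
        block = ciphertext[offset:offset + BLOCK_BYTES]
        out += xor_bytes(decrypt_block(block, key), previous)
        previous = block
    return bytes(out)

key = b"demo-key"                  # hypothetical key
iv = bytes(BLOCK_BYTES)            # all-zero IV, for illustration only
plaintext = b"SAMEDATA" * 2        # two identical 64-bit blocks
ciphertext = cbc_encrypt(plaintext, key, iv)
recovered = cbc_decrypt(ciphertext, key, iv)
print(ciphertext.hex(), recovered)
```

Because each cipher block is fed back into the next input block, the two identical plaintext blocks in the example produce different cipher blocks, which is the property that motivates chaining.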
Since DES operates on the data payload it must process a maximum of 100 Mbps in each direction. Since the BITW platform requires two Ethernet ports the total data bandwidth is 400Mbps.
A typical micro-processor running at 40MHz is able to encrypt at about 1 to 3 Mbps [Reference 3]. Motorola's recent adverts for their S1 security processors claim that a software solution would deliver 3.6Mbps. So line speed encryption is clearly a candidate for hardware acceleration.
The TDES IP used in the BITW platform is pipelined such that two rounds are performed per clock cycle and hence a single TDES operation requires 24 cycles. For 40MHz operation the maximum bit rate is therefore 106Mbps or approximately a 50X improvement over a software solution. Although not enough for the maximum theoretical bit rate it is ample for the BITW demonstration.
Similar analysis shows that SHA-1 also benefits from being implemented in hardware and the IP used achieves 252Mbps at 40MHz operation.
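The TDES throughput figure quoted above can be reproduced from the round count, the pipelining and the clock frequency. The constant names below are illustrative; the values are taken from the text. The SHA-1 figure is not recomputed because the text does not give its cycle count.

```python
ROUNDS_PER_DES = 16          # rounds in a single DES pass
DES_PASSES = 3               # triple DES applies DES three times
ROUNDS_PER_CYCLE = 2         # pipelining of the TDES IP
BLOCK_BITS = 64
CLOCK_HZ = 40_000_000        # 40MHz operation

# 48 rounds at two rounds per clock gives 24 cycles per 64-bit block.
cycles_per_block = ROUNDS_PER_DES * DES_PASSES // ROUNDS_PER_CYCLE

# One block completes every cycles_per_block clocks.
throughput_mbps = BLOCK_BITS * CLOCK_HZ / cycles_per_block / 1e6

print(cycles_per_block, f"{throughput_mbps:.1f} Mbps")
```

Compared with the 1 to 3 Mbps quoted for software, this is where the figure of roughly 50X comes from.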
The BITW platform and demonstration requires three software components:
1. The TCP/IP application with GUI that runs on each user PC
2. The embedded IPsec application running on the BITW platform
3. The protocol analyser running on the snooper PC
Figure 5. BITW PC Application GUI
The TCP/IP application performs two tasks; it provides the graphical user interface (GUI) for user messaging and creates a TCP/IP client-server connection between the PCs. Users can exchange text messages in clear or encrypted form and hence generate real-time data traffic on the "Internet".
The user message is entered and displayed on one PC. The message is then sent to the other PC via the two BITW platforms and received by the other PC. The original clear text message is then displayed on the destination PC even if encryption is turned on. The GUI for the BITW application is shown in Figure 5.
The BITW application code initialises the system and performs the protocol and buffer management as frames are received and transmitted in real-time. It re-uses the MAC and IPsec C-based drivers that were available with their respective IP blocks.
The BITW application must examine and buffer each incoming packet. If it is received from the user port then it is copied across for transmission to the network port. If encryption is enabled then the packet is encrypted and a new header is created. Similarly packets received from the network port are examined and if necessary decrypted and transmitted with a new header via the user port. Hence the BITW is transparent to the two users.
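The forwarding decision described above can be sketched as a single function. This is a hypothetical illustration in Python rather than the embedded C application: the `Port` type, the `MARKER` tag and the `protect`/`unprotect` placeholders are all invented for the example, and the placeholders merely reverse the bytes instead of performing ESP encryption.

```python
from enum import Enum

class Port(Enum):
    USER = "user"
    NETWORK = "network"

MARKER = b"ESP:"   # hypothetical tag standing in for a real ESP header

def protect(frame: bytes) -> bytes:
    # Placeholder for "encrypt and create a new header".
    return MARKER + frame[::-1]

def unprotect(frame: bytes) -> bytes:
    # Placeholder for "decrypt and restore the original header".
    return frame[len(MARKER):][::-1]

def process_frame(in_port: Port, frame: bytes, encryption_enabled: bool):
    """Return (output port, frame to transmit) for one buffered frame."""
    if in_port is Port.USER:
        # User traffic always goes out on the network port.
        out = protect(frame) if encryption_enabled else frame
        return Port.NETWORK, out
    # Network traffic is decrypted only if it arrived protected.
    out = unprotect(frame) if frame.startswith(MARKER) else frame
    return Port.USER, out

# One frame crossing two BITW platforms back to back.
port, on_wire = process_frame(Port.USER, b"hello", encryption_enabled=True)
port2, delivered = process_frame(Port.NETWORK, on_wire, encryption_enabled=True)
print(on_wire, delivered)
```

The frame delivered to the second user is identical to the one the first user sent, which is the sense in which the BITW is transparent.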
The snooper can capture clear and encrypted packets but only the clear packets are human readable thus demonstrating the effectiveness of the IPsec protocol.
The snooper PC runs a public domain protocol analyser that can analyse and display all the traffic that passes through the Ethernet hub (which is representative of the entire Internet). The protocol analyser was customised so as to capture and display the clear or encrypted frames.
The software development tools used were: the ARM Development Tools, Metrowerks CodeWarrior IDE, AXD (ARM Debugger), ARM Multi-ICE and Microsoft C++.
Time Complexity of Verification
The third major benefit of a platform-based approach is the massive reduction of the run-time complexity of verification. The National Institute of Standards and Technology (NIST) certify TDES IP by means of golden tests.
The TDES test suite consists of 40 Known Answer Tests (KATs) and 36 Modes tests (each of about 4 million TDES operations or 100 million clock cycles). The results of the Modes tests are not provided to the IP developer who must generate the results and return the log files to the NIST laboratory in order to gain certification. The RTL simulation run-time of the TDES test suite is approximately 140 days on a single Sun Ultra Enterprise 420R. In fact this period is longer than the development time of the TDES IP block and is clearly unacceptable. Using hardware emulation the run-time was reduced to one hour, a significant reduction in time-to-market and better use of precious design resources.
The same methodology, and platform, was used to gain certification of the SHA-1 IP. In general platform-based design is to be recommended for computationally intensive IP verification.
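The scale of the TDES verification problem can be checked with some rough arithmetic. The sketch below uses only the figures quoted above and, as a simplifying assumption, counts the 36 Modes tests alone and ignores the much shorter Known Answer Tests; the variable names are illustrative.

```python
MODES_TESTS = 36
CYCLES_PER_MODES_TEST = 100_000_000   # about 4 million TDES operations each
RTL_SIM_DAYS = 140
EMULATION_HOURS = 1

total_cycles = MODES_TESTS * CYCLES_PER_MODES_TEST
rtl_seconds = RTL_SIM_DAYS * 24 * 3600

# Effective RTL simulation speed implied by the quoted run-time.
sim_cycles_per_second = total_cycles / rtl_seconds

# Run-time reduction from moving the test suite onto the platform.
speedup = rtl_seconds / (EMULATION_HOURS * 3600)

print(f"{total_cycles:.1e} cycles, {sim_cycles_per_second:.0f} cycles/s in simulation, "
      f"{speedup:.0f}x faster on the platform")
```

Under these assumptions the RTL simulation advances at only a few hundred clock cycles per second, and the platform delivers a reduction of more than three orders of magnitude.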
Hardware Software Partitioning
The table below summarises the BITW system functions, their complexity and their suitability for either hardware or software implementation.
Table 1. Hardware/Software Partitioning
The protocol management software must process in the order of 10K packets per second, or roughly 10MIPS. The Cadence IPsec co-processor only transfers the data once in order to encrypt and hash so that the load on the AHB bus is only a maximum of 400Mbps for both operations. The two Ethernet 10/100 MACs also require a total of 400Mbps for full duplex operation. Note that the IP blocks all transfer data by DMA and that the bottleneck is essentially the shared memory resource bus. From this analysis it is confirmed that a single flat AHB bus clocked at 35MHz (theoretically 1120Mbps on a single 32-bit AHB) provides sufficient bandwidth for this system.
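The bandwidth budget in the paragraph above reduces to a few lines of arithmetic. The names are illustrative and the numbers are those quoted in the text; the check simply adds the two 400Mbps demands and compares them with the theoretical AHB capacity.

```python
AHB_CLOCK_HZ = 35_000_000
AHB_WIDTH_BITS = 32

IPSEC_COPROCESSOR_MBPS = 400      # encrypt and hash with a single transfer
ETHERNET_MACS_MBPS = 400          # two full-duplex 10/100 MACs

PACKETS_PER_SECOND = 10_000       # protocol management load
PROCESSOR_MIPS = 10

# Theoretical peak of a single 32-bit AHB at 35MHz.
bus_capacity_mbps = AHB_CLOCK_HZ * AHB_WIDTH_BITS / 1e6

# Total DMA traffic from the bus masters.
bus_demand_mbps = IPSEC_COPROCESSOR_MBPS + ETHERNET_MACS_MBPS
utilisation = bus_demand_mbps / bus_capacity_mbps

# Software budget implied by 10MIPS at 10K packets per second.
instructions_per_packet = PROCESSOR_MIPS * 1_000_000 / PACKETS_PER_SECOND

print(f"{bus_demand_mbps} of {bus_capacity_mbps:.0f} Mbps ({utilisation:.0%}), "
      f"{instructions_per_packet:.0f} instructions per packet")
```

The worst-case demand of 800Mbps is about 71% of the theoretical 1120Mbps, and the software budget works out at roughly 1,000 instructions per packet.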
SOC Model and Simulation
The adoption of a platform-based approach does not preclude simulation. In fact the shared bus architecture allows the RTL IP to be simulated without modification. For this project the entire platform was modelled as a SoC including the embedded processor, peripherals, cryptographic IP and memory blocks.
Note that a derivative SoC solution can scale the bandwidth by increasing the clock frequency and/or adding more instances of the IPsec IP block.
The BITW system like many other real-time communication systems can be readily designed and tested using a platform-based approach.
A platform-based approach can bring many benefits:
1. hardware and software co-design
2. maximum IP re-use
3. physical inter-operability
4. real-time operation
5. accelerated verification run-time
The adoption of a common bus architecture such as AMBA also allows platforms to be re-targeted for SoC implementation without major RTL modifications. The platform hardware can be simulated if necessary but the software development and integration are best performed on the actual emulation platform.
Doug Chisholm has 18 years semiconductor design experience and has previously worked for Wolfson Microelectronics, SGS-Thomson/INMOS and ARM. His current interests are processor design, embedded systems, VoIP and cryptography.
1. "RFC 2401" IETF.
2. "Voice Over IP", Uyless Black, Prentice Hall 1999.
3. "Applied Cryptography", Bruce Schneier, Wiley 1996.