IPSec Suite

This document covers the IPSec protocol suite in detail. IPSec was introduced because the TCP/IP v4 protocol suite, by design, provides no effective security controls against many common security threats.


IPv4's original architects had no reason to provide security at the IP level. IP-based network data is, therefore, wide open to tampering and eavesdropping. Several technologies now exist to secure communication over the Internet, and they use powerful encryption. However, most of them work at the application layer (PGP, S/MIME) or, like SSL, between the transport and application layers. These technologies have their strengths and niches, but they are limited to specific uses.

There is a solution, and it isn’t restricted to a single application. Broadly speaking, you can imagine an IP-based network as having four layers, as follows:

  • Physical / Link layer (Layer 1)
  • Network layer
  • Transport layer
  • Application layer


Each layer provides services to the layer above. The significant feature of an IP network is that its network layer is entirely homogeneous, and it’s the only layer that is. This means that any communication passing through an IP network, including the Internet, has to use the IP protocol.
So, if you secure the IP layer, you secure the network.


Securing the IP layer with IPSec

An international working group under the IETF has developed a method for doing exactly that. They call it the IP Security (IPSec) protocol suite. The IPSec protocol suite is based on powerful encryption technologies, which add security services to the IP layer. It is compatible with both IPv4 and the new IPv6.
This means that if you use the IPSec suite where you would normally use IP, you secure all communications in your network for all applications and all users, transparently.


Security threats in the network environment

To know you have security in your environment, you want to be confident about three things:

  • Authentication
    • that the person with whom you are speaking really is that person
  • Confidentiality
    • that no one can eavesdrop on your communication
  • Integrity
    • that the data has not been tampered with in any way during transmission

The architecture of a modern, IP-based network makes all this difficult to ensure. The following sections discuss attacks commonly used against IP-based networks.


Spoofing

It is difficult in IP-based networks to determine where a packet really comes from. A technique called spoofing takes advantage of this. To understand it, you need to know how information travels along a network. At the network layer, information is broken into small, manageable chunks of data called packets. The figure below shows what such a packet contains.

[Figure: IP packet]


When two nodes on an IP network are communicating, the data stream between them is broken up into these packets and released into the network.

The difficulty from a security perspective is that the source IP address in an IP packet header is easily changed. The attack, called spoofing, makes a packet coming from one machine appear to come from somewhere else altogether.
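To make the point concrete, here is a minimal Python sketch that builds a bare 20-byte IPv4 header. The helper names (`build_ipv4_header`, `header_is_valid`) and the example addresses are assumptions made for illustration. It shows that the source address is just four bytes the sender fills in, and that the header checksum only detects transmission errors; it does not authenticate anything.

```python
import ipaddress
import struct


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum: one's-complement sum of the header's 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int, proto: int = 6) -> bytes:
    """Build a minimal IPv4 header (no options) with whatever source is given."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,            # version 4, header length 5 words
        0,                       # type of service
        20 + payload_len,        # total length
        0, 0,                    # identification, flags/fragment offset
        64,                      # time to live
        proto,                   # protocol carried (6 = TCP)
        0,                       # checksum placeholder
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]


def header_is_valid(header: bytes) -> bool:
    """Receiver-side check: summing a correct header, checksum included, gives 0."""
    return ipv4_checksum(header) == 0


honest = build_ipv4_header("192.0.2.10", "198.51.100.7", payload_len=0)
spoofed = build_ipv4_header("203.0.113.99", "198.51.100.7", payload_len=0)
print(header_is_valid(honest), header_is_valid(spoofed))
```

Both headers pass the receiver's check, because nothing in plain IP ties the source address to the machine that actually sent the packet.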


Session hijacking

If your TCP/IP-based program relies on a source IP address to know that it’s really communicating with a server, nothing prevents someone from taking over the connection and cutting the link to the server. The hijacker then takes the place of the server, exchanging data with the client without the client’s knowledge.

The fact that you have identified the person with whom you are communicating once doesn’t mean that you can depend on IP to ensure that it will be the same person through the rest of the session. You need a scheme that authenticates the data’s source throughout the transmission.


Eavesdropping (LAN sniffing)

Today, Ethernet LANs are widely used, but Ethernet has the disadvantage of making sniffing easy. LAN sniffing is very easy if you don’t use a switched environment: in a network that uses hubs as connecting points, every packet is visible to every node connected to that network. Conventionally, each node’s NIC only listens and responds to packets specifically addressed to it.

It is easy, however, to put a network card into so-called “promiscuous mode”. In that mode the NIC collects every packet that passes on the wire and sends it up the IP stack. Usually, such a NIC is hard to detect from elsewhere in the network. Special sniffer programs such as Sniffer Pro from Network Associates or the Network Monitor from Microsoft give an administrator the ability to detect nodes running such a program in promiscuous mode elsewhere in the network.

A sniffer collects all the data that passes on the wire, allowing a user to see quickly what’s going through any segment of the network. In the hands of someone who wants to listen in on sensitive communication, a sniffer is a powerful eavesdropping tool.


Man-in-the-middle attack

To use encryption, you first have to exchange encryption keys. Once you have encrypted data with a key, you need the same key to decrypt it.

But exchanging unprotected keys through the network could easily defeat the whole purpose, since those keys could be intercepted and open up yet another type of attack – the man-in-the-middle attack.

An attacker could plant his own key in the process very early, so that, while you believed you were communicating using the other party’s key, you would actually be communicating using a key known to the man-in-the-middle.

IPSec offers various mechanisms to defeat the attacks described above.


Interlocking technologies from IPSec

Let’s now take a closer look at the IPSec protocol suite. IPSec offers three interlocking technologies to defend against traditional threats to IP-based networks.

  • Authentication Header (AH)

Ties data in each packet to a signature that can be verified by the recipient. The AH allows you to verify both the identity of the sender of the data and that the data has not been altered.

  • Encapsulating Security Payload (ESP)

Scrambles the data with encryption so that a sniffer somewhere on the network cannot obtain anything usable.

  • Internet Key Exchange (IKE)

A powerful, flexible negotiation protocol that allows users to agree on encryption and authentication methods, the keys to use, the hash algorithms to use, how long to use the keys before changing them, and so on.

The basic components of IPSec, the ESP and the AH, use cryptographic techniques to ensure data confidentiality and digital signatures to authenticate the data’s source.

How IPSec embeds encryption in the ESP

IPSec handles encryption at the packet level. The protocol it uses is called ESP. ESP also offers some authentication, as the AH does, but unlike the AH, ESP authentication does not cover the IP header in front of the packet. The ESP (see below) follows the standard IP header in an IP datagram. It contains all the data and the higher-level protocols relying on IP for routing. The figure below does not show the IP header.

[Figure: embedding encryption in the ESP]


The ESP contains six parts, plus an optional authentication field. The first two parts are not encrypted but are authenticated.

  • The Security Parameter Index (SPI) is an arbitrary 32-bit number that tells the device receiving the packet which security parameters the sender is using for the communication. The SPI points to the Security Association (SA), which defines the IPSec parameters for a session (which algorithms, which keys, and how long those keys are valid)
  • The Sequence Number is a counter that increases each time a packet is sent to the same destination using the same IPSec parameters (SPI). It indicates which packet is which and how many packets have been sent with the same group of parameters (SPI). The sequence number provides protection against replay attacks, in which an attacker copies a packet and sends it again later or out of order to confuse the communicating nodes

The remaining parts of the ESP packet are all encrypted during transmission (with the exception of the authentication data).

  • The Payload data is the actual data being carried by the packet
  • The Padding allows for the fact that certain algorithms require the data to be a multiple of a certain number of bytes
  • The Pad length defines how much of the data is padding as opposed to payload
  • The Next header field is like a normal IP next header field: it specifies the type of data carried in the payload, that is, the protocol above
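The field layout above can be sketched in a few lines of Python. This is only an illustration of how the cleartext header and the to-be-encrypted body are assembled; the encryption step itself is left out, and the function names and the 8-byte block size are assumptions rather than part of any particular implementation.

```python
import struct
from typing import Tuple


def build_esp_header(spi: int, sequence_number: int) -> bytes:
    """The two cleartext fields: a 32-bit SPI and a 32-bit sequence number."""
    return struct.pack("!II", spi, sequence_number)


def build_esp_body(payload: bytes, next_header: int, block_size: int = 8) -> bytes:
    """Payload + padding + pad length + next header, ready to be encrypted."""
    # Payload, padding and the two trailer bytes must fill whole cipher blocks.
    pad_len = -(len(payload) + 2) % block_size
    padding = bytes(range(1, pad_len + 1))      # conventional fill: 1, 2, 3, ...
    return payload + padding + struct.pack("!BB", pad_len, next_header)


def strip_esp_trailer(body: bytes) -> Tuple[bytes, int]:
    """After decryption: use the pad length to recover payload and next header."""
    pad_len, next_header = body[-2], body[-1]
    return body[: len(body) - 2 - pad_len], next_header


body = build_esp_body(b"hello", next_header=6)   # 6 = TCP
print(len(body), strip_esp_trailer(body))
```

A five-byte payload needs one padding byte here, so that payload, padding and the two trailer bytes together fill exactly one 8-byte block.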


For the authentication field, see "Authentication within the ESP".

The ESP header is added after a standard IP header. The protocol field of the IP header (the next header field in IPv6) carries the value 50 to indicate that an ESP header follows (instead of, for example, a TCP header). Every standard IP router can forward this packet because it uses a standard IP header.


Encryption algorithms supported by ESP

ESP can support any number of encryption algorithms. It is up to the user which algorithm to use. You can even use a different algorithm for each party with whom you are communicating.


Tunneling with the ESP

Tunneling takes the original IP packet (with header) and encapsulates it in the payload of the ESP. It then adds to the packet a new IP header with a destination address of a VPN gateway.

Tunneling allows you to pass private (non-routable) IP addresses through a public network (the Internet, for example).

Note: Keep in mind that tunnel mode generates more overhead than transport mode, because transport mode doesn’t need a new IP header. For the same reason, transport mode cannot be used with private IP addresses over a public network: the original addresses stay in the only IP header.

Another advantage of tunneling is that the original source and destination addresses are hidden from users on the public network. This defeats or at least reduces the power of traffic analysis attacks. An attacker doing traffic analysis doesn’t know what is being said, but does know how much is being said.
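The difference between the two modes can be modeled with a small Python sketch. It tracks only addresses and lengths, not real packet bytes; the `Packet` class, the `esp_overhead` parameter and the example addresses are assumptions made for illustration.

```python
from dataclasses import dataclass

IPV4_HEADER_LEN = 20   # bytes, assuming an IPv4 header without options
ESP_PROTOCOL = 50      # IP protocol number assigned to ESP


@dataclass
class Packet:
    src: str
    dst: str
    protocol: int
    payload_len: int   # bytes carried after the IP header

    @property
    def total_len(self) -> int:
        return IPV4_HEADER_LEN + self.payload_len


def transport_mode(pkt: Packet, esp_overhead: int) -> Packet:
    """Keep the original IP header; only the payload is wrapped in ESP."""
    return Packet(pkt.src, pkt.dst, ESP_PROTOCOL, pkt.payload_len + esp_overhead)


def tunnel_mode(pkt: Packet, gw_src: str, gw_dst: str, esp_overhead: int) -> Packet:
    """Wrap the whole original packet, header included, behind a new IP header."""
    return Packet(gw_src, gw_dst, ESP_PROTOCOL, pkt.total_len + esp_overhead)


inner = Packet("10.0.0.5", "10.1.0.9", protocol=6, payload_len=100)
in_transport = transport_mode(inner, esp_overhead=24)
in_tunnel = tunnel_mode(inner, "192.0.2.1", "198.51.100.1", esp_overhead=24)
print(in_transport.src, in_transport.total_len)
print(in_tunnel.src, in_tunnel.total_len)
```

In tunnel mode only the gateway addresses are visible on the public network, at the cost of one extra IP header per packet; in transport mode the private 10.x addresses would still be in the outer header.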


Authentication within the ESP

The authentication field in the ESP contains something called an ICV (Integrity Check Value). In effect, this is a keyed signature computed over the ESP minus the authentication field itself. It may also be omitted entirely if authentication is not selected for the ESP or if you want to use the Authentication Header (AH) instead.

The ICV is calculated on the ESP once encryption is complete. The sender computes a keyed hash over the ESP header and the encrypted payload and attaches this value as the authentication data. The recipient recomputes the value to confirm that there has been no tampering and that the payload did come from the expected sender.
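As a rough sketch of that calculation, the following Python code computes and checks an ICV with HMAC-SHA-1 truncated to 96 bits. The choice of algorithm, the function names and the placeholder ciphertext are assumptions for illustration; the algorithm actually used is whatever the SA specifies.

```python
import hashlib
import hmac
import struct

ICV_LEN = 12  # HMAC-SHA-1-96: the 20-byte HMAC output truncated to 96 bits


def compute_icv(auth_key: bytes, spi: int, seq: int, ciphertext: bytes) -> bytes:
    """Keyed hash over everything in the ESP except the authentication field."""
    covered = struct.pack("!II", spi, seq) + ciphertext
    return hmac.new(auth_key, covered, hashlib.sha1).digest()[:ICV_LEN]


def verify_icv(auth_key: bytes, esp_packet: bytes) -> bool:
    """Recompute the ICV on receipt and compare it in constant time."""
    covered, received = esp_packet[:-ICV_LEN], esp_packet[-ICV_LEN:]
    expected = hmac.new(auth_key, covered, hashlib.sha1).digest()[:ICV_LEN]
    return hmac.compare_digest(expected, received)


key = b"k" * 20                      # shared authentication key from the SA
ciphertext = b"\x00" * 16            # stands in for the already encrypted body
packet = struct.pack("!II", 0x1000, 1) + ciphertext + compute_icv(key, 0x1000, 1, ciphertext)
tampered = packet[:8] + b"\x01" + packet[9:]
print(verify_icv(key, packet), verify_icv(key, tampered))
```

Changing a single byte of the encrypted body, or checking with the wrong key, makes the recomputed value differ from the one carried in the packet.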


AH - Authentication Header

The IPSec suite’s second protocol, the AH, provides authentication services but does not address confidentiality. The AH may be applied alone, in concert with the ESP, or in a nested fashion when using tunnel mode.
Authentication provided by the AH differs from that provided by the ESP. ESP’s authentication service does not protect the IP header that precedes the ESP. The AH protects the external IP header, along with the entire content of the ESP packet.

Note: The AH does not protect all of the fields in the external IP header, because some of them (the time-to-live, for example) change in transit. The AH is designed to work around these fields.
In the packet, the AH goes after the IP header but before the ESP (if present) or the higher-level protocol (TCP or UDP, in the absence of ESP).

[Figure: authentication header]

The AH parts are as follows:

  • The Next header field indicates which higher-level protocol follows the AH (ESP or TCP, for example)
  • The Payload length field is an 8-bit field specifying the size of the AH
  • The Reserved field is reserved for future use
  • The SPI, as in the ESP, specifies the set of security parameters (algorithms, keys and how long to use those keys) for that connection. In other words, the SPI is a pointer to an SA
  • The Sequence number, as in the ESP, increases with every packet that is sent to the same destination with a given SPI. Its purpose is to keep track of the order of the packets, to make sure that the same set of parameters is not used for too many packets, and, finally, to defend against replay attacks
  • The Authentication data is the actual ICV (Integrity Check Value), or digital signature, for the packet. It is much the same as the ICV used in the ESP. This field may include padding to bring the length of the header to an integral multiple of 32 bits in IPv4 or 64 bits in IPv6
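The replay protection that the sequence number gives to both the AH and the ESP is commonly implemented as a sliding window on the receiving side. The following Python sketch shows one way to do it; the class name and the window size of 32 are assumptions, and a real receiver would only record a sequence number after the packet's ICV has been verified.

```python
class ReplayWindow:
    """Sliding-window replay check for a single SA."""

    def __init__(self, size: int = 32) -> None:
        self.size = size
        self.highest = 0   # highest sequence number accepted so far
        self.bitmap = 0    # bit i set means (highest - i) has been seen

    def accept(self, seq: int) -> bool:
        """Record seq and return True if it is new; False if replayed or too old."""
        if seq == 0:
            return False                       # sequence numbers start at 1
        if seq > self.highest:
            shift = seq - self.highest         # slide the window forward
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False                       # older than the window covers
        if self.bitmap & (1 << offset):
            return False                       # already seen: a replay
        self.bitmap |= 1 << offset
        return True


window = ReplayWindow()
print([window.accept(n) for n in (1, 3, 2, 2, 3)])
```

Packets may arrive slightly out of order (3 before 2) and are still accepted once, while a second copy of any packet is rejected.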


Like the ESP, the AH can be used in tunnel mode. Also, as with the ESP, IPSec requires specific algorithms to be available for use in the AH. All IPSec implementations must support at least HMAC-MD5 and HMAC-SHA-1 for minimum interoperability. HMAC is a symmetric (shared-key) authentication scheme built on hash functions such as MD5 and SHA-1.


AH and ESP – two parts in the puzzle

You can think of the AH and ESP as the building blocks of the IPSec suite. These two protocols provide such important services as authentication, confidentiality and integrity. All of these services rely on a secure key exchange. The services provided by the ESP and AH represent a powerful technology for keeping your data secret, for verifying its origin, and for protecting it from undetected tampering. But they mean little without a wisely designed infrastructure to support them, to distribute keys, and to negotiate security parameters between communicating parties.

So, the next part of this document discusses the key distribution service in IPSec, provided by the IKE protocol.


How many keys do you need?

Imagine twenty people who need to exchange data securely with each other through the public network.
To get this job done, they use a symmetric encryption scheme. Unfortunately, they need 190 keys among them (one for each of the 20 × 19 / 2 pairs), and every user needs to keep track of 19 keys. So, how do they exchange this host of keys? They could, for example, distribute their keys at meetings called specially for that purpose, but imagine how many meetings you would need; management won’t be excited about that. Last but not least, to keep the encrypted data protected from tampering and brute-force attacks, you need to change your symmetric encryption keys at least once a month.

Obviously, this is not practical. The first thing we can do in that situation is to change from symmetric to asymmetric encryption. With asymmetric encryption, each person issues a public key to all the others. Every user has his own private/public key pair, so every user needs to keep track of twenty keys (nineteen public keys plus his own private key). Key exchange in this example is much easier than in the first example: still complicated, but getting easier.
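The arithmetic behind these numbers is easy to check. The short Python sketch below (function names chosen for this example) counts the keys for any group size.

```python
def symmetric_keys_total(users: int) -> int:
    """One shared key per pair of users: n * (n - 1) / 2."""
    return users * (users - 1) // 2


def symmetric_keys_per_user(users: int) -> int:
    """Each user shares a separate key with every other user."""
    return users - 1


def asymmetric_keys_per_user(users: int) -> int:
    """Everyone else's public key plus one's own private key."""
    return (users - 1) + 1


for n in (20, 100):
    print(n, symmetric_keys_total(n), symmetric_keys_per_user(n), asymmetric_keys_per_user(n))
```

The total number of symmetric keys grows with the square of the group size, while the number of key pairs in the asymmetric scheme grows only linearly.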

The above example illustrates two important things:

  • Key exchange is fundamentally a complicated process
  • Key exchange gets more complicated as the number of communicating parties grows


So just because a system says it does encryption, that alone doesn’t mean that it is going to be appropriate for your needs. Any proposed VPN solution is only as good as its method of key distribution.

Key management and exchange

To communicate with someone using encryption and authentication services (like those provided by ESP and AH), you need to do three things:

  • Negotiate the protocols, encryption algorithms, and keys to use
  • Provide a way to exchange keys easily
  • Keep track of all the negotiated parameters


The Security Association (SA)

The first thing the IPSec designers solved was actually number three: how to keep track of all the details, keys and encryption algorithms to use. They did it by bundling everything together in something called a Security Association (SA). An SA groups together all the things you need to know about how to communicate with someone securely. The SA, under IPSec, specifies:

  • The mode of the authentication algorithm used in the AH and the keys to that authentication algorithm
  • The mode of the ESP encryption algorithm and the keys to that encryption algorithm
  • How you authenticate your communications (using what protocol, encryption algorithm and keys)
  • The presence and size of (or absence of) any cryptographic synchronization to be used in that encryption algorithm.
  • How often those keys must be changed (in order to keep the communication secure)
  • The lifetime of the SA itself
  • The authentication algorithm, mode, and transform for use in ESP plus the keys to be used by that algorithm
  • The SA source address
  • A sensitivity level descriptor


An SA describes all the security-relevant aspects of an IPSec channel. It’s like a contract with whoever is at the other end. The SA also has the advantage that it lets you construct classes of security channels. If you need to be a little more careful talking to one party, the rules of your SA can reflect extra caution.


How the SA works with the SPI

The SPI is a number that uniquely identifies an SA. The SPI, together with the SA, makes keeping track of protocols and algorithms easy and automatic. The SPI is an arbitrary 32 bit number your system picks to represent that SA whenever someone negotiates an SA with you – it identifies the SA.

The SPI cannot be encrypted in the packet, because it is the pointer to the SA, which defines how to decrypt the packet. When you negotiate an SA, the recipient node assigns an SPI it isn’t already using, and preferably one it hasn’t used in a while. It then communicates the SPI to the node with which it negotiated the SA. From then until the SA expires, whenever that node wishes to communicate with yours, it specifies that SPI.

On receipt, your node looks at the SPI to determine the SA it needs to use. Then it authenticates and/or decrypts the packet according to the rules the SA specifies, using the agreed-upon keys and algorithms to verify that the data really did come from the node it claims to come from, that the data has not been modified (tampered with), and that nobody could have been reading the data.

The next step is to look at how such an SA is negotiated.
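The relationship between SPI and SA can be pictured as a lookup table on the receiving node. The Python sketch below is a simplified model: the class names, the fields kept in the SA and the algorithm strings are assumptions for illustration, and a real SA holds more (keys, mode, sequence counters and so on).

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

SaKey = Tuple[int, str, str]   # (SPI, destination address, protocol)


@dataclass
class SecurityAssociation:
    spi: int
    destination: str
    protocol: str                  # "ESP" or "AH"
    encryption_algorithm: str
    authentication_algorithm: str
    lifetime_seconds: int


class SaDatabase:
    """Receiver-side table that maps an incoming SPI to its SA."""

    def __init__(self) -> None:
        self._entries: Dict[SaKey, SecurityAssociation] = {}

    def allocate_spi(self, destination: str, protocol: str) -> int:
        """Pick a random 32-bit SPI that is not in use (values below 256 are reserved)."""
        while True:
            spi = secrets.randbelow(2**32 - 256) + 256
            if (spi, destination, protocol) not in self._entries:
                return spi

    def add(self, sa: SecurityAssociation) -> None:
        self._entries[(sa.spi, sa.destination, sa.protocol)] = sa

    def lookup(self, spi: int, destination: str, protocol: str) -> Optional[SecurityAssociation]:
        """Return the SA for an incoming packet, or None if no SA matches."""
        return self._entries.get((spi, destination, protocol))


db = SaDatabase()
spi = db.allocate_spi("198.51.100.7", "ESP")
db.add(SecurityAssociation(spi, "198.51.100.7", "ESP", "3DES-CBC", "HMAC-SHA-1", 3600))
print(db.lookup(spi, "198.51.100.7", "ESP"))
```

A packet whose SPI, destination and protocol do not match any entry cannot be processed, since the node has no way of knowing which keys and algorithms apply to it.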


IKE (Internet Key Exchange)

IKE is the IPSec group’s answer to robust key exchange and protocol negotiation. IKE integrates the Internet Security Association and Key Management Protocol (ISAKMP) and a subset of the Oakley key exchange protocol (which uses a Diffie-Hellman shared secret). IKE provides a way to:

  • Agree on protocols, algorithms, and keys to use
  • Authenticate the identity of whomever you would like to communicate with
  • Manage those keys after they have been agreed upon
  • Exchange material for generating those keys safely


Key exchange is closely related to SA management. Before you can create an SA, you need to exchange keys. IKE wraps both up together and delivers them as an integrated package.


Manual key exchange

IPSec’s designers provided another way to exchange keys. For minimal compatibility, every IPSec system must also provide a way to do manual keying. That means if you wish to use manual key exchange in certain situations, you still can. But IPSec’s designers also assumed that in most situations, large enterprise networks for example, this would be impractical.


IKE phases

IKE functions in two phases. In the first phase, two IKE peers establish a secure channel for doing IKE (the IKE SA). In the second phase, the two peers negotiate general purpose SA’s.


IKE modes

Oakley provides three modes of exchanging keying information and SA’s, two in phase one and one in phase two IKE exchanges.

  • Main mode accomplishes a phase one IKE key exchange by establishing a secure channel
  • Aggressive mode is another way of accomplishing a phase one IKE key exchange. It is a little simpler and faster than main mode, but it does not provide identity protection for the negotiating nodes. It uses three packets instead of the six generally used in a main mode IKE key exchange
  • Quick mode accomplishes a phase two exchange by negotiating an SA for general purpose communications


Establishing a secure channel for negotiation

To establish an IKE SA, the initiating node proposes six things:

  • Encryption algorithms
  • Hash algorithms (to reduce data for signing)
  • Authentication method
  • Information about a group over which to do a Diffie-Hellman exchange
  • A pseudo-random function (PRF) used to hash certain values during the key exchange for verification purposes (this is optional, you can also just use the hash algorithm)
  • The type of protection to use


Perfect forward secrecy

An attacker has a number of opportunities to get hold of encrypted data when you pass it around the world over a public network. You can reduce the risk by using larger and larger keys. But the larger the keys, the slower the encryption, and this can impair network performance.

A good compromise is to use reasonably large keys and to change them quite often. So you need a way to generate new keys that the party on the other end can agree on as well. You must not use keying material from existing keys to generate your new keys. If you do, and someone gets hold of your current key, he or she will be able to deduce your new key.

What you need is a method to generate a new key that is in no way dependent on the value of the current key. If someone then gets hold of your current key, it gives them only a small piece of the overall picture. Cryptographers call this concept “perfect forward secrecy”. IKE uses a scheme called Diffie-Hellman to achieve this.



Diffie-Hellman exchange

A Diffie-Hellman exchange works like this. Each party that wants to use the scheme generates a public/private key pair. Remember: the larger the keys, the better the protection, but the slower the encryption, decryption and authentication process. Each party sends its public key to the other (using an authentication scheme to prevent a “man in the middle” attack). Each party then combines the public key just received with its own private key, using the Diffie-Hellman algorithm.
The result is the same on both sides, but no one else in the world can come up with the same value, because the result also depends on the private key, which remains secret. As you can see, it is important not only to authenticate the public key of your peer but also to keep your private key very safe. You should improve the security of your private key by protecting it with a password or, better, with a long passphrase.
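The exchange described above fits in a few lines of Python. The parameters here are toy values chosen for this sketch (the prime 2**127 - 1 and the base 3); they are far too small for real use and are not one of the groups IKE actually negotiates.

```python
import secrets
from typing import Tuple

# Toy parameters for illustration only; not a group used by IKE.
P = 2**127 - 1   # a prime modulus
G = 3            # the public base


def generate_keypair() -> Tuple[int, int]:
    """Pick a random private value and compute the matching public value."""
    private = secrets.randbelow(P - 3) + 2       # in the range 2 .. P-2
    public = pow(G, private, P)                  # G^private mod P
    return private, public


def shared_secret(own_private: int, peer_public: int) -> int:
    """Combine the peer's public value with one's own private value."""
    return pow(peer_public, own_private, P)


alice_private, alice_public = generate_keypair()
bob_private, bob_public = generate_keypair()

# Only the public values cross the network.
alice_result = shared_secret(alice_private, bob_public)
bob_result = shared_secret(bob_private, alice_public)
print(alice_result == bob_result)
```

Both sides arrive at the same number because (G^a)^b and (G^b)^a are equal modulo P, while an eavesdropper who sees only the two public values has neither private exponent.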

You can use the derived key, which has the same value on both sides, either as a session key or as an encryption key to encrypt a randomly generated key.
The interesting point is this: you use a public/private scheme to agree upon an IPSec shared secret (the Diffie-Hellman key, the session key), but you then use this session key in combination with a symmetric algorithm to protect your data. The advantage is that symmetric algorithms are much faster than asymmetric algorithms (by a factor of roughly 1000), so this approach greatly benefits network performance.

Don’t forget that you still need to authenticate the Diffie-Hellman public keys to protect against the “man in the middle” attack. Diffie-Hellman itself does not solve this problem. There are several methods to protect public keys from an attacker. If the key exchange mechanism you use is protected by an authentication scheme, Diffie-Hellman allows you to generate new shared keys for symmetric encryption that are independent of older keys, providing perfect forward secrecy.
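To see why the public values must be authenticated, consider what happens when they are not. The Python sketch below, again with toy parameters and hypothetical names (Alice, Bob and the attacker Mallory), lets the attacker replace each public value in transit with her own.

```python
import secrets
from typing import Tuple

# Toy parameters for illustration only; far too small for real use.
P = 2**127 - 1
G = 3


def keypair() -> Tuple[int, int]:
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)


def agree(own_private: int, peer_public: int) -> int:
    return pow(peer_public, own_private, P)


def intercepted_exchange() -> Tuple[int, int, int, int]:
    """Run an unauthenticated exchange with an attacker sitting in the middle."""
    alice_private, alice_public = keypair()
    bob_private, bob_public = keypair()
    mallory_private, mallory_public = keypair()

    # Mallory swaps in her own public value in both directions.
    alice_key = agree(alice_private, mallory_public)   # Alice thinks this is Bob
    bob_key = agree(bob_private, mallory_public)       # Bob thinks this is Alice

    # Mallory can compute both keys from the values she intercepted.
    mallory_key_for_alice = agree(mallory_private, alice_public)
    mallory_key_for_bob = agree(mallory_private, bob_public)
    return alice_key, bob_key, mallory_key_for_alice, mallory_key_for_bob


a, b, m_a, m_b = intercepted_exchange()
print(a == m_a, b == m_b)
```

Each victim ends up sharing a key with the attacker rather than with the intended peer, which is exactly what signing or otherwise authenticating the public values prevents.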




The pseudo-random function (PRF)

The PRF is, in practice, a keyed hash function. In IKE, you can use the PRF both for authentication purposes and to generate additional keying material (as a randomizer).

IKE modes in detail

Main mode

Main mode provides a mechanism for establishing a first phase IKE SA, which is used to negotiate future communication. The object here is to agree on enough things (authentication and encryption algorithm, hashes, and keys) to be able to communicate securely long enough to set up an SA for future communication.

  1. use Main mode to bootstrap an IKE SA
  2. use Quick mode within that IKE SA to negotiate a general SA
  3. use that SA to communicate from now on until it expires

The parties exchange messages in three rounds. In the first exchange, the parties agree on basic algorithms and hashes. In the second, they exchange public keys for a Diffie-Hellman exchange and pass each other random numbers (nonces) that the other party must sign and return to prove its identity. In the third, they verify those identities.

Once they have derived the shared Diffie-Hellman value, the parties use it in three permutations. Each party hashes it three times, generating first a derivation key (used for generating additional keys later in Quick mode), then an authentication key, and finally the encryption key to be used for the IKE SA.
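The three hashing steps can be sketched as a chain of PRF calls. The Python code below uses HMAC-SHA-1 as the PRF and is only loosely modeled on the derivation in the IKE specification; the input names (`base_key`, `dh_shared`, `cookies`) and the exact concatenation are simplifying assumptions.

```python
import hashlib
import hmac
from typing import Tuple


def prf(key: bytes, data: bytes) -> bytes:
    """The pseudo-random function; HMAC-SHA-1 is assumed here."""
    return hmac.new(key, data, hashlib.sha1).digest()


def derive_ike_keys(base_key: bytes, dh_shared: bytes, cookies: bytes) -> Tuple[bytes, bytes, bytes]:
    """Chain three PRF calls; each key feeds into the derivation of the next."""
    derivation_key = prf(base_key, dh_shared + cookies + b"\x00")
    authentication_key = prf(base_key, derivation_key + dh_shared + cookies + b"\x01")
    encryption_key = prf(base_key, authentication_key + dh_shared + cookies + b"\x02")
    return derivation_key, authentication_key, encryption_key


# Placeholder inputs; in a real exchange these come out of Main mode.
d_key, a_key, e_key = derive_ike_keys(b"s" * 20, b"g" * 16, b"c" * 16)
print(d_key.hex(), a_key.hex(), e_key.hex(), sep="\n")
```

Because both parties hold the same inputs, they compute the same three keys independently without ever sending them over the network.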


Aggressive mode

Aggressive mode provides the same services as Main mode. The difference is that it is accomplished in two exchanges rather than three, with only one round trip and a total of three packets rather than six.

An Aggressive mode exchange attains the same goal as Main mode, except that Aggressive mode does not provide identity protection for the communicating parties. The advantage of this mode is, however, speed.


Quick mode

Once the communicating parties have established an IKE SA using Main or Aggressive mode, they can use Quick mode.

Quick mode has two purposes: negotiating general IPSec services and generating fresh keying material. Quick mode is less complex than either Main or Aggressive mode, because it uses an already established secure channel. Quick mode packets are always encrypted, and always start with a hash payload, which is used to authenticate the rest of the packet.

Key refreshing can be done in two ways. If you don’t need perfect forward secrecy, Quick mode can just refresh the keying material already generated (in Main or Aggressive mode) with additional hashing. The communicating parties exchange nonces through the secure channel and use these to hash the existing keys. If you do want perfect forward secrecy, you can request an additional Diffie-Hellman exchange through the existing SA and exchange the keys that way.
Basic Quick mode is a three-packet exchange, like Aggressive mode.
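The first of the two refresh methods, hashing the existing keying material with fresh nonces, might look like the following Python sketch. The function name, the inputs and their ordering are assumptions for illustration; HMAC-SHA-1 again stands in for the negotiated PRF.

```python
import hashlib
import hmac
import secrets


def refresh_keying_material(derivation_key: bytes, spi: int,
                            nonce_initiator: bytes, nonce_responder: bytes) -> bytes:
    """Derive fresh key material from the existing derivation key and new nonces."""
    data = spi.to_bytes(4, "big") + nonce_initiator + nonce_responder
    return hmac.new(derivation_key, data, hashlib.sha1).digest()


derivation_key = b"d" * 20                 # left over from phase one
nonce_i = secrets.token_bytes(16)          # sent by the initiator
nonce_r = secrets.token_bytes(16)          # sent by the responder

new_key = refresh_keying_material(derivation_key, 0x1000, nonce_i, nonce_r)
print(new_key.hex())
```

This is cheap, but note what it implies: anyone who later learns the derivation key and has recorded the nonces can recompute every key refreshed this way. That is why this method does not give perfect forward secrecy, and why the second method runs a new Diffie-Hellman exchange instead.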


Negotiating the SA

After all those exchanges to generate the IKE SA, establishing the general purpose SA is relatively simple. To generate a new SA, the initiator sends a Quick mode message, protected by the IKE SA, requesting a new SA. A single SA negotiation actually results in two SA's – one inbound, to the initiator, and one outbound. Each IPSec SA is one way and the node on the receiving end of that SA always chooses its own SPI to ensure it is the only SA using that reference.

So, using Quick mode, the initiator tells the responder which SPI to use in future communications with it, and the responder follows up with its own selected SPI.
Each SPI, in concert with the destination IP address and the protocol, uniquely identifies a single IPSec SA. However, these SA's are always formed in pairs (inbound and outbound), and these pairs have identical parameters (keys, authentication, encryption algorithms and hashes), apart from the SPI itself.


How do you know who is who?

How do you verify that people are who they say they are in the first place? The final component of the IPSec-compliant secure VPN, in most implementations, is a Certificate Authority (CA). I already covered that in a previous article, so here, again in short, is how it works.

A CA is a trusted third party, someone whose identity you can already prove, and who can vouch for the identity of people with whom you are trying to communicate. The CA is like a public figure who you know well enough to trust, and who vouches for other people. The CA software issues certificates tying three things together:

  • someone's identity
  • the public key against which that person's signatures on online documents are checked
  • the CA's signature (made with the CA's private key and checked with its public key, to authenticate the entire package)


The CA is the defense against the man-in-the-middle working his way into key exchanges. Whenever you initiate an exchange with someone, he has to sign it with his digital signature. You in turn check that signature against the public key in his certificate. They have to match. You then check the certificate's own signature against the CA's public key. That has to match too.
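The second of those checks, verifying the CA's signature on a certificate, can be illustrated with textbook RSA in Python. Everything here is a toy: the CA key is built from two small, well-known primes, there is no padding, and the `Certificate` class is a stand-in for a real X.509 structure. It needs Python 3.8 or later for the modular inverse.

```python
import hashlib
from dataclasses import dataclass

# Toy CA key for illustration only: two small Mersenne primes, no padding.
CA_P = 2**61 - 1
CA_Q = 2**31 - 1
CA_N = CA_P * CA_Q                                   # public modulus
CA_E = 65537                                         # public exponent
CA_D = pow(CA_E, -1, (CA_P - 1) * (CA_Q - 1))        # private exponent


@dataclass(frozen=True)
class Certificate:
    identity: str
    public_key: int      # the holder's public key, reduced to a number here
    signature: int       # the CA's signature over identity and public key


def _digest(identity: str, public_key: int) -> int:
    data = f"{identity}|{public_key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N


def issue_certificate(identity: str, public_key: int) -> Certificate:
    """CA side: sign the binding of identity and public key with the private key."""
    return Certificate(identity, public_key, pow(_digest(identity, public_key), CA_D, CA_N))


def verify_certificate(cert: Certificate) -> bool:
    """Anyone holding the CA's public key can check the binding."""
    return pow(cert.signature, CA_E, CA_N) == _digest(cert.identity, cert.public_key)


cert = issue_certificate("alice", 12345)
forged = Certificate("mallory", cert.public_key, cert.signature)
print(verify_certificate(cert), verify_certificate(forged))
```

The genuine certificate verifies, while reusing the CA's signature under a different identity does not, because the signature covers the identity and the public key together.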

IKE provides for third-party verification using the established industry-standard X.509 certificate format.


Robust, scalable key exchange

Altogether, IPSec's IKE system offers network users a great deal that other schemes have had difficulty delivering. So, to put it all together, with an IPSec secure VPN:

  • you can form SA's with a large range of attributes, layering your network
  • you can build multiple domains of communication within the same secure VPN
  • you can update keys as frequently as you wish
  • you can do this in any IP network environment, of any size, for any size of enterprise
  • you can do this with anyone using IPSec technology