When network address translation (NAT) was invented, it was a mere hack to circumvent the shortage of IP addresses. Since then it has proven useful in completely different fields that nobody had thought of at the beginning, and there are probably many more useful applications that have not been found yet. In that context I want to explain the role NAT currently has and the role it might gain in the future, showing that it is more than a short-term solution and that it will stay with us for much longer, especially in view of the current state of IPv6 implementation. Experiments have shown that the IPv6 protocol itself does not cause many problems, so migration could be swift, but many applications do cause problems, and it is therefore likely that IPv4 will remain the major Internet and intranet protocol for longer than expected.
Before explaining NAT's role in today's and future networks, I want to show in which areas NAT is being used today. The explanations are made from a technological point of view, i.e. I will not try to give advice on how a particular kind of NAT should be used. The following sections are just an overview. The details that have to be considered when implementing NAT or examining the implications of using NAT are laid out in the chapter thereafter.
I have divided the overview into two parts. I call the first one classic NAT, meaning the original NAT as invented in the early nineties and covered by RFC 1631, mainly meant to save IP address space on the Internet. The second part introduces more recent uses of NAT that do not serve the original purpose but have opened up additional fields.
In the following sections m,n are defined as follows:
m: number of IPs that need to be translated (original IPs)
n: number of IPs available for translation (NAT IPs)
m:n-Translation, m,n>=1 and m=n (m,n in N)
With static address translation we can translate between IP networks that have the same size (contain the same number of IPs). A special case is when both networks contain just one IP, i.e. the netmask is 255.255.255.255. This NAT strategy is easy to implement, since the entire translation process can be written as one line containing a few simple logic transformations:
new-address = new-network OR (old-address AND (NOT netmask))
In addition, no information about the state of the connections being translated needs to be kept; looking at each IP packet individually is sufficient. Connections from outside the network to inside hosts are no problem; the hosts just appear to have a different IP than on the inside, so static NAT is (almost) completely transparent.
Example:
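The one-line transformation above can be sketched in Python as follows. This is a minimal illustration and not any particular implementation; the function name static_nat and the addresses used are assumptions made for the example.

```python
import ipaddress

def static_nat(old_address: str, new_network: str, netmask: str) -> str:
    """Apply new-address = new-network OR (old-address AND (NOT netmask))."""
    old = int(ipaddress.IPv4Address(old_address))
    net = int(ipaddress.IPv4Address(new_network))
    mask = int(ipaddress.IPv4Address(netmask))
    # NOT netmask, limited to 32 bits, keeps only the host part of the address
    host_bits = old & (~mask & 0xFFFFFFFF)
    return str(ipaddress.IPv4Address(net | host_bits))

# A /24 network is mapped onto another /24 network; the host part is preserved.
print(static_nat("10.1.1.37", "192.0.2.0", "255.255.255.0"))    # 192.0.2.37
# Special case: netmask 255.255.255.255, i.e. a single IP mapped to a single IP.
print(static_nat("10.1.1.37", "192.0.2.5", "255.255.255.255"))  # 192.0.2.5
```

Note that the function needs no table and no memory of earlier packets, which is exactly why static NAT can look at each packet individually.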
Dynamic address translation is necessary when the number of IPs to translate does not equal the number of IPs to translate to, or when they are equal but for some reason a static mapping is not desirable. The number of hosts communicating at the same time is generally limited by the number of NAT IPs available. When all NAT IPs are in use, no further connections can be translated, and they must therefore be rejected by the NAT router, for example by sending back 'host unreachable'. Dynamic NAT is more complex than static NAT, since we must keep track of communicating hosts and possibly even of connections, which requires looking at TCP information in packets.
As mentioned above, dynamic NAT may also be useful when there are enough NAT IPs, i.e. when m=n. Some people use this as a security measure: someone outside the network cannot obtain useful IP numbers for connecting to hosts behind a NAT router doing dynamic address translation just by watching the connections that take place, since the next time the same host may connect using a completely different IP. In this special case even having more NAT IPs than IPs to be translated (m<n) may make some sense.
Connections from outside are only possible when the host that is to be reached still has a NAT IP assigned, i.e. if it still has an entry in the dynamic NAT table, where the NAT router keeps track of which internal IP is mapped to which NAT IP. For instance, non-passive FTP sessions, where the server attempts to establish the data channel, are no problem (protocol-specific problems are discussed separately): when the server sends its packets to the FTP client, there is already an entry for the client in the NAT table, and it is extremely likely that it still contains the same client-IP to NAT-IP mapping that was there when the client started the FTP control channel, unless the FTP session has been idle for longer than the timeout of the entry.
However, if an outsider wants to establish a connection to a certain host on the inside at an arbitrary time, there are two possibilities: the inside host does not have an entry in the NAT table and is therefore unreachable, or it has an entry, but which NAT IP must be used is unknown, unless, of course, the IP to connect to is known because the internal host is currently communicating with the outside. In the latter case, however, only the NAT IP is known but not the internal IP of the host, and this knowledge is valid only while the internal host's communication takes place, plus the timeout of the entry in the NAT router's table.
Example:
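The behaviour of the dynamic NAT table described above can be sketched as follows. This is a simplified illustration under stated assumptions: the class name DynamicNat, the timeout value and all addresses are hypothetical, entries are tracked per host rather than per connection, and time is passed in explicitly so that the example is deterministic.

```python
class DynamicNat:
    """Hypothetical dynamic NAT table: internal IP -> NAT IP with an idle timeout."""

    def __init__(self, nat_ips, timeout=300.0):
        self.free = list(nat_ips)   # NAT IPs not currently assigned
        self.table = {}             # internal IP -> (NAT IP, time of last use)
        self.timeout = timeout

    def _expire(self, now):
        # Entries idle for longer than the timeout give their NAT IP back.
        for internal, (nat_ip, last_used) in list(self.table.items()):
            if now - last_used > self.timeout:
                del self.table[internal]
                self.free.append(nat_ip)

    def outbound(self, internal_ip, now):
        """Return the NAT IP for an outgoing packet, or None if all are in use."""
        self._expire(now)
        if internal_ip in self.table:
            nat_ip = self.table[internal_ip][0]
        elif self.free:
            nat_ip = self.free.pop(0)
        else:
            return None             # the router would answer e.g. 'host unreachable'
        self.table[internal_ip] = (nat_ip, now)
        return nat_ip

    def inbound(self, nat_ip, now):
        """Return the internal IP currently mapped to nat_ip, or None."""
        self._expire(now)
        for internal, (mapped, _) in self.table.items():
            if mapped == nat_ip:
                return internal
        return None

nat = DynamicNat(["192.0.2.1", "192.0.2.2"], timeout=300)
print(nat.outbound("10.0.0.5", now=0))     # 192.0.2.1
print(nat.outbound("10.0.0.6", now=10))    # 192.0.2.2
print(nat.outbound("10.0.0.7", now=20))    # None: m > n and all NAT IPs are in use
print(nat.inbound("192.0.2.1", now=100))   # 10.0.0.5: entry still present
print(nat.outbound("10.0.0.7", now=400))   # 192.0.2.1: old entries have timed out
```

The last call shows why an outsider cannot rely on a NAT IP observed earlier: after the timeout the same NAT IP belongs to a different internal host.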
A very special case of dynamic NAT is m:1-translation, a.k.a. masquerading, which became famous under that name because Linux can do it. It is probably the kind of NAT technique that is used most often these days. Here many IP numbers are hidden behind a single one. In contrast to the original dynamic NAT, this does not mean there can be only one connection at a time: in masquerading an almost arbitrary number of connections is multiplexed using TCP port information. The number of simultaneous connections is limited only by the number of TCP ports available.
A special problem of masquerading is that some services on certain hosts only accept connections coming from privileged ports, in order to ensure that they do not come from an ordinary user. The assumption that only the superuser can access those ports is not valid, since on DOS or Windows machines everybody can use them. Nevertheless, some programs rely on it and cannot be used over a masqueraded connection. The Linux implementation uses no privileged ports for masquerading, to avoid interfering with 'regular' connections to these ports. Masquerading usually uses ports in the upper range; in Linux this range starts at port 61000 and ends at 61000+4096 by default, which can easily be changed by editing linux/include/net/ip_masq.h. This also shows that the Linux implementation by default allows only 4096 concurrent connections. Allowing masqueraded connections on ports outside such a port range requires keeping and managing even more information about the state of connections. Linux, for example, simply treats all packets whose destination IP is the local IP and whose destination port lies inside the range used for masquerading as packets that have to be demasqueraded, i.e. as answers to packets that were masqueraded on their way out.
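The port multiplexing and the demasquerading test described above can be sketched like this. It is a deliberately simplified illustration and not the Linux code: the class Masquerader is hypothetical, ports are handed out sequentially and never reused, there are no timeouts, and treating the upper bound 61000+4096 as exclusive is an assumption made so that exactly 4096 ports are available.

```python
PORT_MIN = 61000            # start of the masquerading range mentioned above
PORT_MAX = 61000 + 4096     # end of the range (treated as exclusive in this sketch)

class Masquerader:
    """Hypothetical m:1 translator hiding many internal hosts behind one IP."""

    def __init__(self, nat_ip):
        self.nat_ip = nat_ip
        self.out = {}        # (internal IP, internal port) -> NAT port
        self.back = {}       # NAT port -> (internal IP, internal port)
        self.next_port = PORT_MIN

    def masquerade(self, src_ip, src_port):
        """Return the (IP, port) pair an outgoing packet appears to come from."""
        key = (src_ip, src_port)
        if key not in self.out:
            if self.next_port >= PORT_MAX:
                raise RuntimeError("no masquerading ports left")
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return self.nat_ip, self.out[key]

    def demasquerade(self, dst_ip, dst_port):
        """Return the internal (IP, port) for an answer packet, or None."""
        if dst_ip != self.nat_ip or not (PORT_MIN <= dst_port < PORT_MAX):
            return None      # outside the range: a regular packet, not an answer
        return self.back.get(dst_port)

m = Masquerader("192.0.2.1")
print(m.masquerade("10.0.0.5", 1025))      # ('192.0.2.1', 61000)
print(m.masquerade("10.0.0.6", 1025))      # ('192.0.2.1', 61001)
print(m.demasquerade("192.0.2.1", 61001))  # ('10.0.0.6', 1025)
print(m.demasquerade("192.0.2.1", 23))     # None
```

Because the decision in demasquerade depends only on the destination IP and the port range, no further state is needed to tell answers apart from regular connections to the router.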
Incoming connections are impossible with masquerading, since even when a host has an entry in the masquerading table of the NAT device, this entry is valid only for the connection that is active. Even ICMP replies that belong to connections (host/port unreachable) do not get through to the sender automatically but must be filtered and relayed by the NAT router's software.
While it is true that incoming connections are impossible with masquerading itself, we can take additional measures to enable them, but these are not part of the masquerading code. We could, for example, set up the NAT device so that it relays all connections coming in from the outside to the telnet port to a host on the inside. However, since only one IP is visible outside, enabling incoming connections for the same service on different inside hosts means we must listen on different ports on the NAT device, one for each service and internal IP. Since most applications listen on well-known ports that cannot be easily (and transparently!) changed, this is quite inconvenient and often not an option, especially for public services. The only solution is to have as many (external) IPs as there are services to be provided. An external IP can still be shared by different services and then be remapped to different internal IPs using NAT, but that is no longer part of masquerading.
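The relaying of incoming connections described above amounts to a small lookup table on the NAT device. The following sketch only illustrates the idea; the table FORWARDS, the function forward and all addresses and port numbers are hypothetical.

```python
# Hypothetical forwarding table on the NAT device, which has one external IP:
# external port -> (internal IP, internal port)
FORWARDS = {
    23:   ("10.0.0.5", 23),   # telnet on the well-known port reaches one host
    2323: ("10.0.0.6", 23),   # a second telnet host needs another external port
    80:   ("10.0.0.7", 80),   # a different service can share the external IP
}

def forward(dst_port):
    """Return the internal destination for an incoming connection, or None."""
    return FORWARDS.get(dst_port)

print(forward(23))     # ('10.0.0.5', 23)
print(forward(2323))   # ('10.0.0.6', 23)
print(forward(25))     # None: nothing is relayed for this port
```

The second entry shows the inconvenience mentioned in the text: outside clients must know to use port 2323 instead of the well-known telnet port to reach the second host.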
Example:
The greatest advantage of masquerading for many people is that they need only one official IP address while the entire internal network can still access the Internet directly. This matters because IP addresses have become quite expensive. Where application-level gateways exist, the internal hosts need no official IPs and no NAT of any kind, and one IP is still enough; but for some protocols, e.g. all UDP-based services, there is simply no gateway, so direct IP connectivity is necessary.
At the time of this writing there existed an Internet Draft (which I should not reference here, since it is just a draft) from the same people who wrote RFC 1631 (NAT). It explains masquerading, which they call Network Address Port Translation (NAPT), in great depth. There is no IETF paper (none that I could find, at least) on more recent forms of NAT like the ones introduced in the following chapters, although there are (commercial) implementations of them. It seems that for the IETF, NAT exists only to help solve the classical Internet address space shortage problems described above.
Some of these new technologies are introduced below. I write 'some', because I am sure more are to follow that nobody even dreams of today. Not that NAT is vitally important, we could probably live without it somehow, but that is true for many more things developed or invented in the past X*1.000 years. It just can make life easier sometimes. It can also make it harder -- again, this is true not only for NAT, since everything can be used for good and bad.
Example:
The critical element here will always be the algorithm used to achieve equal load distribution. The more accurately we try to measure the load, the more data we need to handle on the NAT router and the harder it gets to collect the data in the first place. This is somewhat similar to Heisenberg's uncertainty principle in quantum theory, so we must find a balance between what it costs to determine the load and what use we can get out of that knowledge.
Even if we assume we could find a way to determine the load accurately (based on an ultimate definition of what load actually is), practice does not honour our efforts: since an IP packet has a minimum size (like a quantum in physics) and, in addition, we can only select the host to send it to when a new connection is opened, we will never be able to achieve a perfectly equal load distribution. Of course, this is of no immediate practical interest, but it certainly is interesting. It does have a practical impact insofar as it shows us when it is useless to refine the algorithms any further.
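To make the tradeoff concrete, here is a sketch of one very cheap load measure: the number of connections the NAT router itself has mapped to each server. This is an assumption chosen for illustration, not a method prescribed by the text; the class LeastConnections and the server names are hypothetical. The measure costs almost nothing to collect, but it says nothing about how much work each connection causes.

```python
class LeastConnections:
    """Hypothetical selector: a new connection goes to the server that
    currently has the fewest connections mapped to it by the NAT router."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def open_connection(self):
        # The choice can only be made here, when a new connection is opened.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def close_connection(self, server):
        self.active[server] -= 1

lb = LeastConnections(["a", "b"])
first = lb.open_connection()    # "a" (ties go to the first server listed)
second = lb.open_connection()   # "b"
third = lb.open_connection()    # "a"
print(lb.active)                # {'a': 2, 'b': 1}
```

With three connections and two servers the distribution cannot be equal, which is the granularity limit discussed above in its simplest form.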
There are numerous other approaches to load balancing, most of them on a higher (user) level. One example is described in RFC 1794 (DNS Support for Load Balancing). Here the DNS controls the load of machines by handing out the IP of the least busy machine when queried. Since DNS answers are cached by other DNS servers, the control is severely limited, but it works quite well if there are many queries and they come from many different clients. However, even if load balancing works under these circumstances, this approach does not help when one of the servers fails and is no longer available: even if its IP is no longer given out in answers, it is still in many caches.
Another example is the well-known cache program squid, which uses complicated algorithms to find out where to get an object from[7]. This is not a general solution but one limited to this particular program. With NAT, on the other hand, we can do load distribution for a much larger variety of services, as long as they are based on IP. Squid serves a different purpose, so a direct comparison does not work; I use it as an example where the intelligence to do load balancing and to collect the data is implemented in all the programs involved and not in an independent central authority.
p1...pn: probabilities of a failure of servers 1...n, where n in N is the number of servers that provide the virtual service
pNAT: probability of a failure of the NAT-router, which fails independently of the other devices
pvirt: probability of a failure of the virtual server, when the individual servers fail independently of one another
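Under the independence assumptions stated in these definitions, the probabilities can be combined as in the following sketch. It assumes that the virtual service is unavailable when either the NAT router has failed or all n real servers have failed at the same time; the function name and the example values are hypothetical.

```python
from math import prod

def service_failure_probability(p_servers, p_nat):
    """Probability that the virtual service is unavailable.

    Assumption: the service is down if the NAT router is down OR all real
    servers are down at once, and all devices fail independently.
    """
    p_all_servers_down = prod(p_servers)          # p1 * p2 * ... * pn
    return 1 - (1 - p_nat) * (1 - p_all_servers_down)

# Two servers that each fail with probability 0.1, and a perfect NAT router:
print(service_failure_probability([0.1, 0.1], p_nat=0.0))
# The same servers behind a NAT router that fails with probability 0.01:
print(service_failure_probability([0.1, 0.1], p_nat=0.01))
```

Adding servers shrinks the product quickly, but the result can never drop below the failure probability of the NAT router, which remains a single point of failure in this setup.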
Of course, the setup used above for load balancing must be enhanced so that the list of servers to which the NAT router remaps connections is changed as soon as one of the real servers is no longer available. This, however, does not belong in the NAT code but is better controlled in higher layers, even from shell scripts. There must then be a mechanism to remove servers from the virtual server table. Since there must be an interface to build the virtual server table in the first place anyway, it is not hard to add features so that IPs can be added to and removed from a virtual server at run time. With this setup we have combined the two functions of load balancing and high availability using virtual servers, and it is even transparent to all hosts, users and programs using the virtual service.
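A virtual server table that can be changed at run time might look like the following sketch. The class VirtualServer, its method names and the plain round-robin selection are assumptions made for illustration; the monitoring that decides when to call remove is assumed to live outside the NAT code, as suggested above.

```python
class VirtualServer:
    """Hypothetical virtual server table with run-time add and remove."""

    def __init__(self, servers=()):
        self.servers = list(servers)   # real server IPs behind the virtual IP
        self._next = 0                 # round-robin counter

    def add(self, ip):
        if ip not in self.servers:
            self.servers.append(ip)

    def remove(self, ip):
        # Called from outside, e.g. by a monitoring script, when a server fails.
        if ip in self.servers:
            self.servers.remove(ip)

    def pick(self):
        """Choose the real server for a new connection, or None if none is left."""
        if not self.servers:
            return None
        ip = self.servers[self._next % len(self.servers)]
        self._next += 1
        return ip

vs = VirtualServer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(vs.pick())            # 10.0.0.1
vs.remove("10.0.0.2")       # the second server has failed
print(vs.pick())            # 10.0.0.3
print(vs.pick())            # 10.0.0.1
```

Only new connections are affected by a removal in this sketch; what happens to connections already mapped to a failed server is left open.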
How can we do this with NAT? Imagine we had two Internet providers: two, because we do not want to rely on the network of just one of them in case it fails. Every host that needs Internet connectivity needs a unique IP, so we buy one IP for each host from each provider. When our hosts want to use provider one, they use that provider's IP as their local IP; when they want to use provider two, they use the IP given by that provider as their local IP. Every host with an IP from both providers can now use either one to send its packets to the same destination.
Now we already see where we are going. The setup described has the potential to solve the problem: we could do load distribution by letting some hosts use provider one and others provider two, and we have a higher availability of the connection to the Internet, since it is less likely that both providers have a major breakdown at the same time than that one of them does (how to calculate the probability has been illustrated above). However, as is easy to imagine, we would have a very hard time trying to do load balancing when each host decides on its own where it sends its packets, not to mention how hard it would be to convince a network application to use one or the other local IP. This calls for a central authority to decide which host should use which provider, and this authority will, of course, be a special NAT router.
Using NAT, our local computers need just one IP, since it is no longer up to them to decide which provider (and therefore which IP) to use. If we had a favorite provider, we could use this provider's IPs for our hosts, but we can also use internal IPs. Now, when an internal host wants to establish a new connection with a destination on the Internet, it just sends its packets to its default router, which is the NAT router (in the end, there might be other routers involved), and the source IP is the host's local (internal) IP. The NAT router, because it knows all connections, decides which provider will route this connection, replaces the source host's (internal) address with one from the chosen provider, and sends the packet out to this provider's router. Since the source address is an address in this provider's network, the answers will also come in that way. The host where the packets originated never gets to know which provider was chosen by the NAT router, so this process is transparent.
We can use the same algorithms as for virtual servers, so we can do load balancing and we have the high availability feature. The essential difference from the virtual server implementation is that we have to interfere with the routing process: in the above example we actually have two default routes.
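The decision the NAT router makes for each new connection can be sketched as follows. All names and addresses are hypothetical, each internal host is assumed to have one address reserved in each provider's range, and the selection rule (fewest connections so far) is just one possible choice.

```python
# Hypothetical address plan: internal IP -> address reserved at each provider.
ADDRESSES = {
    "10.0.0.5": {"provider1": "192.0.2.5", "provider2": "198.51.100.5"},
    "10.0.0.6": {"provider1": "192.0.2.6", "provider2": "198.51.100.6"},
}

class ProviderNat:
    """Hypothetical NAT router choosing a provider per connection."""

    def __init__(self, addresses, providers):
        self.addresses = addresses
        self.load = {p: 0 for p in providers}   # connections routed per provider
        self.connections = {}                   # connection -> chosen provider

    def translate(self, src_ip, src_port, dst_ip, dst_port):
        """Return (provider, new source IP) for a packet of this connection."""
        conn = (src_ip, src_port, dst_ip, dst_port)
        if conn not in self.connections:
            # New connection: choose the provider with the fewest connections.
            provider = min(self.load, key=self.load.get)
            self.connections[conn] = provider
            self.load[provider] += 1
        provider = self.connections[conn]
        return provider, self.addresses[src_ip][provider]

nat = ProviderNat(ADDRESSES, ["provider1", "provider2"])
print(nat.translate("10.0.0.5", 1025, "203.0.113.9", 80))  # ('provider1', '192.0.2.5')
print(nat.translate("10.0.0.5", 1026, "203.0.113.9", 80))  # ('provider2', '198.51.100.5')
print(nat.translate("10.0.0.5", 1025, "203.0.113.9", 80))  # ('provider1', '192.0.2.5')
```

The third call repeats the first connection and gets the same provider again, which is necessary because the answers arrive via the provider whose address was used as the source.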
I must not forget to say that if NAT is being used, all packets must go through the NAT router, i.e. there must not be any alternative routes a packet could take that circumvent the address translation. The reason is obvious, but since NAT routers, being tools for organizing private networks, are mostly placed on the borders of internal (leaf) networks, this should be no problem.
On the other hand, if we do not keep state information but only look at the IPs that need a NAT IP assigned, NAT is much simpler to implement, and it will in many cases work as well as the more complicated solution above. Under light load, i.e. when there are always enough unused NAT IPs left, we will not notice much difference between the two variants, except in the case of telnet sessions (and related programs, e.g. ssh). Only when few NAT IPs are left is keeping state information a noticeable advantage, since we can then exactly identify connections that have just been closed and reassign their associated NAT IPs immediately, without waiting for a timeout. That keeping track of the state of individual connections adds to overall security if it is used by firewall code is another issue that has nothing to do with NAT.
There is another case where NAT should look not just at the IPs but at the individual connections: when we are using virtual servers or virtual network routes for load distribution and a significant number of connections come from only a few IPs, maybe because the IPs belong to big servers that open many connections. In this case, when only the IP is examined, load distribution will not fully work, because the traffic generated by an individual IP can then not be divided any further. When we look at the (TCP, UDP) ports in addition to the IPs, we can distribute the load more equally by remapping individual connections rather than individual IPs, as the following picture illustrates.
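The difference between remapping per IP and per connection can also be shown with a small sketch. The function distribute, the simple round-robin assignment in first-seen order and the example addresses are assumptions made for illustration.

```python
def distribute(connections, servers, key):
    """Count connections per server when remapping is decided per `key`."""
    assignment = {}                          # key -> server, in first-seen order
    per_server = {server: 0 for server in servers}
    for conn in connections:
        k = key(conn)
        if k not in assignment:
            # A key not seen before is given to the next server in turn.
            assignment[k] = servers[len(assignment) % len(servers)]
        per_server[assignment[k]] += 1
    return per_server

# Two busy client IPs, each opening six connections from different ports.
connections = [(ip, port)
               for ip in ("198.51.100.1", "198.51.100.2")
               for port in range(1024, 1030)]
servers = ["A", "B", "C"]

by_ip = distribute(connections, servers, key=lambda c: c[0])    # IP only
by_conn = distribute(connections, servers, key=lambda c: c)     # IP and port
print(by_ip)     # {'A': 6, 'B': 6, 'C': 0}
print(by_conn)   # {'A': 4, 'B': 4, 'C': 4}
```

With only two source IPs, the third server stays idle as long as the IP is the only thing examined; including the port lets every connection be placed individually.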
When we have to do that much just to be able to use the TCP or UDP addresses for our to-NAT-or-not decision, the idea of using NAT to also rewrite these addresses is almost obvious. This way we have not just IP address translation but also UDP/TCP address translation. It may be less significant, but it certainly is a useful extension. One example of its use is virtual servers: let us assume we want to create a virtual web server, and let us further assume that the real web server daemons running on different machines listen on different ports for some reason. If we do not rewrite the destination port in packets to the virtual server (and insert the original port in answer packets), this setup is impossible; all web servers must then listen on the same port on which the virtual server provides its web service.
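The virtual web server example can be sketched as a pair of rewriting functions. The addresses, the port numbers and the parameter index (which stands for whatever real server was selected for the connection) are hypothetical.

```python
VIRTUAL = ("192.0.2.80", 80)                              # what clients connect to
REAL_SERVERS = [("10.0.0.1", 8080), ("10.0.0.2", 8000)]   # daemons on other ports

def to_real(dst, index):
    """Rewrite destination IP and port of a packet sent to the virtual server.

    `index` identifies the real server chosen for this connection.
    """
    if dst != VIRTUAL:
        return dst                        # not addressed to the virtual server
    return REAL_SERVERS[index % len(REAL_SERVERS)]

def to_virtual(src):
    """Rewrite source IP and port of an answer packet from a real server."""
    return VIRTUAL if src in REAL_SERVERS else src

print(to_real(("192.0.2.80", 80), index=0))   # ('10.0.0.1', 8080)
print(to_real(("192.0.2.80", 80), index=1))   # ('10.0.0.2', 8000)
print(to_virtual(("10.0.0.2", 8000)))         # ('192.0.2.80', 80)
```

The client only ever sees port 80 on the virtual address, although neither real daemon listens on that port.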