Overview of Anycast - building a robust DNS

A Whitepaper

Introduction to DNS

All Internet hosts, including your computer when it is connected to the Internet, use a DNS server. Every time you go to a website, you need to look up the site's IP address using the domain name of the website. Your request for this lookup is eventually passed to a DNS server somewhere.

But your request is one of thousands, even millions, being made at any one time across the Internet or within an internal corporate network (intranet). If your local DNS server is not Authoritative for the domain that contains the name you are trying to reach, it must ask other servers for the answer (this is known as DNS recursion). A local server can get quite busy performing these lookups, and if it is also Authoritative for a domain, the extra work can slow its responses for that domain.

To combat this, a DNS server can add the answers it gets from other DNS servers to its own internal database and retain them for a period equal to the time to live (TTL) value set on the record stored on the Authoritative DNS server.

Storing these responses is called caching, and it allows a DNS server to respond more quickly to repeated queries for the same domain or host. If you are on a website and want to retrieve the next page on the site, the local DNS server does not have to look up the host again, provided the time to live (TTL) value has not expired and caused the local DNS server to delete the information. This is why the first contact with a website can take noticeably longer, while subsequent requests for pages on the same site are somewhat faster.
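The caching behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by BIND or by any appliance: the names TtlCache, resolve and recursive_lookup are hypothetical, and the clock is injectable only so that expiry can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Toy DNS answer cache: each entry is forgotten once its TTL runs out."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable clock (assumption, for easy testing)
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (address, expiry)

    def put(self, name: str, address: str, ttl_seconds: float) -> None:
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # TTL expired: delete the cached answer
            return None
        return address


def resolve(name: str, cache: TtlCache,
            recursive_lookup: Callable[[str], Tuple[str, float]]) -> str:
    """Answer from the cache when possible; otherwise recurse and cache the result."""
    cached = cache.get(name)
    if cached is not None:
        return cached
    address, ttl_seconds = recursive_lookup(name)  # the slow path: ask other servers
    cache.put(name, address, ttl_seconds)
    return address
```

In this sketch, the first call to resolve for a name pays the cost of recursive_lookup, and later calls are served from memory until the TTL elapses.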

Caching DNS servers are configured for recursive lookup as well. This creates a server that will respond to lookup requests by delivering answers from its cache, or by looking them up on other servers. It is the job of a caching DNS server to handle general lookups of Internet domains. A caching DNS server reduces the load placed on an Authoritative DNS server by handling the requests that do not pertain to the local domain.

Almost all Internet Service Providers (ISPs) operate some kind of caching DNS server. Many of them run BIND from ISC.

Unfortunately, DNS caching is a double-edged sword. It speeds up resolution by storing recent answers and short-circuiting the normal resolution process. However, there is a downside: because DNS servers cache answers and do not delete them until the time to live (TTL) expires, it can take hours or days for the entire Internet to recognize changes to DNS information for your domain name. This is where TCPWave appliances come into play and bridge the gap. TCPWave appliances leverage Anycast technology.

Anycast allows multiple, identical, globally deployed DNS servers to advertise the same IP address. For all intents and purposes, the same server exists in dozens or hundreds of places simultaneously. When an Internet user looks up your domain name, they reach the Anycast instance topologically closest to themselves. Usually, there is a correlation between network topology and physical geography. A campus in Frankfurt might find and use a DNS appliance located in London or Paris, while a campus in Brazil might get its DNS service from New York. DNS appliances are typically placed in major data centers or in campuses with a large user population.
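The selection of the topologically closest instance can be illustrated with a small Python sketch. In a real network this decision is made by the routers, not by application code; the function name nearest_instance, the site names, and the routing costs below are all hypothetical and serve only to show the idea that every site announces one shared address and the lowest-cost route wins.

```python
from typing import Dict, Optional, Set

# Documentation-range address standing in for the shared anycast service IP.
ANYCAST_IP = "192.0.2.53"


def nearest_instance(path_costs: Dict[str, int],
                     announcing: Set[str]) -> Optional[str]:
    """Return the site with the lowest routing cost among those still
    announcing ANYCAST_IP, or None if no site is announcing it."""
    candidates = {site: cost for site, cost in path_costs.items()
                  if site in announcing}
    if not candidates:
        return None
    # Ties are broken by site name so the result is deterministic.
    return min(candidates, key=lambda site: (candidates[site], site))


# Hypothetical routing costs as seen from a campus in Frankfurt.
frankfurt_costs = {"London": 2, "Paris": 3, "New York": 7}
print(nearest_instance(frankfurt_costs, {"London", "Paris", "New York"}))
```

If London stops announcing the address, the same call with the remaining sites returns Paris, which is the automatic failover that anycast provides.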

Anycast is normally highly reliable, as it can provide automatic failover. TCPWave Anycast appliances typically feature internal "heartbeat" monitoring of the appliance's function and the health of the next hop. They make an intelligent decision and withdraw the route announcement if the appliance fails or if the next hop is not stable. The TCPWave appliances achieve this by advertising the anycast prefix to the router over OSPF or another IGP. If an appliance dies, the router will automatically withdraw the announcement. "Heartbeat" functionality is important because, if the announcement continues for a failed appliance, the server will act as a "black hole" for nearby clients; this is the most serious failure mode for an anycast system. Even then, the failure is total only for clients that are closer to this server than to any other, and it does not cause a global failure. The TCPWave appliances do not operate in an active/passive mode. In a data center, if you have two TCPWave appliances advertising the VIP into the network, both of them will service DNS queries. The router load-balances across the TCPWave appliances. We will cover this topic a little later in this whitepaper.
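The announce-or-withdraw decision driven by the heartbeat can be sketched as follows. This is an assumption-laden illustration and not TCPWave's actual logic: the Heartbeat record, the should_announce function, and the rule of requiring three consecutive healthy checks are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Heartbeat:
    dns_answering: bool       # did the local DNS service answer a test query?
    next_hop_reachable: bool  # did the upstream router respond?


def should_announce(recent: List[Heartbeat], required_successes: int = 3) -> bool:
    """Keep announcing the anycast prefix only if the most recent
    `required_successes` heartbeats were all fully healthy.

    Withdrawing on any recent failure avoids the "black hole" case in
    which a dead appliance keeps attracting traffic from nearby clients.
    """
    if len(recent) < required_successes:
        return False  # not enough evidence yet: stay withdrawn
    window = recent[-required_successes:]
    return all(h.dns_answering and h.next_hop_reachable for h in window)
```

Requiring several consecutive successes before re-announcing is one possible way to avoid route flapping when a next hop is unstable; the right threshold depends on the network.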

At the root/authoritative layer, it is recommended to have a handful of master and slave servers (4 to 5) spread geographically for resiliency. Since TCPWave is a cache- and routing-oriented firm, any IPAM that can manage authoritative servers running standard BIND will be compatible with the TCPWave software. We recommend 4 or 5 masters because upgrades become easier and fewer slaves need to have their zones refreshed via a zone transfer.
 

As an example, f.root-servers.net is advertised via anycast by multiple nodes from the following locations:

Europe: Lisbon, Madrid, Barcelona, Paris, Amsterdam, Munich, Rome, Prague, Moscow, London, Torino
Middle East and Africa: Tel Aviv, Dubai, South Africa, Kenya
Asia-Pacific: Beijing, Seoul, Osaka, Hong Kong, Taipei, Singapore, Jakarta, Chennai, Brisbane, Auckland, Dhaka, Karachi
Americas: Monterrey, São Paulo, Santiago de Chile, Los Angeles, San Jose, New York, Toronto, Ottawa

Caches that resolve from the Internet root servers may select f.root-servers.net through BIND's round-trip time (RTT) based server selection. They would then get a referral from their closest root instance, reducing the time involved in DNS recursion. The response learned by the cache is stored locally until the TTL expires. Mission-critical application servers that point to their closest cache appliance will get a faster response than a server pointing to a unicast appliance located multiple network hops away.
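RTT-based selection can be approximated with a smoothed round-trip time per server. The sketch below is a simplification and does not reproduce BIND's actual algorithm: the function names, the smoothing factor alpha, and the sample values are assumptions chosen for illustration.

```python
from typing import Dict


def update_srtt(previous_ms: float, sample_ms: float, alpha: float = 0.3) -> float:
    """Exponentially smooth the round-trip time:
    new = (1 - alpha) * previous + alpha * sample."""
    return (1 - alpha) * previous_ms + alpha * sample_ms


def pick_server(srtt_ms: Dict[str, float]) -> str:
    """Choose the server with the lowest smoothed RTT (ties broken by name)."""
    return min(srtt_ms, key=lambda server: (srtt_ms[server], server))


# Hypothetical smoothed RTTs, in milliseconds, for two root servers.
srtt = {"a.root-servers.net": 80.0, "f.root-servers.net": 20.0}
print(pick_server(srtt))
```

Because an anycast root such as f.root-servers.net is usually reached at a nearby node, its measured RTT tends to be low, so a resolver that selects servers this way will tend to prefer it.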

Anycast makes DNS more reliable

When you deploy identical TCPWave appliances at multiple nodes, on multiple networks, in widely diverse geographical locations, all using Anycast, you're effectively adding global load-balancing functionality to your DNS service. Importantly, the load-balancing logic is completely invisible to the DNS servers; it's moved down the stack from the application to the network layer. Because each node advertises the same IP address, user traffic is shared between servers globally, handled transparently by the network itself using standard BGP routing. Refer to the Cisco knowledge base link to learn more about routing when multiple sources advertise the same virtual IP address.

 

Anycast improves DNS performance

On the Web, having a fast site is not only a user experience issue. Page load speed is also reported to be a factor in search engine rankings. Naturally, anything an organization can do to improve performance is desirable, and Anycast can help towards that objective.

By allowing clients to reach the DNS appliance closest to them, the latency associated with multiple hops can be reduced and potential network bottlenecks between the user and the DNS appliance become irrelevant. If there's an Anycast resolver node hosted at the DNS client's nearest large Internet exchange, there's no need for their query to make an intercontinental round-trip before it can get on with the important business of pulling down Web pages.

Anycast provides resilience against DDoS attacks

While distributed denial of service attacks are, as the name suggests, distributed, the botnets used to launch the attacks tend not to be distributed evenly. The malware that turns end-user PCs into “bots” is often designed for specific regions or for users of specific languages, and botnets are often clustered to reflect that. The 2007 root server attack, for example, saw most of its traffic originating in Asia-Pacific.

More recently, the “Mariposa” botnet, which was shut down in late 2009, has been cited as having more than 12 million IP addresses associated with it, but over half of those addresses belonged to networks in just five countries: India, Mexico, Brazil, Korea and Colombia. If an organization were to come under attack by Mariposa and it had broadly deployed Anycast-enabled DNS nodes, it could have seen some locations absorb the brunt of the attack. A node deployed to a network in Brazil, for example, may have been able to accommodate almost a fifth of the unwanted traffic, preserving the rest of the infrastructure to still successfully answer queries from real users in the rest of the world.
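The way anycast localizes attack traffic can be shown with simple arithmetic. The region shares and the region-to-node mapping below are hypothetical numbers picked for illustration; they are not measurements of Mariposa or of any real attack.

```python
from collections import defaultdict
from typing import Dict


def attack_load_per_node(share_by_region: Dict[str, float],
                         node_for_region: Dict[str, str]) -> Dict[str, float]:
    """Sum the fraction of attack traffic that each anycast node receives,
    given which node is topologically closest to each source region."""
    load: Dict[str, float] = defaultdict(float)
    for region, share in share_by_region.items():
        load[node_for_region[region]] += share
    return dict(load)


# Hypothetical split of attack traffic by source region (fractions sum to 1).
shares = {"Brazil": 0.2, "India": 0.3, "Rest of world": 0.5}
# Hypothetical mapping of each region to its closest anycast node.
closest = {"Brazil": "Sao Paulo", "India": "Chennai", "Rest of world": "London"}
print(attack_load_per_node(shares, closest))
```

Under these assumed numbers, the São Paulo node takes one fifth of the unwanted traffic, and clients routed to other nodes are unaffected by that portion of the attack.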

Of course, designing and rolling out an Anycast DNS network is not a trivial task. The complexities associated with managing servers in multiple locations are obvious. As with any system design decision, the trade-off between availability and cost has to be discussed. But, as the managers of the Internet's most crucial addressing resources have found, Anycast should be an important part of the cost-benefit discussions you have when you proclaim your web site open for business on the DNS. TCPWave continues to demonstrate its strength in moulding this industry standard and customizing it for a Fortune 1000 organization with a large internal corporate network.

For a detailed, in-depth look, see the Wikipedia article at http://en.wikipedia.org/wiki/Anycast