Round Robin DNS != High Availability. Or am I wrong?

raindog308 · April 2012

I constantly see people who discuss their "high availability" setups like this:

round robin DNS
round robin DNS with a low TTL and some kind of automated DNS change

The first one is pure confusion. Clients do not "try one and, if it fails, look up again". They look up, get an address, and assume that's the address. Round robin DNS is fine for load balancing, but not for HA. If you have two IPs in an RRDNS setup and one goes down, 50% of your clients will continue to hit the down server (unless you have some custom client code, but in this case I'm thinking of web browsers).

Even with a low TTL and automated DNS changes, it's still weak. First, there is no guarantee that any nameserver is going to honor your 60-second TTL; I've read that some of the big ones ignore anything less than an hour. Second, you're assuming my browser or client will not cache things or that its cache is short. One example: Internet Explorer caches for 30 minutes by default. FasterFox caches for 1 hour. Etc. And finally, my browser has no idea it's got a "round robin" address, so it has no idea that it should check again if the first one doesn't work.

Granted, I think the DNS standard could implement some kind of extension that tags lookups as "there are other addresses you could use". But it doesn't.

So. Am I wrong?

NickW · April 2012

Yes, you're wrong (partially).

Round robin DNS is not HA, but it can be useful. Think of it more as a poor man's load distribution. It is also the simplest and most fundamental way of doing so. Look up the records for any major website: almost all of them use round robin, as it's free and not entirely useless.

# dig www.google.com

; <<>> DiG 9.7.3-P3-RedHat-9.7.3-8.P3.el6_2.2 <<>> www.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11085
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 4, ADDITIONAL: 0

;; QUESTION SECTION:
;www.google.com.                        IN      A

;; ANSWER SECTION:
www.google.com.         328010  IN      CNAME   www.l.google.com.
www.l.google.com.       83      IN      A       173.194.34.17
www.l.google.com.       83      IN      A       173.194.34.18
www.l.google.com.       83      IN      A       173.194.34.19
www.l.google.com.       83      IN      A       173.194.34.20
www.l.google.com.       83      IN      A       173.194.34.16

;; AUTHORITY SECTION:
google.com.             68855   IN      NS      ns2.google.com.
google.com.             68855   IN      NS      ns3.google.com.
google.com.             68855   IN      NS      ns4.google.com.
google.com.             68855   IN      NS      ns1.google.com.

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sat Apr  7 00:33:47 2012
;; MSG SIZE  rcvd: 204

All modern browsers do see the multiple IPs in the DNS response and will try the next on the list if the first fails. Firefox (and probably others) caches which one eventually worked and uses it for future requests to that domain. Only the very first request to the down server will be slow while it figures it out, but the rest will be as normal. This of course assumes that the down server is completely unresponsive. If it gives an error code then the browser will think it's fine.
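The retry behaviour described above can be sketched in a few lines. This is only an illustration of the idea, not how any particular browser implements it: `connect_with_fallback` is a hypothetical name, and the `connect` parameter exists only so the logic can be exercised without real servers.

```python
import socket

def connect_with_fallback(addresses, port, timeout=3.0,
                          connect=socket.create_connection):
    """Try each address in order and return (connection, address) for the
    first one that accepts a TCP connection."""
    last_error = None
    for address in addresses:
        try:
            conn = connect((address, port), timeout=timeout)
            return conn, address
        except OSError as exc:
            # Unreachable, refused, or timed out: move on to the next address.
            last_error = exc
    raise ConnectionError(f"all {len(addresses)} addresses failed") from last_error
```

Note that a server which accepts the connection and then answers with an HTTP error still counts as "up" here, which matches the caveat in the post: only a completely unresponsive server triggers the fallback.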

lbft · April 2012

It's not HA but it helps (and it's the best you're likely to get with LEBs). It's not going to get you 100% site uptime but either of those reduces the impact of a failure a little bit:

Round robin DNS means that your requests will be roughly distributed across a number of IP addresses - so if, say, one of three servers goes down, two thirds of your visitors are still hitting the working servers (assuming the DNS cache lasts their entire browsing session; if it doesn't, then only 2/3 of their requests will succeed, although keepalive could improve that number).
Low-TTL round-robin DNS with automated changing - it reduces the outage window to the length of the caching of the response. Akamai does this plus geoip for their CDN, I think, so it must show some improvement.
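The two-thirds figure in the first point can be checked with a tiny simulation. This is a rough sketch that assumes strict rotation and clients that never retry another address; the server names are placeholders.

```python
from itertools import cycle

def success_fraction(servers, down, requests):
    """Hand out servers in strict rotation and return the share of
    requests that land on a server that is still up."""
    rotation = cycle(servers)
    hits = sum(1 for _ in range(requests) if next(rotation) not in down)
    return hits / requests

# Three servers, one of them down, no client-side retry.
share = success_fraction(["a", "b", "c"], down={"b"}, requests=3000)
print(f"{share:.2%} of requests reach a working server")
```

With client-side retry, as browsers do, the share of requests that eventually succeed would be higher than this no-retry estimate.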

raindog308 · April 2012

Thanks. I guess I wasn't aware that browsers get all the addresses... though now that I pause to read the getaddrinfo page, it's plain ("getaddrinfo() returns one or more addrinfo structures"). Probably the same on Windows/Mac-based libraries.

MrAndroid · April 2012

@NickW said: All modern browsers do see the multiple IPs in the DNS response and will try the next on the list if the first fails. Firefox (and probably others) caches which one eventually worked and uses it for future requests to that domain. Only the very first request to the down server will be slow while it figures it out, but the rest will be as normal. This of course assumes that the down server is completely unresponsive. If it gives an error code then the browser will think it's fine.

I thought it's the OS that runs the DNS lookup, and the browser simply gets the result for it?

@raindog308 said: Thanks. I guess I wasn't aware that browsers get all the addresses... though now that I pause to read the getaddrinfo page, it's plain ("getaddrinfo() returns one or more addrinfo structures"). Probably the same on Windows/Mac-based libraries.

getaddrinfo() is POSIX, but a similar version also exists on Windows.
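Python's `socket.getaddrinfo()` wraps the same call, so it makes a quick way to see that the application gets the whole list and not a single address. A minimal sketch, where `resolve_all` is just a hypothetical helper name:

```python
import socket

def resolve_all(host, port=80):
    """Return every distinct address getaddrinfo() reports for host,
    in the order the resolver returned them."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for family, socktype, proto, canonname, sockaddr in infos:
        # sockaddr[0] is the IP address for both IPv4 and IPv6 entries.
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

Calling `resolve_all("www.example.com")` against a name with several A records should return all of them; what the caller does with the second and later entries is up to the application.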

eva2000 · April 2012

WHT had this debate on Round Robin DNS at http://www.webhostingtalk.com/showthread.php?t=1117385. It's a nice read, as it gets more technical further into the thread, with posts by folks who have had hands-on experience with it, including how the browser deals with it.

NickW · April 2012

"WHT had this debate on Round Robin DNS at http://www.webhostingtalk.com/showthread.php?t=1117385. It's a nice read, as it gets more technical further into the thread, with posts by folks who have had hands-on experience with it, including how the browser deals with it."

"mugo" in that thread is wrong and pushing out-of-date crap on the subject. Maybe once upon a time, 10+ years ago, it was correct.

"I thought it's the OS that runs the DNS lookup, and the browser simply gets the result for it?"

It does not matter either way as the OS's lookup is perfectly capable of returning multiple A records. For example, on Windows:

C:\Users\Nick>nslookup www.youtube.com
Server:  home
Address:  192.168.1.1

Non-authoritative answer:
Name:    youtube-ui.l.google.com
Addresses:  173.194.41.99
          173.194.41.104
          173.194.41.98
          173.194.41.102
          173.194.41.96
          173.194.41.105
          173.194.41.103
          173.194.41.101
          173.194.41.110
          173.194.41.97
          173.194.41.100
Aliases:  www.youtube.com

klikli · April 2012

Please always bear in mind that a number of recursive DNS servers do not respect TTLs, so DNS should never be used for HA.
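To put a number on that: if a resolver clamps short TTLs up to its own minimum, the TTL you publish stops mattering below that floor. A toy model, using the 60-second and one-hour figures quoted earlier in this thread as assumptions, not measurements:

```python
def effective_ttl(record_ttl, resolver_min_ttl=0):
    """Seconds a resolver keeps an answer if it raises short TTLs
    to its own minimum (0 means it honours the published TTL)."""
    return max(record_ttl, resolver_min_ttl)

# Hypothetical resolvers: one honours the TTL, one enforces a one-hour floor.
for name, floor in {"honours TTL": 0, "one-hour floor": 3600}.items():
    print(name, effective_ttl(60, floor), "seconds")
```

Clients behind the second kind of resolver can keep hitting a dead IP for up to an hour after the record changes, whatever TTL was set.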

MrAndroid · April 2012

Quick question, how do you do the green boxes?

NickW · April 2012

For a quotation, I type a greater-than symbol >, then a space, and then enclose everything in double quotes "".

For a green box, put four extra spaces in front of every line.

NickW · April 2012

If there are still any disbelievers, here's an ancient piece of Mozilla documentation: http://www-archive.mozilla.org/docs/netlib/dns.html Read under the round robin support heading.

If they managed to implement it god knows how many years ago, I would assume all modern browsers have.

othelloRob · April 2012

@raindog308 said:
Round robin DNS is fine for load balancing, but not for HA

It's not really any use for HA, as routers, browsers, ISPs etc. cache the DNS result and ignore your TTLs.

lbft · April 2012

Also, for those using Cloudflare, it will only try one IP per page load rather than using browser-like retry behaviour. The backend IP to use appears to be selected at random and, if it's down, it'll either serve a cached version or throw an error message.

CloudxtnyHost · April 2012

Rather than doing round-robin, you'd want a DNS server that has testing implemented within its core. For example, 4PSA DNS allows you to set up tests for records (for example, ping this IP), and if the test fails the IP is removed from DNS.

With PowerDNS you can do some very fancy custom coding to develop tests etc.

The main problem with DNS is always going to be DNS caching, as @othelloRob has pointed out.
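The "test each record and drop the ones that fail" idea can be sketched as below. This is not the 4PSA or PowerDNS interface, only the filtering step under simple assumptions: a TCP connect stands in for the health test, and `tcp_check` and `healthy_records` are hypothetical names. Publishing the result to a DNS server is left out.

```python
import socket

def tcp_check(address, port=80, timeout=2.0):
    """Return True if a TCP connection to address:port succeeds in time."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False

def healthy_records(addresses, check=tcp_check):
    """Keep only the addresses that pass the check. If none pass, return
    the full list instead of publishing an empty record set."""
    alive = [address for address in addresses if check(address)]
    return alive or list(addresses)
```

Falling back to the full list when every check fails is a design choice made for this sketch: it avoids turning a monitoring glitch into a total outage. Cached answers still limit how quickly any such change reaches clients.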
