Round Robin DNS != High Availability. Or am I wrong?

I constantly see people who discuss their "high availability" setups like this:
- round robin DNS
- round robin DNS with a low TTL and some kind of automated DNS change
The first one is pure confusion. Clients do not "try one and if it fails, look up again". They look up, get an address, and assume that's the address. Round robin DNS is fine for load balancing, but not for HA. If you have two IPs in an RRDNS and one goes down, roughly 50% of your clients will continue to hit the down server (unless you have some custom client code, but in this case I'm thinking of web browsers).
Even with a low TTL and an automated DNS change, it's still weak. First, there is no guarantee that any nameserver is going to honor your 60-second TTL; I've read that some of the big ones ignore anything less than an hour. Second, you're assuming my browser or client will not cache things, or that its cache is short. One example: Internet Explorer caches for 30 minutes by default, and FasterFox caches for 1 hour. And finally, my browser has no idea it's got a "round robin" address, so it has no idea that it should check again if the first one doesn't work.
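To put rough numbers on the caching argument, here is a back-of-envelope sketch. The additive model and the names `worst_case_stale_seconds`, `resolver_min_ttl` and `browser_cache` are assumptions made for illustration, not something taken from any resolver or browser; the figures plugged in are the ones quoted above.

```python
def worst_case_stale_seconds(ttl, resolver_min_ttl=0, browser_cache=0):
    """Rough upper bound on how long a client may keep using a dead IP.

    Assumed model: the recursive resolver holds the record for
    max(ttl, resolver_min_ttl) seconds, and the browser may fetch it just
    before that expires and then hold it for browser_cache more seconds.
    """
    return max(ttl, resolver_min_ttl) + browser_cache


# 60 s TTL, a resolver that enforces a 1 hour floor, a 30 minute browser cache
print(worst_case_stale_seconds(60, resolver_min_ttl=3600, browser_cache=1800))  # 5400
```

Under those assumptions a 60-second TTL can still mean up to 90 minutes of clients hitting the old address.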
Granted, I think the DNS standard could implement some kind of extension that tags lookups as "there are other addresses you could use". But it doesn't.
So. Am I wrong?
Comments
Yes, you're wrong (partially).
Round robin DNS is not HA, but it can be useful. Think of it more as a poor man's load distribution. It is also the simplest and most fundamental way of doing so. Look up the records for any major website; almost all of them have round robin, as it's free and not entirely useless.
All modern browsers do see the multiple IPs in the DNS response and will try the next one on the list if the first fails. Firefox (and probably others) caches which one eventually worked and uses it for future requests to that domain. Only the very first request to the down server will be slow while the browser figures it out; the rest will be as normal. This of course assumes that the down server is completely unresponsive. If it returns an error code, the browser will think it's fine.
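The retry behaviour described here can be sketched roughly as follows. This is only an illustration, not how any particular browser implements it: `connect_with_failover`, `tcp_connect` and the `connect` parameter are made-up names, and the default connector simply opens a TCP socket with a short timeout.

```python
import socket


def tcp_connect(address, port, timeout=3.0):
    """Default connector: open a TCP connection or raise OSError."""
    return socket.create_connection((address, port), timeout=timeout)


def connect_with_failover(addresses, port, connect=tcp_connect):
    """Try each address in order; return (address, connection) for the first success."""
    last_error = None
    for address in addresses:
        try:
            return address, connect(address, port)
        except OSError as exc:
            # Unreachable, refused or timed out: remember the error and move on.
            last_error = exc
    raise OSError(f"all {len(addresses)} addresses failed") from last_error
```

Note that a server which accepts the connection and then returns an HTTP error still counts as a success in this sketch, which matches the caveat above. Python's own `socket.create_connection` already loops over every resolved address when given a hostname.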
It's not HA but it helps (and it's the best you're likely to get with LEBs). It's not going to get you 100% site uptime but either of those reduces the impact of a failure a little bit:
Thanks. I guess I wasn't aware that browsers get all address...though now that I pause to read the getaddrinfo page, it's plain ("getaddrinfo() returns one or more addrinfo structures"). Probably the same on Windows/Mac-based libraries.
I thought it's the OS that runs the DNS lookup, and the browser simply gets the result from it?
getaddrinfo() is POSIX, but a similar version also exists on Windows.
WHT had this debate on round robin DNS at http://www.webhostingtalk.com/showthread.php?t=1117385. It's a nice read, as it gets more technical further into the thread, with folks who have had hands-on experience with it covering how the browser deals with it.
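For anyone who wants to see this for themselves, here is a minimal sketch using Python's `socket.getaddrinfo`, which wraps the platform's resolver call. `resolve_all` is just an illustrative helper name.

```python
import socket


def resolve_all(host, port=80):
    """Return every distinct IP address the OS resolver gives for host, in order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for family, socktype, proto, canonname, sockaddr in infos:
        ip = sockaddr[0]  # first element is the address for both IPv4 and IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Calling `resolve_all()` on a hostname with several A records should return all of them, so the application gets the whole list and can decide what to do with it.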
"mugo" in that thread is wrong and pushing out of date crap on the subject. Maybe once upon a time 10+ years ago it was correct.
It does not matter either way as the OS's lookup is perfectly capable of returning multiple A records. For example, on Windows:
Please always bear in mind that a number of recursive DNS servers do not respect TTL, so DNS should never be used for HA.
Quick question, how do you do the green boxes?
I do a greater than symbol > then a space and then enclose everything in double quotes "". For a quotation.
For a green box, put four extra spaces in front of every line.
If there are still any disbelievers, here's an ancient piece of Mozilla documentation: http://www-archive.mozilla.org/docs/netlib/dns.html Read under the round robin support heading.
If they managed to implement it god knows how many years ago, I would assume all modern browsers have.
It's not really any use for HA, as routers, browsers, ISPs etc. cache the DNS result and ignore your TTLs.
Also, for those using Cloudflare, it will only try one IP per page load rather than using browser-like retry behaviour. The backend IP to use appears to be selected at random and, if it's down, it'll either serve a cached version or throw an error message.
Rather than doing round-robin, you'd want a DNS server that has testing implemented within its core. For example, 4psa DNS allows you to set up tests for records (for example, ping this IP), and if the test fails the IP is removed from DNS.
With PowerDNS you can do some very fancy custom coding to develop tests etc.
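The "test the record, drop it if the test fails" idea can be sketched like this. It is not the 4psa or PowerDNS API; `tcp_check` and `records_to_publish` are hypothetical names, and the fallback of publishing the full pool when everything fails is an assumption made for the example.

```python
import socket


def tcp_check(address, port=80, timeout=2.0):
    """Hypothetical health test: healthy if a TCP connection opens in time."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def records_to_publish(pool, check=tcp_check):
    """Return the addresses from pool that pass the health test.

    If every address fails, the full pool is returned, on the assumption
    that a possibly-down answer is better than an empty one.
    """
    healthy = [address for address in pool if check(address)]
    return healthy if healthy else list(pool)
```

Something like this would run on a timer next to the DNS server and rewrite the record set. It does nothing about resolvers or browsers that already cached the old answer.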
The main problem with DNS is always going to be DNS Cache as @othellorob has pointed out.