Idea for improving a RR + CloudFlare setup

MrOwen · December 2012

Let's assume you have a round-robin setup on domain A that points to two nodes which handle load balancing, etc. That's cool until one of your load balancers goes down and you have clients who have cached the IP of your downed node. I assume that within a few hours (maybe < 1 hour) you'll be able to either bring that node back up or take it out of the round robin. Why not automate this? Here's my idea:

Have Uptime running on Modulus or AppFog. Also, Uptime now has the ability to use plugins.
Add both load-balancers as checks in Uptime.
Create a plugin which deletes the A record in CloudFlare (using their API) if one of the nodes goes down.
If/when the node comes back up, have Uptime make a call to CloudFlare to add back that A record.
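
The four steps above could be sketched roughly like this. It is only an outline of the state-tracking logic, not a working Uptime plugin: `DnsClient`, `add_a_record`, `remove_a_record`, and `FailoverPlugin` are hypothetical names standing in for whatever wrapper you write around CloudFlare's API, and `InMemoryDns` exists only so the sketch runs on its own. The "never remove the last record" guard is an added assumption, not part of the original idea.

```python
from typing import Dict, Protocol, Set


class DnsClient(Protocol):
    """Hypothetical wrapper around a DNS provider's API (not a real library)."""

    def add_a_record(self, name: str, ip: str) -> None: ...
    def remove_a_record(self, name: str, ip: str) -> None: ...


class InMemoryDns:
    """Stand-in client that keeps records in a dict so the sketch is runnable."""

    def __init__(self) -> None:
        self.records: Dict[str, Set[str]] = {}

    def add_a_record(self, name: str, ip: str) -> None:
        self.records.setdefault(name, set()).add(ip)

    def remove_a_record(self, name: str, ip: str) -> None:
        self.records.get(name, set()).discard(ip)


class FailoverPlugin:
    """Pull a node's A record when its check fails; restore it on recovery."""

    def __init__(self, dns: DnsClient, name: str, ips: Set[str]) -> None:
        self.dns = dns
        self.name = name
        self.known = set(ips)  # every IP that belongs to the round robin
        self.live = set(ips)   # IPs currently published in DNS

    def on_check(self, ip: str, is_up: bool) -> None:
        if ip not in self.known:
            return  # ignore checks for nodes we do not manage
        if not is_up and ip in self.live:
            # Assumption: keep the last record, or the name stops resolving.
            if len(self.live) > 1:
                self.dns.remove_a_record(self.name, ip)
                self.live.discard(ip)
        elif is_up and ip not in self.live:
            self.dns.add_a_record(self.name, ip)
            self.live.add(ip)
```

The monitoring side would call `on_check(ip, is_up)` after each check; because the plugin remembers what is currently published, repeated "down" results for the same node trigger only one API call.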

Although this isn't a true "heartbeat" type fail-over deal, you can have the checks occur every 10s which means, in theory, your bad IP would be pulled within that time which is pretty darn fast.
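
As a rough back-of-the-envelope check: pulling the record within 10s only helps new lookups, since clients that already resolved the name may keep the dead IP until the record's TTL expires. The sketch below assumes the record is removed on the first failed check and that resolvers honor the TTL; the 60-second TTL is just an assumed figure for illustration, and `worst_case_exposure` is a made-up helper name.

```python
def worst_case_exposure(check_interval_s: float, ttl_s: float) -> float:
    """Upper bound on how long a client may keep using a dead IP.

    Assumes the record is removed on the first failed check and that
    resolvers and clients honor the TTL (both are assumptions).
    """
    return check_interval_s + ttl_s


# 10s checks with an assumed 60s TTL
print(worst_case_exposure(10, 60))  # 70
```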

Issues you see with this setup? Aside from the obvious (the Uptime app crashes [I've been running it for quite some time and it seems pretty stable] or the Modulus or AppFog service providers have downtime).

ihatetonyy · December 2012

@MrOwen said: Issues you see with this setup?

Can't you do this all with Rage4 and UptimeRobot etc?

MrOwen · December 2012

@ihatetonyy said: Can't you do this all with Rage4 and UptimeRobot etc?

You can, but checks are done every 5 minutes instead of every 10s (you can even go lower, but 10s is pretty fast).

pubcrawler · December 2012

Feel free to slap this around, it's what I do.

Domain DNS TTL: don't set it for hours, set it for seconds. Typically a 60-second TTL is the lowest you can go without servers and helpful software trying to ignore it and set it higher.
Feed the user both IPs on the lookup. A lookup should return both location IPs. Most big sites do this, for example:
Name: ebay.com
Address: 66.135.205.13
Name: ebay.com
Address: 66.135.205.14
Name: ebay.com
Address: 66.211.160.87
Name: ebay.com
Address: 66.211.160.88

Non-authoritative answer:
Name: amazon.com
Address: 72.21.211.176
Name: amazon.com
Address: 72.21.214.128
Name: amazon.com
Address: 72.21.194.1

In case of failure / timeout, the user's browser should try the other IP, since it has the info from the original lookup, likely cached. A new visitor gets the cached info too. But you can manually purge the IP of the downed server from your DNS server as needed, so newly connecting users get just one IP.

If you are concerned about your load balancer's reliability, then handle that with an Nginx proxy installed on three different low-end VPS nodes.

Now if your users have sessions that need persistence, then you are going to need to edit your Nginx config to hash the user IP and determine which server he/she belongs on. Try to pass them there, and in case of failure push them to the other server which is up (thus losing their session info).
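
To make the hash-and-fall-back idea concrete, here is a small sketch in Python rather than Nginx config. It is not how Nginx implements its hashing; `pick_backend`, the backend addresses, and the `healthy` map are all made up for illustration.

```python
import hashlib
from typing import Dict, List, Optional


def pick_backend(
    client_ip: str, backends: List[str], healthy: Dict[str, bool]
) -> Optional[str]:
    """Map a client IP to a backend; walk to the next one if it is down."""
    if not backends:
        return None
    # A stable hash, so the same client IP always starts at the same backend.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(backends)
    for offset in range(len(backends)):
        candidate = backends[(start + offset) % len(backends)]
        if healthy.get(candidate, False):
            return candidate
    return None  # nothing is up
```

The same client keeps landing on the same server while it is healthy, and only moves (losing its session) when that server is marked down.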

As far as hiding all that behind Cloudflare, that's your luxury to figure out

gbshouse · December 2012

@MrOwen - You can do this with us, and there is no need to delete the A record. Just activate failover using one of the monitoring integrations (UptimeRobot, NewRelic, ScaleXtreme), a generic webhook, or an API call. I've looked at your code and it should be easy to add webhook support.

tsanten · December 2012

@MrOwen Maybe this will help you http://blog.booru.org/?p=12

imperio · December 2012

Dnsmadeeasy has support for tcp/http/https/dns/udp monitoring and automatic failover for up to 4 alternative IPs. So you can set 5 IPs for round robin, and when the main IP goes down (1 min interval, and the number of monitoring verification locations can be set) the next failover IP will be checked; if it is up, it fails over, and vice versa. I am using them for an enterprise project and it works like a charm. I do not think it costs much if you really need this kind of uptime.

pubcrawler · December 2012

The DNSMadeEasy package is $50 a year if memory serves me right.

Good service and price point if small enough to fit into that (amount of lookups a month) and if you do not need geographic DNS functionality.

eva2000 · December 2012

using dnsmadeeasy dns failover myself for 3x haproxy load balancers for 5x LEB VPS (4x openvz and 1x kvm) mainly to get over openvz limitation of no local binding/sysctl editing ability for keepalived

MrOwen · December 2012

Thanks for the great suggestions everyone (I was a little preoccupied yesterday, Christmas and all...)!

@pubcrawler, thanks for that general information about the rr setup. I wasn't 100% sure whether clients tried the additional IPs if the first one didn't respond. Although, does this check imply any additional load time if the IP they're assigned is down? I would assume it just does a ping test which might take a second or two.

@tsanten, that blog post is basically exactly what I'm looking for! I would like to stick with CloudFlare because I think they provide a great DNS service.

Also, thanks for suggestions about dnsmadeeasy but honestly, I'm not sure I want to shell out $30/year for DNS services for my personal use when I could just use that money for additional machines

pubcrawler · December 2012

The RR thing, MrOwen (feeding the user 2 different IPs), is a function of DNS. However, it's your browser that determines what to do in case of outage or failure. I believe all modern browsers renegotiate with a second attempt to the other IP. Feel free to test and confirm.
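
For anyone who wants to test this, the "try the other IP" behavior would look roughly like the sketch below on the client side. This only illustrates the fallback idea; it makes no claim about what any particular browser actually does. `connect_first_available` and its 3-second default timeout are made up for the example.

```python
import socket
from typing import List, Optional, Tuple


def connect_first_available(
    addresses: List[Tuple[str, int]], timeout_s: float = 3.0
) -> Optional[socket.socket]:
    """Try each (ip, port) in order; return the first socket that connects."""
    for ip, port in addresses:
        try:
            return socket.create_connection((ip, port), timeout=timeout_s)
        except OSError:
            # Refused or timed out: move on to the next address.
            continue
    return None
```

Note the cost: a machine that is completely down usually doesn't refuse the connection, so the client sits through the full timeout before moving on to the next address. That delay is the extra load time in question.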

tsanten · December 2012

@MrOwen I thought it would be helpful, without having read it all.
Cloudflare is good, but I'm afraid it will not stay free for long (once the beta services finish).

MrOwen · December 2012

@tsanten said: @MrOwen I thought it would be helpful, without having read it all.
Cloudflare is good, but I'm afraid it will not stay free for long (once the beta services finish).

I would imagine it will remain free because I think they make enough from their pro, business, and enterprise plans which have more features than the basic plan like ssl and their railgun dealio.

Wintereise · December 2012

The normal plans will most likely stay free, pro/ent works awesome though.

I use pro on two sites, it's really cheap once you get over the fact that the first site is $20, lol.

sandro · December 2012

I love this stuff. How do you handle sessions between different servers if you started a session on a server that's currently down and you can't check credentials?

eva2000 · December 2012

Love discussions like this. If you want to read more on this there's a nice heated debate over at WHT http://www.webhostingtalk.com/showthread.php?t=1117385 as well - including the validity of how different browsers' cache handle failover etc. A few differences in opinions in that thread which make interesting reading

eva2000 · December 2012

@sandro said: I love this stuff. How do you handle sessions between different servers if you started a session on a server that's currently down and you can't check credentials?

For me, the haproxy redispatch option http://code.google.com/p/haproxy-docs/wiki/redispatch or configuring a distributed memcached server, e.g. for PHP sessions. From the haproxy docs:

In HTTP mode, if a server designated by a cookie is down, clients may definitely stick to it because they cannot flush the cookie, so they will not be able to access the service anymore.

Specifying option redispatch will allow the proxy to break their persistence and redistribute them to a working server.

It also allows to retry last connection to another server in case of multiple connection failures. Of course, it requires having retries set to a nonzero value.

sandro · December 2012

that's awesome

eva2000 · December 2012

Indeed no probs with haproxy handling a 600,000 member vbulletin forum

MrOwen · December 2012

@pubcrawler said: The RR thing, MrOwen (feeding the user 2 different IPs), is a function of DNS. However, it's your browser that determines what to do in case of outage or failure. I believe all modern browsers renegotiate with a second attempt to the other IP. Feel free to test and confirm.

It looks like one of the folks at WHT has a differing opinion on this:

A records, or round-robin DNS, do NOT "failover", and if a client picks a down IP, it just times out (yes it gives all A's setup, but that's it). Round robin / multiple A records just rotates the IP that is given out, but there is no system in the TCP/IP stack of MS Windows or Linux that tells anything to move along to the next if one is down.

Modern browsers do not "try each in turn". That's false information, unless it's something new with IE9 I haven't tested yet. That would be the only possible exception, which I'm 99% sure doesn't exist, as it would have almost certainly made my radar.

At this point though, it's mostly anecdotal evidence. I have yet to try it in real life.

pubcrawler · December 2012

I say test it and see. We've long used it this way, @MrOwen. Is it perfect? Nope, you may still have a timeout issue on the DNS lookup, where the browser or whatever software interface gets stuck on the first IP until expiration (my TTL is set at 60 seconds), meaning down for 60 seconds max for that user.

Regardless of what you do, you will perhaps have some outage for some folks.

Never saw the A-B scenario where 50% of the viewers end up stuck on crashed IP... Seems possible perhaps, but less so today.

bdtech · January 2013

@MrOwen Chrome and others will try the second IP. Feel free to try it
