Idea for improving a RR + CloudFlare setup

MrOwen Member
edited December 2012 in General

Let's assume you have a round-robin setup on domain A which points to two nodes which handle load balancing, etc. That's cool until one of your load balancers goes down and now you have clients who have cached the IP of your downed node. I assume that within a few hours (maybe < 1 hour) you'll be able to either bring up that node again or take it out of the round robin. Why not automate this? Here's my idea:

  1. Have Uptime running on Modulus or AppFog. Also, Uptime now has the ability to use plugins.
  2. Add both load-balancers as checks in Uptime.
  3. Create a plugin which deletes the A record in CloudFlare (using their API) if one of the nodes goes down.
  4. If/when the node comes back up, have Uptime make a call to CloudFlare to add back that A record.

Although this isn't a true "heartbeat" type fail-over deal, you can have the checks occur every 10s which means, in theory, your bad IP would be pulled within that time which is pretty darn fast.
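A minimal sketch of the plugin logic in steps 3 and 4, written in Python rather than against Uptime's own plugin interface. `RoundRobinGuard`, `DnsClient`, `RecordingDns`, and the `failures_before_removal` threshold are hypothetical names, not part of Uptime or CloudFlare; a real `DnsClient` would wrap the DNS provider's record API. The guard also refuses to pull the last remaining record, which is an added assumption so that a flapping monitor cannot empty the round robin.

```python
from typing import Dict, Iterable, List, Protocol, Set, Tuple


class DnsClient(Protocol):
    """Hypothetical interface; a real one would call the DNS provider's API."""

    def add_record(self, ip: str) -> None: ...
    def remove_record(self, ip: str) -> None: ...


class RoundRobinGuard:
    """Pulls a node's A record after repeated failed checks, restores it on recovery."""

    def __init__(self, dns: DnsClient, nodes: Iterable[str],
                 failures_before_removal: int = 2) -> None:
        self.dns = dns
        self.threshold = failures_before_removal
        self.failures: Dict[str, int] = {ip: 0 for ip in nodes}
        # Assumption: every node already has its A record when we start.
        self.published: Set[str] = set(self.failures)

    def report(self, ip: str, is_up: bool) -> None:
        """Feed in one check result (for example, one every 10 seconds)."""
        if is_up:
            self.failures[ip] = 0
            if ip not in self.published:
                self.dns.add_record(ip)
                self.published.add(ip)
            return
        self.failures[ip] += 1
        should_pull = self.failures[ip] >= self.threshold and ip in self.published
        # Assumption: never remove the last record, or the name resolves to nothing.
        if should_pull and len(self.published) > 1:
            self.dns.remove_record(ip)
            self.published.discard(ip)


class RecordingDns:
    """Stand-in client that records calls instead of touching real DNS."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def add_record(self, ip: str) -> None:
        self.calls.append(("add", ip))

    def remove_record(self, ip: str) -> None:
        self.calls.append(("remove", ip))
```

Requiring more than one failed check before pulling a record trades a little speed for fewer false alarms; with the default threshold of 2 and 10s checks, a dead node would be pulled after roughly 20 seconds in this sketch.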

Issues you see with this setup? Aside from the obvious (the Uptime app crashes [I've been running it for quite some time and it seems pretty stable] or the Modulus or AppFog service providers have downtime).

Comments

  • @MrOwen said: Issues you see with this setup?

    Can't you do this all with Rage4 and UptimeRobot etc?

  • @ihatetonyy said: Can't you do this all with Rage4 and UptimeRobot etc?

    You can, but checks are done every 5 minutes instead of every 10s (you can even go lower, but 10s is pretty fast).

  • Feel free to slap this around, it's what I do.

    1. Domain DNS TTL: don't set it for hours, set it for seconds. Typically a 60-second TTL is the lowest you can go without servers and helpful software trying to ignore it and set it higher.

    2. Feed user both IPs on the lookup. A lookup should return both location IPs. Most big sites do this, for example:
      Name: ebay.com
      Address: 66.135.205.13
      Name: ebay.com
      Address: 66.135.205.14
      Name: ebay.com
      Address: 66.211.160.87
      Name: ebay.com
      Address: 66.211.160.88

    Non-authoritative answer:
    Name: amazon.com
    Address: 72.21.211.176
    Name: amazon.com
    Address: 72.21.214.128
    Name: amazon.com
    Address: 72.21.194.1

    3. In case of failure / timeout, the user's browser should try the other IP, since it has the info from the original lookup, likely cached. A new visitor gets the cached info too. But you can manually purge the IP of the downed server from your DNS server as needed, thus providing newly connected users with just one IP.
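A minimal sketch of the client-side behaviour described above, assuming a client that walks every address returned by one lookup. Whether a given browser actually does this is worth testing; `connect_first_reachable` and its `resolve` and `connect` parameters are hypothetical names. Only `socket.getaddrinfo` and `socket.create_connection` are standard library calls.

```python
import socket
from typing import Callable, List, Tuple

Address = Tuple[str, int]


def resolve_all(host: str, port: int) -> List[Address]:
    """Return every IPv4 address the lookup yields, in the order given."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return [info[4] for info in infos]


def connect_first_reachable(
    host: str,
    port: int,
    timeout: float = 3.0,
    resolve: Callable[[str, int], List[Address]] = resolve_all,
    connect: Callable[..., socket.socket] = socket.create_connection,
):
    """Try each resolved address in turn; return (address, connection)."""
    last_error = None
    for address in resolve(host, port):
        try:
            # A dead node costs up to `timeout` seconds before the next IP is tried.
            return address, connect(address, timeout=timeout)
        except OSError as error:
            last_error = error
    raise ConnectionError(f"no address for {host} accepted a connection") from last_error
```

In this sketch the fallback costs one connect timeout per dead address, so a short timeout matters about as much as a short TTL.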

    If you are concerned about your load balancer reliability, then handle that with an Nginx proxy installed on three different low end VPS nodes.

    Now if your users have sessions that need persistence then you are going to need to edit your Nginx config to hash the user IP and determine which server he/she belongs on, try to pass them there and in case of failure push them to the other server which is up (thus, losing their session info).
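The IP-hash routing with fallback described above can be sketched roughly as follows. This is a Python illustration of the idea, not Nginx configuration and not Nginx's own hashing; `pick_backend`, the backend names, and the `is_up` callback are hypothetical.

```python
import hashlib
from typing import Callable, Optional, Sequence


def pick_backend(client_ip: str, backends: Sequence[str],
                 is_up: Callable[[str], bool]) -> Optional[str]:
    """Map a client IP to a stable 'home' backend, falling back if it is down."""
    if not backends:
        return None
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    home = int.from_bytes(digest[:8], "big") % len(backends)
    # Walk the list starting at the home backend so healthy clients stay put.
    for offset in range(len(backends)):
        candidate = backends[(home + offset) % len(backends)]
        if is_up(candidate):
            return candidate
    return None  # every backend is down
```

A client that falls through to another backend keeps working but loses any session that lived only on its home server, which is the trade-off mentioned above.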

    As far as hiding all that behind Cloudflare, that's your luxury to figure out :)

  • gbshouse Member, Host Rep

    @MrOwen - You can do this with us and there is no need to delete the A record; just activate failover using one of the monitoring integrations (UptimeRobot, NewRelic, ScaleXtreme), a generic webhook, or an API call. I've looked at your code and it should be easy to add webhook support.

  • @MrOwen Maybe this will help you http://blog.booru.org/?p=12

  • imperio Member
    edited December 2012

    Dnsmadeeasy has support for tcp/http/https/dns/udp monitoring and automatic failover for up to 4 alternative IPs. So you can set 5 IPs for round robin, and when the main IP goes down (1 min interval, and the number of monitoring verification locations can be set), the next failover IP is checked and, if it is up, traffic fails over to it, and vice versa. I am using them for an enterprise project and it works like a charm. I do not think it costs much if you really need this kind of uptime.

  • The DNSMadeEasy package is $50 a year if memory serves me right.

    Good service and price point if small enough to fit into that (amount of lookups a month) and if you do not need geographic DNS functionality.

  • using dnsmadeeasy dns failover myself for 3x haproxy load balancers for 5x LEB VPS (4x openvz and 1x kvm) mainly to get over openvz limitation of no local binding/sysctl editing ability for keepalived

  • Thanks for the great suggestions everyone (I was a little preoccupied yesterday, Christmas and all...)!

    @pubcrawler, thanks for that general information about the rr setup. I wasn't 100% sure if clients tried the additional IPs if the first one didn't respond. Although, does this check imply any additional load time if the IP they were assigned is down? I would assume it just does a ping test which might take a second or two.

    @tsanten, that blog post is basically exactly what I'm looking for! I would like to stick with CloudFlare because I think they provide a great DNS service.

    Also, thanks for suggestions about dnsmadeeasy but honestly, I'm not sure I want to shell out $30/year for DNS services for my personal use when I could just use that money for additional machines ;)

  • The RR thing, MrOwen (feeding the user 2 different IPs), is a function of DNS. However, it's your browser that determines what to do in case of outage or failure. I believe all modern browsers renegotiate with a second attempt to the other IP. Feel free to test and confirm.

  • tsanten Member
    edited December 2012

    @MrOwen I was thinking it would be helpful, without reading everything.
    Cloudflare is good, but I'm afraid it will not stay free for long (once the beta services finish).

  • @tsanten said: @MrOwen I was thinking it would be helpful, without reading everything.
    Cloudflare is good, but I'm afraid it will not stay free for long (once the beta services finish).

    I would imagine it will remain free because I think they make enough from their pro, business, and enterprise plans which have more features than the basic plan like ssl and their railgun dealio.

  • The normal plans will most likely stay free, pro/ent works awesome though.

    I use pro on two sites, it's really cheap once you get over the fact that the first site is $20, lol.

  • I love this stuff :D How do you handle sessions between different servers if you started a session from a server that's currently down and you can't check credentials??

  • Love discussions like this. If you want to read more on this there's a nice heated debate over at WHT http://www.webhostingtalk.com/showthread.php?t=1117385 as well, including the validity of how different browsers' caches handle failover etc. A few differences of opinion in that thread which make interesting reading :)

  • @sandro said: I love this stuff :D How do you handle sessions between different servers if you started a session from a server that's currently down and you can't check credentials??

    For me, it's the haproxy redispatch option http://code.google.com/p/haproxy-docs/wiki/redispatch or configuring a distributed memcached server, i.e. for PHP sessions.

    In HTTP mode, if a server designated by a cookie is down, clients may definitely stick to it because they cannot flush the cookie, so they will not be able to access the service anymore.

    Specifying option redispatch will allow the proxy to break their persistence and redistribute them to a working server.

    It also allows to retry last connection to another server in case of multiple connection failures. Of course, it requires having retries set to a nonzero value.

  • that's awesome

  • Indeed no probs with haproxy handling a 600,000 member vbulletin forum :)

  • @pubcrawler said: The RR thing, MrOwen (feeding the user 2 different IPs), is a function of DNS. However, it's your browser that determines what to do in case of outage or failure. I believe all modern browsers renegotiate with a second attempt to the other IP. Feel free to test and confirm.

    It looks like one of the folks at WHT has a differing opinion on this:

    A records, or round-robin DNS, do NOT "failover", and if a client picks a down IP, it just times out (yes it gives all A's setup, but that's it). Round robin / multiple A records just rotates the IP that is given out, but there is no system in the TCP/IP stack of MS Windows or Linux that tells anything to move along to the next if one is down.

    Modern browsers do not "try each in turn". That's false information, unless it's something new with IE9 I haven't tested yet. That would be the only possible exception, which I'm 99% sure doesn't exist, as it would have almost certainly made my radar.

    At this point though, it's mostly anecdotal evidence. I have yet to try it in real life.

  • I say test it and try. We've long used this, in this way, @MrOwen. Is it perfect? Nope, there is perhaps still a timeout issue around the DNS lookup, where the browser or whatever software interface might get stuck on the first IP until the record expires (my TTL is set at 60 seconds). That means down for 60 seconds max for that user.

    Regardless of what you do, you will perhaps have some outage for some folks.

    Never saw the A-B scenario where 50% of the viewers end up stuck on crashed IP... Seems possible perhaps, but less so today.

  • @MrOwen Chrome and others will try the second IP. Feel free to try it
