How would YOU achieve 100% uptime?
Greetings,
I'd like to bring up a subject that doesn't seem to get much attention. I'd appreciate it if you could share a setup that you have personally tested or, even better, already implemented.
You have two servers, dedicated or VPSes, spread around the globe, or at least not in the same datacenter. What is your solution for having literally no downtime when things go wrong?
Downtime is frustrating on personal servers and extremely annoying on production servers. Your hints or actual setups will be greatly appreciated.
PS: if this has been discussed already, please point me to the thread; I have failed to find one.
Comments
100% is a myth, or too expensive. Everybody has downtime at some point.
Redundant DNS servers, round-robin DNS, and high-availability proxies! It all depends on what 100% uptime means to you; for example, if my servers and backups are available to me when I need them, that's fine by me!
Wouldn't round-robin DNS cause ~50% of requests to fail?
So a 3rd "master" server is necessary to point to one of the available servers.
./god/configure --disable-downtime --prefix=/proc/heaven && cd god && make && make install
Not normally; most browsers and such are smart enough to try the next IP.
If the master server is down, the slaves still serve DNS requests.
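For illustration, a minimal round-robin setup in BIND zone-file syntax might look like the lines below (the hostname, TTL and documentation-range IPs are just placeholders); a low TTL makes it quicker to pull a dead record:

; hypothetical zone fragment: one name, two A records served round robin
www    60    IN    A    203.0.113.10     ; node in DC 1
www    60    IN    A    198.51.100.20    ; node in DC 2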
Go with ColoCrossing? lol
ColoCrossing is a myth.
If I had an actual business need for 100%, anycast with VRRP/CARP on the local side + some sort of load balancing would probably be the way to go.
pfsync with pf if a firewall is required locally, and so on.
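As a rough sketch of the local-side redundancy piece, a keepalived VRRP config on the primary node could look roughly like this (the interface name, priorities and floating IP are assumptions, and the standby node would run the mirror-image config):

# /etc/keepalived/keepalived.conf on the primary node (sketch)
vrrp_instance VI_1 {
    state MASTER              # the standby node uses BACKUP
    interface eth0            # assumed public-facing NIC
    virtual_router_id 51
    priority 150              # standby gets a lower priority, e.g. 100
    advert_int 1              # send a VRRP advertisement every second
    virtual_ipaddress {
        203.0.113.100/24      # floating service IP that follows the live node
    }
}

If the master stops advertising, the backup takes over the floating IP within a few seconds, which is the kind of near-zero switchover discussed here.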
Even with a failover setup there is still some time needed for the switchover to happen. So zero downtime is not possible, but keeping the maximum downtime small (a couple of minutes or so) should be.
Depends on what kind of switchover you're talking about.
DNS-based ones, absolutely. Routing-based ones, where multiple routes exist to multiple fully redundant gateways? Nope.
@Wintereise even if you anycast it and withdraw the routes for one of the locations from BGP, there is still some time needed for the routes to reconverge. BGP updates are not instant.
The point is that routes should never have to be withdrawn, so reconvergence should never happen -- period.
This part is easily achievable with redundant edge/border routing and some sort of redundancy protocol like HSRP/VRRP/CARP. Use diverse fiber paths and, if possible, different buildings to house the core gear, and you're well on the way to a full 100%.
Now, of course, if a tornado knocks down a whole city, then you're going to have to deal with reconvergence -- but the likelihood of that happening is slim to nil for most DCs, and in those cases I think a few seconds to reconverge is perfectly acceptable.
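For a concrete flavor of the anycast side, a minimal BIRD (1.x) fragment that originates the same prefix from every site could look like this (the prefix, ASNs and neighbor address are placeholders, not anyone's real setup):

# /etc/bird/bird.conf fragment (sketch) -- announce the anycast prefix to the upstream
protocol static anycast_prefix {
    route 192.0.2.0/24 blackhole;      # the prefix is actually served by local addresses on this box
}

protocol bgp upstream {
    local as 64512;                    # placeholder private ASN
    neighbor 198.51.100.1 as 64511;    # placeholder upstream router
    import none;
    export where proto = "anycast_prefix";
}

Each site announces the same prefix, so if one site's session drops, traffic follows the remaining announcements rather than waiting for a DNS change.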
In a previous job we had two (smallish) DCs side by side with fiber between them for SAN <> SAN replication (these were $250,000 EMC SANs), with cross-site clusters that could take something silly like an 8-in-10 failure before service was impacted.
This was a money-is-no-object project; it probably cost about $15,000,000 for file storage, app hosting and email that simply could not go down.
It was based on an island, and the number of cables run with different carriers was insane.
Sadly, about two weeks after it was signed off as complete, the whole island lost power due to an offshore disaster. After four days of people literally running backwards and forwards night and day with diesel for the generators, a fire broke out and the whole thing completely failed.
So nothing is 100%, because something will always happen; it is fair to say it is achievable, but only with good luck.
That is why I like LowEndSpirit: €13.00 per year for 5 servers in different countries/DCs with HAProxy for failover, so even if 4 of the 5 DCs blew up at the same time you would still stay up (rough sketch below).
In the above example, had they taken the initial advice and replicated to the mainland, this would not have happened, but I think it was an extra $7,000,000 for the cross-connect run under the water.
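A rough idea of what that HAProxy front end could look like (backend addresses, port and the health-check path are assumptions):

# /etc/haproxy/haproxy.cfg (sketch) -- proxy with health-checked backends in different DCs
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend www
    bind *:80
    default_backend app_servers

backend app_servers
    balance roundrobin
    option httpchk GET /health             # assumes the app exposes a /health endpoint
    server dc1 203.0.113.10:80 check
    server dc2 198.51.100.20:80 check
    server dc3 192.0.2.30:80 check backup  # only used once the primary servers fail their checks

Servers that fail their health checks are taken out of rotation automatically, which is what keeps the site up even when most of the locations are gone.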
Even if you go to extremes for redundancy at your end, your upstreams will fail you sooner or later. They all do service-affecting maintenance sometimes, and they all screw up without any warning sometimes. I've seen this from all kinds of carriers - both premium like Level3 and shitty like Cogent. So reconvergence will be necessary sometimes; it is unavoidable. And since IP is a "best effort" protocol, that's OK too - you just have to accept it. You can minimize it, but you can never 100% guarantee that it won't happen.
Correct, all I'm trying to say is that it's possible to minimize that window down to near zero levels.
A downtime that nobody noticed isn't a downtime, etc
@Blanoz:
http://lowendtalk.com/discussion/comment/647333/#Comment_647333
http://lowendtalk.com/discussion/comment/647405/#Comment_647405
I'd definitely use proxies and go multi-site; it will always beat a single location no matter how good the hardware/UPS/generators/etc.