How would YOU achieve 100% uptime?
Greetings,
I'd like to bring up a subject that doesn't seem to get much attention. I'd appreciate it if you could share a setup that you have personally tested or, even better, already implemented.
You have two servers, dedicated or VPSes, spread around the globe, or at least not in the same datacenter. What is your solution for having literally no downtime when things go wrong?
Downtime is frustrating on personal servers and extremely annoying on production servers. Your hints or actual setups will be greatly appreciated.
PS: if this has been discussed already, please point me to the thread; I have failed to find one.
Comments
100% is a myth, or too expensive. Everybody has downtime at some point.
Redundant DNS servers, round-robin DNS, and high-availability proxies! It all depends on what 100% uptime means to you; for example, if my servers and backups are available to me when I need them, that's fine by me!
Wouldn't round-robin DNS cause ~50% of requests to fail?
So a 3rd "master" server is necessary to point to one of the available servers.
./god/configure --disable-downtime --prefix=/proc/heaven && cd god && make && make install
Not normally; most browsers and such are smart enough to try the next IP.
If the master server is down, the slaves still serve DNS requests.
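For illustration, a minimal round-robin setup in BIND zone-file syntax might look like the lines below (the hostname, TTL and documentation-range IPs are just placeholders); a low TTL makes it quicker to pull a dead record:

; hypothetical zone fragment: one name, two A records served round robin
www    60    IN    A    203.0.113.10     ; node in DC 1
www    60    IN    A    198.51.100.20    ; node in DC 2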
Go with ColoCrossing? lol
ColoCrossing is a myth.
If I had an actual business need for 100%, anycast with VRRP/CARP on the local side + some sort of load balancing would probably be the way to go.
pfsync with pf if a firewall is required locally, and so on.
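As a rough sketch of the local-side redundancy piece, a keepalived VRRP config on the primary node could look roughly like this (the interface name, priorities and floating IP are assumptions, and the standby node would run the mirror-image config):

# /etc/keepalived/keepalived.conf on the primary node (sketch)
vrrp_instance VI_1 {
    state MASTER              # the standby node uses BACKUP
    interface eth0            # assumed public-facing NIC
    virtual_router_id 51
    priority 150              # standby gets a lower priority, e.g. 100
    advert_int 1              # send a VRRP advertisement every second
    virtual_ipaddress {
        203.0.113.100/24      # floating service IP that follows the live node
    }
}

If the master stops advertising, the backup takes over the floating IP within a few seconds, which is the kind of near-zero switchover discussed here.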
Even with a failover setup there is still some time needed for the switchover to happen. So zero downtime is not possible, but keeping the maximum downtime small (a couple of minutes or so) should be.
Depends on what kind of switchover you're talking about.
DNS-based ones, absolutely. Routing-based ones, where multiple routes exist to multiple fully redundant gateways? Nope.
@Wintereise even if you anycast it and withdraw the routes for one of the locations from BGP, there is still some time needed for the routes to reconverge. BGP updates are not instant.
The point is that routes should never have to be withdrawn, so reconvergence should never happen -- period.
This part is easily achievable with redundant edge/border routing and some sort of redundancy protocol like HSRP/VRRP/CARP. Use diverse fiber paths and, if possible, different buildings to house the core gear, and you're well on the way to a full 100%.
Now, of course, if a tornado knocks down a whole city, then you're going to have to deal with reconvergence -- but the likelihood of that happening is slim to nil for most DCs, and in those cases I think a few seconds to reconverge is perfectly acceptable.
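For a concrete flavor of the anycast side, a minimal BIRD (1.x) fragment that originates the same prefix from every site could look like this (the prefix, ASNs and neighbor address are placeholders, not anyone's real setup):

# /etc/bird/bird.conf fragment (sketch) -- announce the anycast prefix to the upstream
protocol static anycast_prefix {
    route 192.0.2.0/24 blackhole;      # the prefix is actually served by local addresses on this box
}

protocol bgp upstream {
    local as 64512;                    # placeholder private ASN
    neighbor 198.51.100.1 as 64511;    # placeholder upstream router
    import none;
    export where proto = "anycast_prefix";
}

Each site announces the same prefix, so if one site's session drops, traffic follows the remaining announcements rather than waiting for a DNS change.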
In a previous job we had two (smallish) DCs side by side with fiber between them for SAN <> SAN replication (these were $250,000 EMC SANs), with cross-site clusters that could take something silly like an 8-in-10 failure before service was impacted.
This was a money-is-no-object project; it probably cost about $15,000,000 for file storage, app hosting and email that simply could not go down.
It was based on an island, and the number of cables run with different carriers was insane.
Sadly, about two weeks after it was signed off as complete, the whole island lost power due to an offshore disaster. After four days of people literally running backwards and forwards night and day with diesel for the generators, a fire broke out and the whole thing completely failed.
So nothing is 100%, because something will always happen; it is fair to say it is achievable, but only with good luck.
That is why I like LowEndSpirit: €13.00 per year for 5 servers in different countries/DCs with HAProxy for failover, so even if 4 of the 5 DCs blew up at the same time you would still stay up (rough sketch below).
In the above example, had they taken the initial advice and replicated to the mainland, this would not have happened, but I think it was an extra $7,000,000 for the cross-connect run under the water.
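A rough idea of what that HAProxy front end could look like (backend addresses, port and the health-check path are assumptions):

# /etc/haproxy/haproxy.cfg (sketch) -- proxy with health-checked backends in different DCs
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend www
    bind *:80
    default_backend app_servers

backend app_servers
    balance roundrobin
    option httpchk GET /health             # assumes the app exposes a /health endpoint
    server dc1 203.0.113.10:80 check
    server dc2 198.51.100.20:80 check
    server dc3 192.0.2.30:80 check backup  # only used once the primary servers fail their checks

Servers that fail their health checks are taken out of rotation automatically, which is what keeps the site up even when most of the locations are gone.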
Even if you go to extremes for redundancy at your end, your upstreams will fail you sooner or later. They all do service-affecting maintenance sometimes, and they all screw up without any warning sometimes. I've seen this from all kinds of carriers - both premium like Level3 and shitty like Cogent. So reconvergence will be necessary sometimes; it is unavoidable. And since IP is a "best effort" protocol, that's OK too - you just have to accept it. You can minimize it, but you can never 100% guarantee that it won't happen.
Correct, all I'm trying to say is that it's possible to minimize that window down to near zero levels.
A downtime that nobody noticed isn't a downtime, etc
@Blanoz:
http://lowendtalk.com/discussion/comment/647333/#Comment_647333
http://lowendtalk.com/discussion/comment/647405/#Comment_647405
I'd definitely use proxies and go multi-site; it will always beat a single location no matter how good the hardware/UPS/generators/etc.