New on LowEndTalk? Please Register and read our Community Rules.
All new Registrations are manually reviewed and approved, so a short delay after registration may occur before your account becomes active.
Comments
Unfortunately a reboot does not resolve my issue.
OVH support is very slow
This time you couldn't expect anything else; nobody can keep enough support capacity on hand for incidents like this, you'd just have hundreds of people twiddling their thumbs 99.5% of the time.
I forgot which one, but during one of the "fiber cuts" they had all of BHS routing through like a single 1G backup line or something, so everyone got to collectively stare at their terminals, unable to actually do anything just because of how slow it was. BHS was fun.
I don't think their DR department is really up to the task. Why?
Because for them a "power outage" is "The worst scenario that can happen to us".
No, it isn't. Consider these scenarios instead:
1) Overvoltage: all hardware fried
2) Flood: all hardware washed away, or a massive fire
3) Direct air cooling plus a nice amount of volcanic ash, or a corrosive chemical leak
4) A fertilizer ship / train passing the DC explodes
5) Solar flare / EMP: widely fried electronics (some data centers are hardened against this)
6) A nation state (or any other competent party) gets pissed off at them and wipes their systems, hijacking all control after monitoring operations for months; people like that really know what they're doing
7) Any of these scenarios caused by internal sabotage
In these scenarios the whole site is more or less physically wiped out. Yep, recovering from that is a bit more demanding than the 'trivial' job of restoring power.
This is why I always keep full off-site remote backups, just in case. You never know.
@WebDude Holy shit- do you write copy for CNN, or did you just produce end-of-the-world movies for basic cable during the 90s?
Yesterday I restored the servers that suffered in the incident. Only the CentOS 7 servers had problems; all the Debian 7, 8 and 9 servers came back 100% and didn't even notice the hiccup. That's the power of Debian. The four CentOS servers went into rescue mode: I had to transfer the data to a backup with lftp, reinstall, configure the software, and then restore from the backup at 1 gigabit speed. It took about 2 hours of work.
All my other services in SBG (VPS and others) came back up normally. Pings are up.
The incident cost me about 21,000 RUB (~305 EUR).
Yeah, you made me laugh. But all of those things have happened in the past, and will happen in the future too.
Edit: Have you checked the SBG site's location? I don't think it's optimal. Flooding could be a very real risk. I don't have data, but it looks like it.
I don't remember the last time someone intentionally blew up a datacenter, myself. Thankfully.
If it was NetBSD, you could still be fscking them (until you ran out of swap)!
At least someone is talking reasonable numbers, and not the usual thousands-of-dollars-per-minute gibberish.
Well, costs can be indirect, and it's hard to even estimate them. If it's downtime alone, it might not be that expensive. But if the situation had been worse and systems had needed to be restored from off-site backups, yes, it would have been several thousand euros directly. And even more indirectly, when users demand compensation, data is lost, there's extra data synchronization, and you lose whatever changed since a potentially day-old backup, and so on.
In that situation we're probably talking over 10k€, easily. It would have meant redirecting basically all resources to system restoration (probably at another service provider), and lots of work before everything is running again: probably one week before the most important stuff is up, and around one month before everything is.
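To make that "over 10k€" concrete, here's a toy back-of-the-envelope calculation. Every figure below is a made-up placeholder assumption, not a real number from this incident:

```python
# Toy downtime-cost estimate. All inputs are invented placeholders.
def restore_cost(hours_down, revenue_per_hour,
                 engineer_hours, engineer_rate, compensation):
    """Direct cost of a full off-site restore scenario, in EUR."""
    return (hours_down * revenue_per_hour      # lost revenue
            + engineer_hours * engineer_rate   # restoration labour
            + compensation)                    # customer credits

# Assume one week of degraded service and two engineers on it full time:
total = restore_cost(hours_down=7 * 24, revenue_per_hour=25,
                     engineer_hours=2 * 40, engineer_rate=60,
                     compensation=2000)
print(total)  # 4200 + 4800 + 2000 = 11000
```

Even with these modest assumptions you land above 10k€ before counting any of the indirect costs (lost customers, bad will).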
Yes, of course everyone has considered these things when making their DRP. The good part: stuff can be restored. The bad part: it would take a lot of time, cost a lot, and probably cause indirect costs in lost customers, tons of bad will, and so on.
There's also some data which is considered not worth backing up daily to an off-site location, because it isn't "critical". But it's still something that would have to be essentially recreated in case of total loss of the DC and its storage.
From some of the posts, it seems that some users / clients haven't made proper DR preparations. Providers like UpCloud clearly state that clients are required to keep off-site backups of all critical data.
Also, if uptime is that important, then in this kind of situation a restore to another provider / location should be launched as soon as the issue is detected, preferably in a fully automated fashion. Better yet, there should already be replicated alternate sites your systems can fail over to automatically. These are the discussions that pop up every time there's an issue with Amazon: if the service is important, you shouldn't trust only one Availability Zone or Region, nor should you trust even one cloud provider. These are the topics I always bring up when someone says they need a highly available system.
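The "detect and fail over automatically" idea is simple at its core. A minimal sketch, with invented hostnames and the health probe injected as a plain function so the example stays self-contained (in practice it would be an HTTP or ICMP check plus a DNS or load-balancer update):

```python
# Toy failover decision: probe the primary a few times and return the
# endpoint traffic should be directed to. Hostnames are placeholders.
def pick_endpoint(primary, secondary, probe, attempts=3):
    """Return primary if any probe succeeds, else fail over."""
    for _ in range(attempts):
        if probe(primary):
            return primary      # primary answered: keep using it
    return secondary            # every probe failed: fail over

# Simulated outage: the primary never answers any probe.
primary_down = lambda host: False
print(pick_endpoint("sbg.example.net", "gra.example.net", primary_down))
# -> gra.example.net
```

The hard parts in real life are avoiding flapping (require several consecutive failures) and keeping the secondary's data fresh enough to be worth failing over to.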
Read more about the WTC and data centers. There were serious issues with banks, partly because someone smart (?) thought that keeping the second hot replica of data / backup systems in the other tower was a good idea. Well, that failed, and they had to do the slow and painful off-site restoration process.
A good example of why the secondary system shouldn't be too close to the primary one. Of course, if it's far enough away, that can cause latency issues; that's why we got Google Spanner and similar technologies, for data which is actually important.
Btw. I've got $25 free credit invites to UpCloud, if anyone wants to take a look. I'm using it for everything which is too important to be handled by OVH. So I kind of agree with the others that OVH is the budget solution.
Edit: typofix date -> data
This is a very sensible thing to say. Not just to cover your ass, but also because there are so many other things that can go wrong.
We say we keep backups on some services and people constantly ask us to restore them because they deleted the wrong folder or because they got hacked.
We keep backups of the whole storage, meant for bare-metal recovery, not individual containers. It mostly covers hardware failure; there is no way we can know what data the customer needs backed up and, critically, when the snapshot must be taken. Also, we take those once a week or once a day and keep only one set. In the first case the backup can be too old; in the second it can already be overwritten by the time you notice the malware/hack, etc. Heck, it can be overwritten in the first case too.
NOBODY should presume the provider will cover their ass, nor that they will do it sensibly, taking care of all their specifics. If your data is important, you take backups. If your data is critical, you sync it AND back it up in different locations. If uptime is also critical, you build some sort of redundant setup (which will also mostly cover the backup issue, except for the cold storage of incremental backups so you can go back if needed for various other cases).
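The "only one set, already overwritten" problem mentioned above is exactly what a small retention policy avoids. A hedged sketch of one possible policy (the dates are invented examples, and real tools like restic or borg do this for you):

```python
from datetime import date, timedelta

# Sketch: keep the last 7 daily snapshots plus the Monday snapshot of
# the last 4 weeks, so a hack noticed late can still be rolled back.
def snapshots_to_keep(snapshots, today):
    keep = set()
    for d in snapshots:
        age = (today - d).days
        if 0 <= age < 7:                     # recent dailies
            keep.add(d)
        elif age < 28 and d.weekday() == 0:  # Monday weeklies
            keep.add(d)
    return keep

today = date(2017, 11, 13)                   # a Monday
snaps = [today - timedelta(days=n) for n in range(30)]
print(len(snapshots_to_keep(snaps, today)))  # 7 dailies + 3 weeklies = 10
```

The point is just that retention is a policy decision, not extra storage cost in the same order of magnitude: ten snapshots of mostly-unchanged data deduplicate well.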
Out of interest, has anyone still got services out?
One of my VPSes in SBG appears to still be down; I have tried rebooting it but it hangs at 25% and then errors out.
Since you called me out here, just to follow up @bitswitch: some services continue to be down on the 3rd day. I wasn't being a keyboard warrior attacking OVH in my post, I was simply being realistic.
AWS is ~10x more expensive all things (traffic) considered, and it had an identical outage which took out an availability zone. Price is a consideration, but it doesn't work quite so straightforwardly at scale.
But I was talking about the opposite situation here. You can have a large markup and still screw up, but when you need to cut corners to avoid selling at a huge loss, your chances of screwing up are significantly higher.
Like Oles of OVH admitted here
Now as I said, this is a result of cutting corners.
OVH claims it was a problem with power failover: the generators worked but the transfer switch didn't. It's obviously difficult to test for, and it has also hit one of the world's largest and most expensive providers.
How is it a result of cutting corners?
Sounds like propaganda to buy his services instead of OVH.
Did you ever think about size? SBG 4 could be twice the size of SBG 1.
You're disagreeing with me in principle? Did you even bother to read what I linked you to?
Let me quote the relevant part:
As per Oles
Not even worth time to discuss then ;-).
You also wrote something dramatically false. As per Oles, they did not work. Please DO READ before you start contesting someone's posts.
I get it. It's part legacy, part experiment. But the issue wasn't with any of that. It's not like the containers rusted through.
The outage was caused by a fault in the power failover. The generators themselves did work; they just weren't started. This is a problem which affects all providers, because it's fundamentally difficult to test for.
It's all in the official post mortem. You don't have to wade through Oles' frenglish feed.
By the way, Interxion España experienced the same type of outage a few years back where the generators didn't turn on during a power outage.
Faults happen - sometimes it kills a whole lot of hardware, other times not.
Deal with it, design DC/Provider failover if stuff is essential enough :-D
I'm sure that call was calm and handled easily and quickly by both parties at the time. Would love to hear what sort of threats and such were bandied about...
Relevant for you? Is that why you also cut the line below?
Everywhere OVH comes up, you keep bashing it.
He said SBG was planned as a small DC but was scaled up fast due to demand, that there is a design flaw, and that they need to close the old affected parts because of what happened.
And you make that call and tell us it's because of the demand?
It does not sound to me like it's because of demand. He is closing old parts, yes, but rather because of the issues at SBG.
He is going to invest another 5 million into SBG; do you think he would do that if there were no demand?
Relevant to support my sentence.
Future investment is irrelevant to what I was saying. Furthermore, you might have noticed that they have learned quite a bit through the years and are raising prices, and my hunch is they will continue to do so steadily now.
OVH had the balls to admit the failure, and I respect them for that.
I understand you are in love with them, but this love makes you blind when even they admitted it.
Yeah, and clearly keeping only 8 minutes of juice didn't allow anyone to react quickly enough to prevent it. This particular outage was simple and extremely easy to prevent.
Have you ever been present during a full site failover test? You'd notice that it takes a good couple of moments for the generators to start and normalise even under normal conditions, when everything goes well. I don't work with any datacentre that has as short a margin on batteries as in this case.
Btw guys, if you like stories, here's the related Hacker News discussion.
My VPS is finally back up after the outage in SBG.
My VPS was also up!
Edit: Just that it wasn't hosted at OVH