BuyVM: All data lost on nodes 27, 41, 52, 58
Just got the following email from them:
> On Friday at 11:10 PM GMT-8 we experienced a catastrophic failure of a power strip, causing near-fatal damage to the file systems on nodes 27, 41, 52, & 58.
>
> When the power strip failed, our RAID batteries were damaged, causing all data stored in our write-back caches to be lost. We worked quickly with our datacenter to diagnose the issue, as we originally thought it was a network failure or a configuration issue. Once we knew the severity of the issue, we had our datacenter replace all of our RAID batteries as well as multiple failed hard drives that were also damaged.
>
> At this time we're awaiting node27 to finish its long FSCK. Nodes 41 & 58 have both returned with severe damage to the file system.
>
> We are quickly working to provision you a fresh VPS with your stored IP information. If you're able to rebuild your VPS with minimal disruption, we would greatly appreciate it if you were able to just continue with a fresh VM as we work to try to recover what we can for other users without any backups.
>
> With that being said, we want to remind everyone that we do have free backup space available under our BuyVM+ product. This product currently offers 5GB of space at no additional cost as well as free DNS hosting.
Although I do not store anything critical on my BuyVM VPS, it would still take time for me to re-configure stuff and make it functional again :(
Comments
Never heard of PDUs going out like that. @buyvm, who made the strips?
It sounds like more than a few people had a rough night. Here's to hoping everyone had backups.
Wow, that really sucks.
And to think I almost used the VPS for very important data...thank god I did not.
This is why you have a backup system of some sort. Hardware failures can happen quite unexpectedly.
Human error, possibly. The only time a power strip has failed on me is when my uncle placed a diode in it, killing everything connected to it, including my Q6600 mobo + PSU. That was the worst 'prank' ever, but there is such a thing as bad power strips.
http://m.youtube.com/?dc=organic&source=mog#/watch?v=7psPwpZWoW0
Shit happens. Keep backups.
Moral of the story: use software RAID. Luckily my node was not affected, but regardless, I am syncing hourly backups now.
Unfortunate. People on IRC were also saying the storage VPSes weren't working while the node was showing as online.
That is just overkill. While nobody likes these events, they do happen, and there's no reason to panic if you have a reasonable backup strategy in place. Maybe you can just run replication to a remote MySQL server; if everyone were to knee-jerk to hourly backups, how much would that degrade node performance?
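On the replication suggestion above: once a remote replica exists, the thing that actually bites people is replication silently stopping or lagging, so it pays to check it. Here is a minimal monitoring sketch, assuming the mysql-connector-python package and a read-only monitoring account; the hostname and credentials are placeholders, not anything BuyVM-specific.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: verify a remote MySQL replica is healthy enough
to act as an off-node copy of the data (all connection details are
placeholders for illustration)."""
import mysql.connector  # assumes the mysql-connector-python package

conn = mysql.connector.connect(
    host="replica.example.net",   # placeholder remote replica
    user="monitor",
    password="change-me",
)
cur = conn.cursor(dictionary=True)
cur.execute("SHOW SLAVE STATUS")  # "SHOW REPLICA STATUS" on MySQL 8.0.22+
status = cur.fetchone()

if status is None:
    print("No replication is configured on this server.")
else:
    io_ok = status["Slave_IO_Running"] == "Yes"
    sql_ok = status["Slave_SQL_Running"] == "Yes"
    lag = status["Seconds_Behind_Master"]
    print(f"IO thread running: {io_ok}, SQL thread running: {sql_ok}, lag: {lag}s")

cur.close()
conn.close()
```

Replication keeps the remote copy continuously current without the periodic I/O spike of a full dump, which is the performance point being made above; just remember it also replicates mistakes (a DROP TABLE propagates instantly), so it complements point-in-time backups rather than replacing them.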
Just the four nodes mentioned, Liam. We're doing our best to put the pieces back together from this... episode.
Looks like I have a new empty VPS. I guess this doesn't bode well for my data...
I was probably one of the people you saw on IRC. Storage had downtime with AFAIK no data loss for what I assume was an unrelated issue (it probably would've been up sooner if they weren't already battling major problems).
I also happen to have VPSes on two of the dead nodes (lucky me! 4 out of 60-something nodes die and I'm on two of them). I keep backups of my BuyVM stuff elsewhere because having your sole backup in the same place is a terrible idea - there are any number of failure/disaster scenarios that could take out an entire datacentre.
I would imagine that more data has been lost to software raid bugs/config issues than freak hardware asplosions.
Edit: I had to open my bloody mouth and tempt fate, maybe 5 minutes after I posted this my storage crapped itself.
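A concrete version of the "keep a copy somewhere else entirely" point above: a minimal off-site sync sketch you could run from cron, assuming rsync is installed and you have SSH key access to a box in a different facility. Every path and hostname here is a placeholder.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: mirror a VPS's important directories to an
off-site host so the only backup never lives on the same node (or in
the same datacentre) as the original."""
import subprocess
import sys

# Placeholder source directories and remote target for illustration only.
SOURCE_DIRS = ["/etc", "/var/www", "/home"]
REMOTE = "backup@offsite.example.net:/backups/my-vps/"

def push_backup() -> None:
    for src in SOURCE_DIRS:
        # -a preserves permissions and timestamps, -z compresses in transit,
        # --delete keeps the mirror from accumulating stale files.
        result = subprocess.run(
            ["rsync", "-az", "--delete", src, REMOTE],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            print(f"rsync failed for {src}: {result.stderr}", file=sys.stderr)
            sys.exit(result.returncode)

if __name__ == "__main__":
    push_backup()
```

Dropped into an hourly cron entry (something like `0 * * * * /usr/local/bin/offsite_backup.py`), the worst case after a node loss is roughly an hour of changes.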
The RAID write-back cache battery has claimed another victim, it seems. That reminds me of a big IBM server I managed in the late '90s. This expensive, mission-critical machine had an impressive look, multiple redundancies and failovers. It had many processors and a huge proprietary SCSI RAID card. Over time it was repurposed to a less mission-critical role and the maintenance contract was cancelled. Shortly after that, the server simply turned off by itself. The power switch had no effect. This was strange: the machine had 3 power supplies and an impressive service panel full of diagnostic LEDs. They were all off. With no maintenance contract and no hope of a cost-effective repair, I removed the cover and started taking out the redundant components to find the culprit. One of the battery packs of the RAID card had a short circuit. The short circuit triggered the protection switch of the PCI backplane, which turned off the diagnostic card, which turned off the power supply control module. After unplugging this battery pack, the server restarted and booted fine (using the other battery).
Well, it seems that later today I will have a VPS to restore from backup...
So is it safe to say my data is lost forever and I'll need to rebuild from a week-old backup? I just need to know whether to feel hopeful or devastated.
Looks like they are trying to recover for people with no backups.
They are human, holy shit!
Anyway, hope it gets better for you guys; I can only imagine. What kind of PDU was it?
So, how would I go about finding out if there's a chance of my data getting recovered? Would it be wise to just leave my VPS alone for the day?
It was supposed to be some decent 10-port Tripp Lite, but I get the feeling it was something cheapo =\
We're still doing FSCKs where we can.
The data is still around, just in really bad shape. Anywhere from 10GB to 30GB in lost+found.
Since they were all 128MB plans, I'm hoping most people are just running VPNs and simple things they can easily rebuild. For anyone needing data, just let us know what folders to check and we'll do our best as the nodes return.
Francisco
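As an aside for anyone who ever has to pick through a lost+found of their own after a big fsck: orphaned files come back under inode-number names, so the first job is working out what each fragment actually is. A rough survey sketch, assuming a mounted filesystem and the standard `file` utility; the path is a placeholder.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: walk a lost+found directory after an fsck and
report each entry's size and probable type, to help decide which
fragments are worth restoring."""
import os
import subprocess

LOST_AND_FOUND = "/lost+found"  # placeholder; adjust to the mount being inspected

def survey(root: str) -> None:
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
                # `file -b` guesses the type from magic bytes, which is usually
                # enough to tell a text config from a stray binary.
                kind = subprocess.run(
                    ["file", "-b", path], capture_output=True, text=True
                ).stdout.strip()
            except OSError as exc:
                print(f"{path}: unreadable ({exc})")
                continue
            print(f"{path}\t{size} bytes\t{kind}")

if __name__ == "__main__":
    survey(LOST_AND_FOUND)
```

From there, grepping the likelier candidates for a known string (a domain name, a config key) is usually the quickest way to match a fragment back to its original path.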
Mass remakes are done so I ask you to please log a ticket and let us know:
Francisco
Assume we can't.
If your VPS is just a config thing and you're only out some time, then I ask that you please rebuild and let us hunt for the people that didn't keep backups of their life's work.
Francisco
All but node58 have been provisioned to new gear; I'm just working on node58 right now.
Francisco
Hardware failures can occur at any time, even to the best provider. Thumbs up to Francisco and team for the way they're dealing with this issue.
Sadly I didn't keep backups of my life's work. My fault, I know, but I wouldn't blame any of you if I couldn't get anything back. You're still my #1 VPS provider despite all of this. I've gone ahead and logged a ticket with the most important directory, and I was instructed to wait until bzImage made an announcement regarding the FSCKs finishing, so I'll just be patient until then.
Thanks for being so transparent about all of this. It's amazing how an unmanaged VPS provider does more for its customers than a managed shared host would.
Sorry to hear you're having a bad day/night. Let us know if there is anything we can do to help you out...
We'll do our best to hunt for things, just give a file list to Anthony.
I was positive we mentioned BuyVM+ in one of our company emails a while back, and I wish we had even more people using it (around 1,000 people do right now).
As I mentioned, we'll be increasing space on the offering so people can store a lot more.
Francisco
Thankfully I didn't receive any email.
Anthony is pulling NFS mounts off everything but node27 right now to start salvaging what he can. Some of the boxes are booted on live CDs just because the HN got smacked around. I mean, we were missing half of our kernel on the box and it was on a different partition altogether o_O
Francisco
Speaking of backups, I just noticed that my backup VPS has also been down for hours.
Yea, unrelated; I'm just seeing what's acting up on that one.
Francisco
Thanks, I have already created a ticket for the same.
Stop selling off all your servers; how can you even keep tabs on that merry-go-round?