BuyVM: All data lost on nodes 27, 41, 52, 58
Just got the following email from them:
> On Friday at 11:10 PM GMT-8 we experienced a catastrophic failure of a power strip, causing near-fatal damage to the file systems on nodes 27, 41, 52, & 58.
>
> When the power strip failed, our RAID batteries were damaged, causing all data stored in our write-back caches to be lost. We worked quickly with our datacenter to diagnose the issue, as we originally thought it was a network failure or a configuration issue. Once we knew the severity of the issue, we had our datacenter replace all of our RAID batteries as well as multiple failed hard drives that were also damaged.
>
> At this time we're awaiting node27 to finish its long FSCK. Nodes 41 & 58 have both returned with severe damage to the file system.
>
> We are quickly working to provision you a fresh VPS with your stored IP information. If you're able to rebuild your VPS with minimal disruption, we would greatly appreciate it if you were able to just continue with a fresh VM as we work to try to recover what we can for other users without any backups.
>
> With that being said, we want to remind everyone that we do have free backup space available under our BuyVM+ product. This product currently offers 5GB of space at no additional cost as well as free DNS hosting.
Although I do not store anything critical on my BuyVM VPS, it would still take time for me to re-configure stuff and make it functional again :(
Comments
Never heard of PDUs going out like that. @buyvm, who made the strips?
It sounds like more than a few people had a rough night. Here's to hoping everyone had backups.
Wow, that really sucks.
And to think I almost used the VPS for very important data...thank god I did not.
This is why you have a backup system of some sort. Hardware failures can happen quite unexpectedly.
Human error, possibly. The only time a power strip has failed on me is when my uncle placed a diode in it, killing everything connected to it, including my Q6600 mobo + PSU. That was the worst 'prank' ever, but there is such a thing as bad power strips.
http://m.youtube.com/?dc=organic&source=mog#/watch?v=7psPwpZWoW0
Shit happens. Keep backups.
Moral of the story: use software RAID. Luckily my node was not affected, but regardless, I am syncing hourly backups now.
Unfortunate. People on IRC were also saying the storage VPSes weren't working while the node was showing as online.
That is just overkill. While nobody likes these events, they do happen, and there's no reason to panic if you have a reasonable backup strategy in place. Maybe you can just run replication to a remote MySQL server; if everyone were to knee-jerk to hourly backups, how much would that degrade node performance?
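On the replication suggestion above: once a remote replica exists, the thing that actually bites people is replication silently stopping or lagging, so it pays to check it. Here is a minimal monitoring sketch, assuming the mysql-connector-python package and a read-only monitoring account; the hostname and credentials are placeholders, not anything BuyVM-specific.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: verify a remote MySQL replica is healthy enough
to act as an off-node copy of the data (all connection details are
placeholders for illustration)."""
import mysql.connector  # assumes the mysql-connector-python package

conn = mysql.connector.connect(
    host="replica.example.net",   # placeholder remote replica
    user="monitor",
    password="change-me",
)
cur = conn.cursor(dictionary=True)
cur.execute("SHOW SLAVE STATUS")  # "SHOW REPLICA STATUS" on MySQL 8.0.22+
status = cur.fetchone()

if status is None:
    print("No replication is configured on this server.")
else:
    io_ok = status["Slave_IO_Running"] == "Yes"
    sql_ok = status["Slave_SQL_Running"] == "Yes"
    lag = status["Seconds_Behind_Master"]
    print(f"IO thread running: {io_ok}, SQL thread running: {sql_ok}, lag: {lag}s")

cur.close()
conn.close()
```

Replication keeps the remote copy continuously current without the periodic I/O spike of a full dump, which is the performance point being made above; just remember it also replicates mistakes (a DROP TABLE propagates instantly), so it complements point-in-time backups rather than replacing them.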
Just the four nodes mentioned, Liam. We're doing our best to put the pieces back together from this... episode.
Looks like I have a new empty VPS. I guess this doesn't bode well for my data...
I was probably one of the people you saw on IRC. Storage had downtime with AFAIK no data loss for what I assume was an unrelated issue (it probably would've been up sooner if they weren't already battling major problems).
I also happen to have VPSes on two of the dead nodes (lucky me! 4 out of 60-something nodes die and I'm on two of them). I keep backups of my BuyVM stuff elsewhere because having your sole backup in the same place is a terrible idea - there are any number of failure/disaster scenarios that could take out an entire datacentre.
I would imagine that more data has been lost to software raid bugs/config issues than freak hardware asplosions.
Edit: I had to open my bloody mouth and tempt fate, maybe 5 minutes after I posted this my storage crapped itself.
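A concrete version of the "keep a copy somewhere else entirely" point above: a minimal off-site sync sketch you could run from cron, assuming rsync is installed and you have SSH key access to a box in a different facility. Every path and hostname here is a placeholder.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: mirror a VPS's important directories to an
off-site host so the only backup never lives on the same node (or in
the same datacentre) as the original."""
import subprocess
import sys

# Placeholder source directories and remote target for illustration only.
SOURCE_DIRS = ["/etc", "/var/www", "/home"]
REMOTE = "backup@offsite.example.net:/backups/my-vps/"

def push_backup() -> None:
    for src in SOURCE_DIRS:
        # -a preserves permissions and timestamps, -z compresses in transit,
        # --delete keeps the mirror from accumulating stale files.
        result = subprocess.run(
            ["rsync", "-az", "--delete", src, REMOTE],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            print(f"rsync failed for {src}: {result.stderr}", file=sys.stderr)
            sys.exit(result.returncode)

if __name__ == "__main__":
    push_backup()
```

Dropped into an hourly cron entry (something like `0 * * * * /usr/local/bin/offsite_backup.py`), the worst case after a node loss is roughly an hour of changes.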
The RAID write-back cache battery has claimed another victim, it seems. That reminds me of a big IBM server I managed in the late '90s. This expensive, mission-critical machine had an impressive look, multiple redundancies and failovers. It had many processors and a huge proprietary SCSI RAID card. Over time it was repurposed to a less mission-critical role and the maintenance contract was cancelled. Shortly after that, the server simply turned off by itself. The power switch had no effect. This was strange: the machine had 3 power supplies and an impressive service panel full of diagnostic LEDs. They were all off. With no maintenance contract and no hope of a cost-effective repair, I removed the cover and started taking out the redundant components to find the culprit. One of the battery packs of the RAID card had a short circuit. The short circuit triggered the protection switch of the PCI backplane, which turned off the diagnostic card, which turned off the power supply control module. After unplugging this battery pack, the server restarted and booted fine (using the other battery).
Well, it seems that later today I will have a VPS to restore from backup...
So is it safe to say my data is lost forever and I'll need to rebuild from a week-old backup? I just need to know whether to feel hopeful or devastated.
Looks like they are trying to recover for people with no backups.
They are human, holy shit!
Anyway, hope it gets better for you guys; I can only imagine. What kind of PDU was it?
So, how would I go about finding out if there's a chance of my data getting recovered? Would it be wise to just leave my VPS alone for the day?
It was supposed to be some decent 10-port Tripp Lite, but I get the feeling it was something cheapo =\
We're still doing FSCKs where we can.
The data is still around, just in really bad shape. Anywhere from 10GB to 30GB in lost+found.
Since they were all 128MB plans, I'm hoping most people are just running VPNs and simple things they can easily rebuild. For anyone needing data, just let us know what folders to check and we'll do our best as the nodes return.
Francisco
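As an aside for anyone who ever has to pick through a lost+found of their own after a big fsck: orphaned files come back under inode-number names, so the first job is working out what each fragment actually is. A rough survey sketch, assuming a mounted filesystem and the standard `file` utility; the path is a placeholder.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: walk a lost+found directory after an fsck and
report each entry's size and probable type, to help decide which
fragments are worth restoring."""
import os
import subprocess

LOST_AND_FOUND = "/lost+found"  # placeholder; adjust to the mount being inspected

def survey(root: str) -> None:
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
                # `file -b` guesses the type from magic bytes, which is usually
                # enough to tell a text config from a stray binary.
                kind = subprocess.run(
                    ["file", "-b", path], capture_output=True, text=True
                ).stdout.strip()
            except OSError as exc:
                print(f"{path}: unreadable ({exc})")
                continue
            print(f"{path}\t{size} bytes\t{kind}")

if __name__ == "__main__":
    survey(LOST_AND_FOUND)
```

From there, grepping the likelier candidates for a known string (a domain name, a config key) is usually the quickest way to match a fragment back to its original path.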
Mass remakes are done so I ask you to please log a ticket and let us know:
Francisco
Assume we can't.
If your VPS is just a config thing and you're only out some time, then I ask that you please rebuild and let us hunt for the people that didn't keep backups of their life's work.
Francisco
All but node58 have been provisioned to new gear; I'm just working on node58 right now.
Francisco
Hardware failures can occur at any time, even to the best provider. Thumbs up to Francisco and team for the way they're dealing with this issue.
Sadly I didn't keep backups of my life's work. My fault, I know, but I wouldn't blame any of you if I couldn't get anything back. You're still my #1 VPS provider despite all of this. I've gone ahead and logged a ticket with the most important directory, and I was instructed to wait until bzImage made an announcement regarding the FSCKs finishing, so I'll just be patient until then.
Thanks for being so transparent about all of this. It's amazing how an unmanaged VPS provider does more for its customers than a managed shared host would.
Sorry to hear you're having a bad day/night. Let us know if there is anything we can do to help you out...
We'll do our best to hunt for things, just give a file list to Anthony.
I was positive we mentioned BuyVM+ in one of our company emails a while back, and I wish we had even more people using it (around 1,000 people do right now).
As I mentioned, we'll be increasing space on the offering so people can store a lot more.
Francisco
Thankfully I didn't receive any email.
Anthony is pulling NFS mounts off everything but node27 right now to start salvaging what he can. Some of the boxes are booted on live CDs just because the HN got smacked around. I mean, we were missing half of our kernel on the box and it was on a different partition altogether o_O
Francisco
Speaking of backups, I just noticed that my backup VPS has also been down for hours.
Yea, unrelated; I'm just seeing what's acting up on that one.
Francisco
Thanks, I have already created a ticket for the same.
Stop selling off all your servers; how can you even keep tabs on that merry-go-round?