BuyVM Catastrophic Data Failure - All data lost on a node! - Page 5
Comments

  • look like lv-shared03 gone bye bye :D

  • risharde Patron Provider, Veteran

    Sorry to hear about this. It reminds me of a post I made a while back, when I got banned after my temper flared at an admin. People think I'm crazy when I recommend or defend mirroring (RAID 1) as opposed to RAID 10. I'm not naive though: things can happen on RAID 1, but so far I've never had a problem (knocks on wood). Plus it's just easier for me to restore a drive if something happens.

  • Harambe Member, Host Rep
    edited April 2018

    @robohost said:
    @Francisco any trouble with lv-shared03?

    @quickuse said:
    look like lv-shared03 gone bye bye :D

    Wha? No issues here on 03.

    Box is still up, looks like a LiteSpeed crash maybe?

  • Double-check, please: the IP is pinging, but none of the sites are working.

  • @Francisco said:

    @sidewinder said:
    I thought SMART was supposed to warn of disk failures? Much respect to BuyVM ... Hope it works out

    The drives weren't failing in any fashion.

    I don't think SMART would spot a controller going nuts either.

    I'll be at the DC in an hour or so, I'll try the trick then.

    Francisco

    How about lv-shared03? All my domains cannot be accessed; httpd failed.

  • Clouvider Member, Patron Provider

    @cangkirhost said:

    @Francisco said:

    @sidewinder said:
    I thought SMART was supposed to warn of disk failures? Much respect to BuyVM ... Hope it works out

    The drives weren't failing in any fashion.

    I don't think SMART would spot a controller going nuts either.

    I'll be at the DC in an hour or so, I'll try the trick then.

    Francisco

    How about lv-shared03? All my domains cannot be accessed; httpd failed.

    I’m sure you created a ticket about it?

  • Clouvider said:

    I’m sure you created a ticket about it?

    The email from buyvm asked people to not open tickets unless they're still having problems after buyvm says the services are back up. Right now stuff is already known to be busted, so opening tickets doesn't convey any new info.

  • Clouvider Member, Patron Provider

    @willie said:

    Clouvider said:

    I’m sure you created a ticket about it?

    The email from buyvm asked people to not open tickets unless they're still having problems after buyvm says the services are back up. Right now stuff is already known to be busted, so opening tickets doesn't convey any new info.

    Common sense says that if there’s a new problem with a different server you should open a new ticket.

    Thanked by 1Ewok
  • neps Member

    cangkirhost said: How about lv-shared03? All my domains cannot be accessed; httpd failed.

    lv-shared03 is working fine for me as of now.

  • Francisco Top Host, Host Rep, Veteran

    Yeah, sorry on lv-shared03. BuyCPanel had my licenses reversed (shared03's license on 04, 04's on 03) so when I asked them to release my license to reinstall 04...it reset 03's instead.

    I was in the datacenter tonight working on storage as well as prepping for the future shared upgrades and couldn't hear my alerts going off.

    Anyway, my fault on that. I worked with BuyCPanel to fix the discrepancy in the IPs and such so that headache won't happen again.

    As of now we're at 90% restored. I'm syncing over the last batch of accounts and Anthony will get that rolling in the next little bit.

    Overall around 24 hours to get everyone sorted. If we were using differential archives instead of the "rsnapshot hardlink folders" we could've shaved quite a few more hours off that, I'm sure.

    Francisco
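
For anyone wondering what "rsnapshot hardlink folders" means in practice, here is a minimal sketch of the general idea, not BuyVM's actual setup. Each snapshot looks like a full copy of the tree, but files unchanged since the previous snapshot are hard links to the older copy. The function name and the content comparison are assumptions made for illustration; rsnapshot itself drives rsync rather than running code like this.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def hardlink_snapshot(source: Path, prev: Optional[Path], dest: Path) -> None:
    """Write a snapshot of `source` into `dest`.

    Files identical to the copy in `prev` are hard-linked (no extra space);
    new or changed files are copied. Hypothetical helper, for illustration only.
    """
    for root, _dirs, files in os.walk(source):
        rel = Path(root).relative_to(source)
        (dest / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            src_file = Path(root) / name
            dst_file = dest / rel / name
            old_file = prev / rel / name if prev is not None else None
            if (
                old_file is not None
                and old_file.is_file()
                and filecmp.cmp(src_file, old_file, shallow=False)
            ):
                # Unchanged: share the inode with the previous snapshot.
                os.link(old_file, dst_file)
            else:
                # New or modified: store a fresh copy.
                shutil.copy2(src_file, dst_file)
```

A restore from this layout still has to walk and copy every file individually, which is presumably part of why streaming an archive back could be faster on accounts with lots of small files.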

  • Tom Member

    @Francisco did you try that "one weird trick" that someone posted above?

  • Francisco Top Host, Host Rep, Veteran

    @Tom said:
    @Francisco did you try that "one weird trick" that someone posted above?

    We did!

    It didn't work, at least not yet :(

    We're going to try it again to see if we maybe need to do it longer or what have you. Going to try loading Samsung's Magician stuff and see if that can maybe fix it.

    Francisco

    Thanked by 1Tom
  • Clouvider Member, Patron Provider

    @Tom said:
    @Francisco did you try that "one weird trick" that someone posted above?

    Yeah, if this works I’ll be very surprised

    Thanked by 2Tom Ewok
  • Francisco Top Host, Host Rep, Veteran

    Clouvider said: Yeah, if this works I’ll be very surprised

    We brought the drives back to the house so we'll run them for longer.

    I'm not putting much hope in it; I'll simply move forward with the changes I discussed earlier. It'll make everyone happy.

    Francisco

    Thanked by 2Clouvider FHR
  • datanoise Member
    edited April 2018

    Francisco said: I'm not going to rush to blame TLC

    2D TLC might have its own problems (a theoretically shorter lifespan compared to MLC), but 3D NAND, be it MLC, TLC or SLC, will let you down if there is trouble with one of the dies, even if most of the cells are good. With 2D NAND, by contrast, you had some time to realize your drive was dying, provided it was built around a good controller. 3D NAND is interesting: 32 layers on a chip is good for cost reduction, but that doesn't allow the chip-level RAID 5 or RAID 6 style data allocation that some controllers can do. I guess they could still do that at the layer level. Not sure whether most modern controllers do it or not; probably not (?).

    If the controller burns, you're screwed either way. That could be what happened to you; it would be pretty interesting to know which component went wrong in your SSDs!

    Congrats for quickly fixing the problem though!
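
To make the "RAID 5 style data allocation" point above concrete, here is a minimal sketch of single-parity protection: one XOR parity block is stored alongside the data blocks, and any one lost block can be rebuilt from the survivors. The block contents and function names are made up for illustration, and this says nothing about how any particular SSD controller actually lays out data.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: List[bytes], parity: bytes) -> bytes:
    """Recover the single missing block from the survivors plus parity."""
    return xor_blocks(surviving + [parity])


# Three hypothetical dies, each holding one block of a stripe.
dies = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(dies)

# Pretend die 1 failed: rebuild its block from the other two and the parity.
recovered = rebuild_missing([dies[0], dies[2]], parity)
```

Single parity only survives one failure per stripe, which is why losing a whole chip is a problem when the redundancy was spread inside that chip.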

  • @eric1212 said:
    BuyVM will also be restoring their backup, but it's a few days old and unsure if it contains everything.

    Was this on the buyshared or buyvm brand? I suppose it was on buyshared and not buyvm, since buyvm does not offer backups.

    If it was on the buyshared brand, they advertise on their website that all plans have "Nightly Backups". So how can the backup they have be some days old?

  • Clouvider Member, Patron Provider

    @nqservices said:
    If it was on the buyshared brand, they advertise on their website that all plans have "Nightly Backups". So how can the backup they have be some days old?

    Some backups could have failed, for example.

  • Francisco Top Host, Host Rep, Veteran
    edited April 2018

    nqservices said: Was this on the buyshared or buyvm brand? I suppose it was on buyshared and not buyvm, since buyvm does not offer backups.

    If it was on the buyshared brand, they advertise on their website that all plans have "Nightly Backups". So how can the backup they have be some days old?

    BuyVM in Vegas has some backups, we'll be offering full nightly backups on slices sometime in the summer.

    The nightly was correct when we did r1soft, but since that kept breaking we now use JetBackup, which can sometimes take more than a day to generate backups, so the backups don't line up properly.

    Sometimes Jet also doesn't generate a backup for some users even though it should have.

    Francisco
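
One way to catch the kind of drift and skipped accounts described above is to compare each account's last completed backup against a maximum allowed age. This is only a sketch under assumptions: the account names, timestamps, and the 36-hour threshold are invented, and it does not use any JetBackup API; the timestamps would have to come from whatever the backup tool reports.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def stale_accounts(
    last_backup: Dict[str, Optional[datetime]],
    now: datetime,
    max_age: timedelta = timedelta(hours=36),
) -> List[str]:
    """Return accounts whose newest backup is missing or older than max_age."""
    stale = []
    for account, finished_at in last_backup.items():
        if finished_at is None or now - finished_at > max_age:
            stale.append(account)
    return sorted(stale)


# Made-up data: one fresh backup, one several days old, one never backed up.
now = datetime(2018, 4, 12, 12, 0)
report = stale_accounts(
    {
        "alice": datetime(2018, 4, 12, 2, 0),
        "bob": datetime(2018, 4, 8, 2, 0),
        "carol": None,
    },
    now,
)
```

Alerting on a non-empty result would flag "nightly" backups that have quietly become several days old.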

  • eric1212 Member
    edited April 2018

    Francisco said: The nightly was correct when we did r1soft

    Ya my data restored was from Sunday :(
    But hey, at least it was restored by them and saved me from having to deploy my backup which was also outdated.

  • Francisco Top Host, Host Rep, Veteran

    datanoise said: Congrats for quickly fixing the problem though!

    Far as I know we've restored every account we have any backups of. Anthony will be auditing to make sure there are no backups from terminated users restored, but overall a job well done on his part.

    Francisco

  • Francisco Top Host, Host Rep, Veteran

    eric1212 said: Ya my data restored was from Sunday :( But hey, at least it was restored by them and saved me from having to deploy my backup which was also outdated.

    Backups fire on Monday, Wednesday, and Friday, but yeah, sometimes backups can drift for a long time.

    The node we have storing everything is RAID60 and so the amount of IOPS it has available is pretty damn low.

    Our plan right now is to do local RAID10 arrays and return things back to nightly again. I'll also use the stream/block level deduplication system we have for BuyVM to generate weekly off-node disaster recovery images.

    As the month goes on we'll be scheduling an hour or two of downtime on the other shared nodes in Vegas and rebuilding them into exactly matching configurations as well. Since I can just straight dd the data off the current arrays onto the NVMe drives, it'll go quick.

    Francisco
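
As a rough illustration of why a RAID60 backup node runs short of IOPS compared with RAID10, here is the usual write-penalty rule of thumb (about 2 back-end I/Os per write for RAID10, about 6 for RAID6/RAID60). The disk count and per-disk IOPS below are hypothetical, and the estimate ignores controller cache, stripe size, and sequential versus random access.

```python
# Rule-of-thumb back-end I/Os needed per front-end write.
WRITE_PENALTY = {"raid10": 2, "raid60": 6}


def effective_iops(
    disks: int, iops_per_disk: float, write_fraction: float, write_penalty: int
) -> float:
    """Estimate usable IOPS for a workload with the given write fraction."""
    raw = disks * iops_per_disk
    return raw / ((1.0 - write_fraction) + write_fraction * write_penalty)


# Hypothetical array: 12 spinning disks at 150 IOPS each, write-only workload
# (roughly what a node receiving backups sees).
for level, penalty in WRITE_PENALTY.items():
    print(level, effective_iops(12, 150, 1.0, penalty))
```

Under these assumptions the same disks deliver about a third of the write IOPS in RAID60 that they would in RAID10.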

    Thanked by 1eric1212
  • Check that your web server domain configs are up to date. I see your cPanel recognises domains that I had previously, but the front-end web server isn't up to speed on them.

  • Francisco Top Host, Host Rep, Veteran

    ricardo said: Check that your web server domain configs are up to date. I see your cPanel recognises domains that I had previously, but the front-end web server isn't up to speed on them.

    That might've been a configuration setting. By default cPanel doesn't allow you to have addon domains pointing outside of public_html, but on the previous node we allowed that (that way you could have your addon domains in your root folder).

    As the imports ran, it broke a lot of addon domains since it repointed them to public_html instead. You can just go to Addon Domains to change the paths.

    Francisco

    Thanked by 1Saragoldfarb
  • @Francisco said:
    Since I can just straight dd the data off the current arrays onto the NVMe drives, it'll go quick.

    You really want someone else to look over your shoulder so you don't mess up target and source then.
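
Short of a second pair of eyes, a small pre-flight check can catch the classic dd mistakes before anything is written. The sketch below only builds the command and refuses obviously wrong pairs; the function names are hypothetical, and the operands assume GNU dd (`status=progress` is a GNU coreutils feature). It does not check whether the target is mounted or already holds data.

```python
import os
from typing import List


def device_size(path: str) -> int:
    """Size in bytes of a regular file or block device (seek to the end)."""
    with open(path, "rb") as f:
        return f.seek(0, os.SEEK_END)


def plan_dd(source: str, target: str) -> List[str]:
    """Return a dd command line, or raise if the pair looks wrong."""
    src, dst = os.path.realpath(source), os.path.realpath(target)
    if src == dst:
        raise ValueError("source and target resolve to the same device")
    if device_size(dst) < device_size(src):
        raise ValueError("target is smaller than source")
    # Nothing is executed here; the caller reviews and runs the command.
    return ["dd", f"if={src}", f"of={dst}", "bs=4M", "conv=fsync", "status=progress"]
```

Printing the returned list and reading it back out loud before running it is cheap insurance against swapping `if=` and `of=`.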

  • Francisco said: might've been a configuration setting

    Yes, apparently it was. Thanks for the hint.

  • deank Member, Troll

    @Francisco LET community will be waiting for you to publish a book titled "A little messier than I thought".

  • Francisco Top Host, Host Rep, Veteran
    edited April 2018

    deank said: @Francisco LET community will be waiting for you to publish a book titled "A little messier than I thought".

    "Awww fuck."

    "A guide to webhosting"

    Francisco

  • Neat. Are these weekly backups made on LUX-shared nodes too? That would make me smile.

    I'm just too lazy to back up friends n family shared hosting.

  • Zerpy Member

    How is JetBackup configured if it can take more than a day to perform a backup of a server? o.O Even with the RAID 6 array that I have for backups, 400 accounts take only a few hours, and that is mostly due to some accounts having 2+ million files :')

  • imok Member

    What? Mine takes a couple of hours to back up around 15 accounts with 60GB in total. I have to check it.
