racknerd outage-discussion

Hello,

You are receiving this email because you have a VPS with us located on the LAXSSD4014nerd6DC02 node. Earlier today, at approximately 5:46 PM PST our technicians noticed this node was experiencing symptoms. Upon troubleshooting further we learned that despite us taking multiple redundancy measures, including having 8x SSD’s in a RAID-10 on the host node, this node suffered a RAID array corruption issue. Since then, we have been tirelessly working to attempt to recover the RAID array and to bring this node back to functional order.

We simply wanted to provide you with an update and let you know we’re working on this. Our senior systems administrators are still working to attempt to salvage the RAID array. We will need some more time to continue to work on this issue.

We will be providing affected customers on this node with compensation. We will determine this, and email you again in the coming days with the details of the amount of compensation / added service time we will be providing.

We deeply regret any disruption this incident may have caused and want to reassure you of its exceptional rarity. Considering the vast number of physical VPS nodes we've maintained in production over the years, spanning multiple data centers, incidents such as these are extremely unusual. The resilience and dependability of our infrastructure remain a primary focus.

Your understanding in this matter is greatly appreciated. We will make it a priority to keep you updated, and we will be updating the status incident link with more information as developments emerge: https://status.racknerd.com/incident/1184

Thank you for your patience during this time.

Comments

  • i just wanted to bring this to attention. not blaming racknerd for this, shit happens. i asked about it in a ticket and they pointed me to the issue tracker, which is fine. when i asked whether there is a chance of data loss, they said there isn't but that they cannot guarantee it. that brings me to the second comment

  • why doesn't racknerd provide something like a paid system image backup like many providers?

    secondly, people who use racknerd or racknerd like providers who do not have a native backup in place, how do you manage backups?

    i can, uh set up a backup server or a mirror but the problem would remain that i have to buy a second vps and manage that.

    if i do backblaze, then it has to be tested and paid for so.......... lowned ideas?

    again, i am a happy racknerd customer since 2021 i think and this is the first time i got some trouble like this, i completely understand that you get what you pay for and all that.

  • having the initiative to send an incident report on their own incident tracker is a significant advantage for a low-end provider. usually you just get ihostart'd, virmach'd, or worse hostdoc'd & cociu'd

    if they dont have backup solution then you have to overcome it on your own. sounds like a skill issue for me

  • Seems like the perfect response to me. Acknowledged the issue, warned of potential data loss, provided information on the technical issue and offered compensation for those affected.

    This would be a good time to double check your latest backups and prepare to restore them.

    Data loss sucks and restoring backups is a pain in the ass, but this is the moment we're all prepared for.

  • JasonM Member

    @dahartigan said: Data loss sucks and restoring backups is a pain in the ass, but this is the moment we're all prepared for.

    RackNerd customer here. They should add a feature of adding paid snapshots/backup for peace of mind. Restoring from remote server sucks. A button in control panel to restore (like we have in Jetbackup) will be beneficial for every hosting companies customers!

    Thanked by 1 inthecloudblog

  • dustinc Member, Patron Provider, Top Host
    edited July 2023

    Hi @john_sd3 -- Happy 4th, and Thank You for your continued business and support over the years. I must say that time certainly flies - we have indeed grown a lot since 2021 and we appreciate you sticking around.

    As you pointed out, we do leverage our status page at https://status.racknerd.com/ to keep updates streamlined. This method has worked well for our clients and us for the past few years and counting. It allows us to communicate transparently and effectively with our customers, providing necessary updates without diverting substantial manpower away from working on issues. Our team stays diligent in reporting any known problems there, allowing customers to ascertain whether an issue they are seeing originates from our end. The status page has proven particularly useful for quickly determining or ruling out the source of issues, especially for our clients operating unmanaged environments.

    With regard to the recent hiccup, one of our KVM VPS nodes in Los Angeles, LAXSSD4014nerd6DC02, is currently experiencing issues related to the RAID controller. Our team is actively working on the matter alongside the datacenter remote hands team. By the way, all of our host nodes run on RAID-10 arrays for redundancy, though in the spirit of best practices, we agree with the time-honored wisdom: RAID is not a backup substitute. As such, we strongly encourage our clients to ensure they have their own backups in place (again, this advice is mentioned solely in the spirit of best practices).

    A cost-effective approach we notice commonly used by many of our VPS customers involves setting up cron jobs or scripts to back up any important files/directories to a different VPS (ideally to another VPS located in a different physical datacenter location - whether that be a secondary VPS with us, or elsewhere). If your data footprint is not massive, a free approach could be leveraging cloud storage services such as Google Drive or Dropbox that include a free tier of storage (in which case, you could set certain directories or files to back up automatically within your VPS to said cloud service). Please note that each client's scenario and use case is unique, and these are merely examples of potential strategies you could employ for handling backups. There are dozens of other possibilities/paths you could take for handling backups, and we can't necessarily say one backup method is better than the other - as it ultimately depends on what is best for your use-case.

    On the subject of paid snapshot support, we are looking forward to incorporating this feature upon transitioning to SolusVM V2. Unfortunately, at this time we cannot commit to a timeline for migrating from SolusVM V1 to SolusVM V2, given the premature state of the product and the less-than-ideal functionality of their released import tool, as we learned through our tests. We appreciate your patience and understanding as we continue to work with the SolusVM team on ironing out its kinks and identifying the best upgrade path for us.
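The cron-driven copy to a second VPS mentioned in this post could look roughly like the sketch below. It only wraps the standard `rsync` client; the host name, paths and function names are hypothetical, and it assumes SSH key login to the backup VPS is already set up.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_rsync_command(sources: Sequence[str], remote: str, dest_dir: str) -> List[str]:
    """Build an rsync argv that mirrors local paths to a remote host over SSH."""
    if not sources:
        raise ValueError("at least one source path is required")
    return [
        "rsync",
        "-az",        # archive mode (permissions, times, symlinks) + compression
        "--delete",   # drop remote files that no longer exist locally (pure mirror)
        "-e", "ssh",  # transport over SSH
        *sources,
        f"{remote}:{dest_dir}",
    ]


def run_backup(sources: Sequence[str], remote: str, dest_dir: str) -> int:
    """Run the mirror and return rsync's exit code (0 means success)."""
    cmd = build_rsync_command(sources, remote, dest_dir)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

A crontab entry such as `0 3 * * * /usr/bin/python3 /usr/local/bin/backup.py` (hypothetical path) would call `run_backup` nightly. Note that a plain mirror with `--delete` is not versioned: a file deleted or corrupted on the source is deleted or overwritten on the copy at the next run.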

  • @JasonM said:

    @dahartigan said: Data loss sucks and restoring backups is a pain in the ass, but this is the moment we're all prepared for.

    RackNerd customer here. They should add a feature of adding paid snapshots/backup for peace of mind. Restoring from remote server sucks. A button in control panel to restore (like we have in Jetbackup) will be beneficial for every hosting companies customers!

    I'm not opposed to the idea, but I personally think the backups should be offsite and not at the provider level. Imagine a provider deadpools and your backups are gone with the wind.

  • rcy026 Member

    @john_sd3 said:
    why doesn't racknerd provide something like a paid system image backup like many providers?

    secondly, people who use racknerd or racknerd like providers who do not have a native backup in place, how do you manage backups?

    People use Restic, Borg, Duplicati, Kopia, rsync or any of hundreds of other free solutions.
    If you are willing to pay there are things like Acronis, Veeam or thousands of cloud drive clients.

    i can, uh set up a backup server or a mirror but the problem would remain that i have to buy a second vps and manage that.

    Yes. How much is the data worth to you?

    if i do backblaze, then it has to be tested and paid for so.......... lowned ideas?

    No matter what you choose, it needs to be tested. And most likely paid for.

    again, i am a happy racknerd customer since 2021 i think and this is the first time i got some trouble like this, i completely understand that you get what you pay for and all that.

    I am also a happy Racknerd customer, and this really has nothing to do with them.
    Keeping backup of important data is your personal responsibility. No offense to Racknerd and @dustinc, but even if they did provide some kind of backup service I would not use it. Keeping the backup of the data in the same location as the actual data is inherently stupid, I do not do that even with the machines I have at 365 and AWS.

    My personal recommendation to you is to get some kind of storage somewhere, choose a solution and spend a few hours getting familiar with it. How to run Restic or Borg from a script should not take long to figure out, and a decent lifetime deal on Stacksocial should get you enough storage for a very low cost.
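As a rough idea of what "running Restic from a script" can mean, here is a minimal sketch. It assumes the `restic` binary is installed and the repository has already been initialised; the repository URL, password file path and function names are placeholders, not anything from this thread.

```python
import os
import subprocess
from typing import List, Sequence


def restic_backup_command(repo: str, paths: Sequence[str]) -> List[str]:
    """argv for taking a new snapshot of the given paths."""
    return ["restic", "-r", repo, "backup", *paths]


def restic_forget_command(repo: str, keep_daily: int = 7, keep_weekly: int = 4) -> List[str]:
    """argv for expiring old snapshots and pruning unreferenced data."""
    return [
        "restic", "-r", repo, "forget",
        "--keep-daily", str(keep_daily),
        "--keep-weekly", str(keep_weekly),
        "--prune",
    ]


def run_restic(commands: Sequence[Sequence[str]], password_file: str) -> None:
    """Run each command in order, stopping at the first failure."""
    # restic reads the repository password from the file named here.
    env = dict(os.environ, RESTIC_PASSWORD_FILE=password_file)
    for cmd in commands:
        subprocess.run(list(cmd), env=env, check=True)
```

Usage would be along the lines of `run_restic([restic_backup_command(repo, ["/etc", "/home"]), restic_forget_command(repo)], "/root/.restic-pass")`, with `repo` set to something like `sftp:backup@backup.example.net:/srv/restic-repo` (hypothetical host). A test restore is still needed before trusting it.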

  • risharde Patron Provider, Veteran

    I've been saying it for years and even the majority of LET users have laughed at me. RAID1 with hotswap and or backup is more reliable but doesn't seem to sell - I offered that and never made my deals get off the ground.

    Thanked by 1 crunchbits

  • angstrom Moderator

    We recommend to activate your Disaster Recovery Plan.

    Thanked by 2 equalz PineappleM

  • rcy026 Member

    @risharde said:
    I've been saying it for years and even the majority of LET users have laughed at me. RAID1 with hotswap and or backup is more reliable but doesn't seem to sell - I offered that and never made my deals get off the ground.

    Reliable, sure, but performance, scalability and the percentage of disk wasted don't exactly make it favorable for providers.
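The capacity side of that trade-off can be put into numbers. This is a minimal sketch, assuming equal-size disks, two-way mirrors for RAID-10, and a RAID-1 set in which every disk holds a full copy; `usable_fraction` is a made-up helper name.

```python
def usable_fraction(level: str, disks: int) -> float:
    """Fraction of raw capacity left for data at a given RAID level.

    Assumes all disks are the same size. RAID-10 is modelled as striped
    two-way mirrors; RAID-1 as a single mirror set across all disks.
    """
    level = level.lower()
    if level == "raid0" and disks >= 2:
        return 1.0                      # striping only, no redundancy
    if level == "raid1" and disks >= 2:
        return 1 / disks                # one disk's worth of data, rest are copies
    if level == "raid5" and disks >= 3:
        return (disks - 1) / disks      # one disk's worth of parity
    if level == "raid6" and disks >= 4:
        return (disks - 2) / disks      # two disks' worth of parity
    if level == "raid10" and disks >= 4 and disks % 2 == 0:
        return 0.5                      # every block stored twice
    raise ValueError(f"unsupported combination: {level} with {disks} disks")
```

For an 8-disk host like the one in the email, RAID-10 leaves half the raw capacity usable, which is the same overhead as a two-disk RAID-1; RAID-5 and RAID-6 on eight disks would leave 87.5% and 75% respectively.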

  • @dahartigan said:

    @JasonM said:
    RackNerd customer here. They should add a feature of adding paid snapshots/backup for peace of mind. Restoring from remote server sucks. A button in control panel to restore (like we have in Jetbackup) will be beneficial for every hosting companies customers!

    I'm not opposed to the idea, but I personally think the backups should be offsite and not at the provider level. Imagine a provider deadpools and your backups are gone with the wind.

    Yep. Backups at the provider can be great, it means that you don't have the hassle of restoring when something goes wrong at that level as the provider can go back to last-known-good (which is hopefully recent enough), but off-provider backups can rescue you from families of problems on-provider backups can't. Unfortunately neither are free – someone is paying in terms of storage resources and admin/monitoring. Both is best (for the convenience & speed when provider backups do help, and safety for when they don't) but that of course costs twice.

    @risharde said: I've been saying it for years and even the majority of LET users have laughed at me. RAID1 with hotswap and or backup is more reliable but doesn't seem to sell - I offered that and never made my deals get off the ground.

    Laughing at you seems a bit strong, but you are talking about the low end of the market – if it costs extra money it won't sell well down here. People often only care about the risks of going cheap when the dangers hit, at which point it is usually too late and the provider gets the joyful task of placating angry users who either didn't want to pay for redundancy but expected the provider to have it anyway, or did no research and had no idea (we all remember some of the entitled moaning when one of OVH's DCs went up in smoke). This is the nature of low-end.

    Thanked by 1 risharde

  • why rely on the same provider for backup? even OVH couldn't save themselves.

  • @cybertech said:
    why rely on the same provider for backup? even OVH couldn't save themselves.

    so you are saying we should have a mirror on a second provider. sounds cool but how would you manage the mirror, would it be as an offsite backup or as a live hot replacement or something in between.

  • emgh Member
    edited July 2023

    @dustinc I may have criticized RackNerd in the past, and I probably will continue to do so for as long as I'm here, healthy and not banned. However, that was a really good email that clearly explained what has happened, and RAID failures clearly aren't anything specific to RackNerd.

    Especially for the price, expecting anything more than what was communicated, and even expecting what in-fact was communicated, is ridiculous.

    Edit: What I'm trying to say, without sounding too skeptical, is that even for someone not really a big RackNerd fan, this is definitely them going above and beyond for LET standards.

    Thanked by 1 crunchbits

  • emgh Member

    @risharde said:
    I've been saying it for years and even the majority of LET users have laughed at me. RAID1 with hotswap and or backup is more reliable but doesn't seem to sell - I offered that and never made my deals get off the ground.

    Explain for noobie pls

    Why RAID1 more reliable pls

  • @john_sd3 said:

    @cybertech said:
    why rely on the same provider for backup? even OVH couldn't save themselves.

    so you are saying we should have a mirror on a second provider. sounds cool but how would you manage the mirror, would it be as an offsite backup or as a live hot replacement or something in between.

    I recommend both things.

    The hot spare can be a relatively cheap VPS at another provider. You set it up identically to your primary, and you can switch over to it if needed. Definitely use database replication so you have a live copy of your database in more than one location.

    In addition to that, you should be storing versioned backups. You can get backup storage quite cheaply through storage VPS offers that are commonly seen here on LET.

    As for software to perform the backups, there are multiple good open source options (I like rsnapshot for its simplicity) as well as commercial options that have a free tier (Veeam, for example).
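To make "versioned backups" a little more concrete, here is a small retention sketch: keep the newest few snapshots plus the newest snapshot from each of the most recent ISO weeks. This is an illustration only, not how rsnapshot or Veeam implement rotation, and `snapshots_to_keep` is a hypothetical helper.

```python
from datetime import date
from typing import Iterable, List, Set


def snapshots_to_keep(snapshots: Iterable[date], daily: int = 7, weekly: int = 4) -> List[date]:
    """Pick snapshot dates to retain; everything else may be deleted.

    Keeps the `daily` newest snapshots, plus the newest snapshot from
    each of the `weekly` most recent ISO weeks that have a snapshot.
    """
    ordered = sorted(set(snapshots), reverse=True)   # newest first
    keep: Set[date] = set(ordered[:daily])
    seen_weeks = []
    for snap in ordered:
        week = tuple(snap.isocalendar()[:2])         # (ISO year, ISO week)
        if week in seen_weeks:
            continue                                 # already kept a newer one
        if len(seen_weeks) == weekly:
            break                                    # enough weeks covered
        seen_weeks.append(week)
        keep.add(snap)
    return sorted(keep, reverse=True)
```

Anything not in the returned list is a candidate for deletion on the backup storage. Keeping several generations is what protects against a corrupted or accidentally deleted file being faithfully copied over the only good backup.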

  • @aj_potc said:

    @john_sd3 said:

    @cybertech said:
    why rely on the same provider for backup? even OVH couldn't save themselves.

    so you are saying we should have a mirror on a second provider. sounds cool but how would you manage the mirror, would it be as an offsite backup or as a live hot replacement or something in between.

    I recommend both things.

    The hot spare can be a relatively cheap VPS at another provider. You set it up identically to your primary, and you can switch over to it if needed. Definitely use database replication so you have a live copy of your database in more than one location.

    In addition to that, you should be storing versioned backups. You can get backup storage quite cheaply through storage VPS offers that are commonly seen here on LET.

    As for software to perform the backups, there are multiple good open source options (I like rsnapshot for its simplicity) as well as commercial options that have a free tier (Veeam, for example).

    what he said.

  • @angstrom said:

    We recommend to activate your Disaster Recovery Plan.

    This is literally what I was thinking when reading this thread... pure gold.

    For those who didn't get the joke: https://www.datacenterdynamics.com/en/opinions/lessons-ovh-fire-disaster-recovery-plans-are-not-work-fiction/

    Thanked by 1 angstrom

  • risharde Patron Provider, Veteran

    @MeAtExampleDotCom I agree with most of what you said there, the plans I had put out were indeed a bit more expensive but well within the LET market. I saw someone write about scalability etc... this is probably true with other raids but again being here and seeing how many people's raids have failed, I can't say statistically, but I hear it's usually everything but raid1. Either way, there is some risk with raid1 but I stick with it because I consider my data priceless. And yes I do backups still lol.

  • @john_sd3 said:
    i just wanted to bring to attention

    Use the quote button. It would have been clear that the email was a quote if quote formatting had been used.

    Thanked by 1 angstrom

  • @risharde said:
    this is probably true with other raids but again being here and seeing how many people's raids have failed, I can't say statically I hear it's usually everything but raid1.

    That's less to do with raid1 being more reliable, and more to do with raid1 being less scalable and performant, so not many people/providers will use it. Less arrays in the wild = less arrays to fail.

  • emgh Member

    @fluffernutter said:

    @risharde said:
    this is probably true with other raids but again being here and seeing how many people's raids have failed, I can't say statically I hear it's usually everything but raid1.

    That's less to do with raid1 being more reliable, and more to do with raid1 being less scalable and performant, so not many people/providers will use it. Less arrays in the wild = less arrays to fail.

    And maybe some providers lack one of:

    1. Experience to recover from failures that could have been recovered from
    2. Time to justify recovering from failures that could have been recovered from

  • crunchbits Member, Patron Provider, Top Host

    @risharde said:
    I've been saying it for years and even the majority of LET users have laughed at me. RAID1 with hotswap and or backup is more reliable but doesn't seem to sell - I offered that and never made my deals get off the ground.

    I thought it was a common practice for providers? Unless you mean RAID1 > RAID5/6/10?

    RAID still isn't a replacement for backups and you can have an entire array corrupt. We've been testing out other ways to protect against that, but sometimes it is inevitable.

    @emgh said:
    @dustinc I may have criticized RackNerd in the past, and I probably will continue to do so for as long as I'm here, healthy and not banned, however, really good email that clearly explained what have happened and RAID failures clearly isn't anything specific to RackNerd.

    Especially for the price, expecting anything more than what was communicated, and even expecting what in-fact was communicated, is ridiculous.

    Edit: What I'm trying to say, without sounding too skeptical, is that even for someone not really a big RackNerd fan, this is defintely them going above and beyond for LET standards.

    I don't think there is much Racknerd could have done better here. Hardware fails, @dustinc sent an excellent email. They're top dog (or one of) around here for VPS stuff so I know it's not popular to give praise, but I definitely agree with your sentiment and they just made my job harder--but it's a net benefit for everyone here.

    @fluffernutter said:

    That's less to do with raid1 being more reliable, and more to do with raid1 being less scalable and performant, so not many people/providers will use it. Less arrays in the wild = less arrays to fail.

    I think you nailed it. Providers are more likely (at scale) to run RAID5, 6, and 10 depending on the setup. RAID1 for us is most commonly used for things like hypervisor OS or just some small internal stuff where a failure would be 'annoying' but not catastrophic. The second half of the puzzle is very proactive monitoring. False positives and too much useless info were something we had to learn to overcome way early on, otherwise everyone gets inadvertently trained to ignore the warning signs.

    Thanked by 1 emgh

  • risharde Patron Provider, Veteran
    edited July 2023

    Redacted... I will stick with RAID1 since I haven't lost data in 20 years doing it the way I do it.

    But I must admit that @crunchbits does make a good point about proactive monitoring - in which case only then should you really go for RAID >1. But me, I'll stick with RAID1 cause I'm human.

  • @john_sd3 said:

    @cybertech said:
    why rely on the same provider for backup? even OVH couldn't save themselves.

    so you are saying we should have a mirror on a second provider. sounds cool but how would you manage the mirror, would it be as an offsite backup or as a live hot replacement or something in between.

    For a couple of my bits that run in VMs (a Zimbra instance and a couple of old web servers that host very little these days) I have copies of the VMs running elsewhere. Lower-spec versions, but they can do their jobs if I need to fail over to them. They exist mainly to test my backups: they restore themselves from the latest backup daily and I log in to check all is well (if they are down or the latest data is not present then something needs looking into). They are not publicly visible (VPN only) but if the main services died I could bring these copies online with a couple of firewall and DNS changes (and maybe a manual restore from a specific backup if whatever took down the main VMs also corrupted the latest backup).

    @crunchbits said: I thought it was a common practice for providers? Unless you mean RAID1 > RAID5/6/10?

    Common practise for good providers. I suspect a lot of cheap providers have no RAID at all. I know some don't. I knew at least one (now long defunct) that used JBOD to extend to a second drive in their hosts (not even RAID 0 for the potential performance benefit) so the opposite of redundancy.

    Many seedbox providers use RAID0, but that makes sense for their target audience as the data stored is usually easy to re-obtain if anything dramatic happens (so they tend to be open about it).
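The "is the latest data present" check described earlier in this post can be partly automated. A minimal sketch, assuming backups land as files under one directory and that a backup older than about a day means something needs looking into; the function names and the 26-hour default are assumptions.

```python
import time
from pathlib import Path
from typing import Optional, Union


def newest_mtime(backup_dir: Path) -> Optional[float]:
    """Modification time of the most recently written file, or None if empty."""
    mtimes = [p.stat().st_mtime for p in backup_dir.rglob("*") if p.is_file()]
    return max(mtimes) if mtimes else None


def backup_is_fresh(backup_dir: Union[str, Path],
                    max_age_hours: float = 26.0,
                    now: Optional[float] = None) -> bool:
    """True if the newest file under backup_dir is at most max_age_hours old."""
    newest = newest_mtime(Path(backup_dir))
    if newest is None:
        return False                    # no backup at all counts as stale
    current = time.time() if now is None else now
    return (current - newest) <= max_age_hours * 3600
```

A freshness check like this only shows that a backup was written recently. It does not replace the actual restore test, which is what proves the data is usable.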
