
Comments
Luckily, the fire did not spread throughout the high rise - though, as a result of power being shut down, it did impact uptime, and sadly some hardware. We're doing our absolute best.
That's always a good thing to do, so we commend you for doing so. Thank you for sticking with us through the tough times.
Hi @john_sd3 -- Currently, the India NetBanking and India UPI payment options are temporarily unavailable because our third-party payment processor, Payssion, has temporarily deactivated them. As soon as they are made available to us again, we'll reactivate them on our end.
Thanks, appreciate the reply. I do have a backup, but don't want to deploy until I know my particular VPS won't be powered up again.
@dustinc any update on node LAXSSD4024nerd6DC02 ?
We're still working on this node -- our priority is restoring service with our customers' data intact, and it is a slow process. Further updates to follow via the status page or e-mail.
@dustinc I'm losing my patience; that's nearly 48 hours offline. Will customers impacted by the extended downtime receive compensation, given that you've let us down on the "100% Uptime Guarantee" that is bolded in the description of DC-02?
losing millions?
data is priceless bro
backups have been a thing for decades
what if the instance happens to be literally my remote backup and now I need it because my local NAS is out of sync, and I found hosting on RN is much more affordable than other solutions?
I'm sorry, but this is the dilemma I'm facing, and your sarcastic comments in the feedback thread aren't helpful to anyone here. Maybe just leave them for inexperienced newcomers not doing 3-2-1 properly, please.
you got the point
fine. I'll take that
I don't think communication has been that great, despite the detail on the status page. It's now been 52 hours since the server went down, and there's been no direct communication from RackNerd about the issue. I would have at least expected an e-mail after 24 hours of downtime explaining the situation.
Since the control panel is down, we can't even see what host our server was on - I only happen to know because you once mentioned it in a support ticket (or at least 2.5 years ago I was on LAXSSD5006nerd3DC02).
The last long list had 35 hosts still to be recovered, since then 10 have been named as fixed and then "an additional 2 hypervisors today thus far" without indicating which, and none of the updates are timestamped (since writing this, reloading the page it now lists the 2 and an additional 4). In any case, that still sounds like over 15 that still might be fixable or might not, but with a fixing velocity slowed down to somewhere around 10 per day.
There's no information on whether these are being looked at in parallel (I understand that a RAID rebuild is slow, but once it's started, it's mostly automatic) or on how likely a given node is to be recovered. I know disks can fail during resilvering, but having some information on the likelihood of success (i.e. n out of m drives currently working), the anticipated time frame (how long it's been rebuilding the array and the percentage complete) or alternative options such as being given a new instance on a fresh host would help.
I do have backups and I'm not losing millions, but if it's likely that I'm going to have to reinstall from scratch, I'd rather get that process moving on a different host sooner rather than waiting another few days. Similarly, if we have to wait a couple more days just to find out we have to reinstall anyway, that'd be a far worse outcome than spending a few hours today rebuilding and being told in 3 days' time that actually we could have had our original instance restarted. OTOH, if it's only a few hours left, it makes sense to wait and see.
At the moment, we just have no idea. I think it was around 24 hours you posted "We will provide customers with available options as soon as we have more information, so that customers who have yet to activate their disaster recovery plans, can do so with our proposed solutions."
On the positive side, at least so far you've only reported successful rebuilds and no total RAID failures.
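To put rough numbers on the estimates in the comment above, here is a minimal sketch. It assumes the recovery rate stays constant and that a rebuild progresses linearly, neither of which the status page confirms. The function names and the 6-hour / 40% sample values are hypothetical; only the "over 15 hosts" and "around 10 per day" figures come from the comment.

```python
def days_to_clear(hosts_remaining: int, hosts_fixed_per_day: float) -> float:
    """Days until the backlog is cleared, assuming a constant fix rate."""
    if hosts_fixed_per_day <= 0:
        raise ValueError("fix rate must be positive")
    return hosts_remaining / hosts_fixed_per_day


def rebuild_hours_left(hours_elapsed: float, percent_complete: float) -> float:
    """Remaining rebuild time, assuming progress is linear in time."""
    if not 0 < percent_complete <= 100:
        raise ValueError("percent_complete must be in (0, 100]")
    return hours_elapsed * (100 - percent_complete) / percent_complete


if __name__ == "__main__":
    # Figures from the comment: roughly 15 hosts left at about 10 per day.
    print(days_to_clear(15, 10))      # 1.5
    # Hypothetical rebuild: 6 hours in, 40% complete.
    print(rebuild_hours_left(6, 40))  # 9.0
```

Even a crude figure like this, published per node, would let a customer decide between waiting and redeploying from backup.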
Unfortunately, I'm losing my patience too :-( We have a list of nodes, but we don't know which one our VPS is on. It's been 3 days offline. I'm worried that my sites will be de-indexed from search engines, with a full loss of traffic.
If I had known that it would take three days, I would have tried to migrate from backups.
We need to know what the outlook is, so we're not just sitting and waiting...
Same thoughts here. The only thing that stopped me was that I don't have the latest backups and would need to pay for new hosting and set everything up again, but that's still better than 3 days (or a week) of downtime. :-(
Hi @ralf and @Milon -- I completely understand your concerns, and I agree that communication could have been improved throughout this process. As you’ve probably noticed, we are generally very proactive and responsive in addressing any issues. In this unprecedented situation, all hands have been on deck, and it’s been challenging. To provide a bit more context, as you know, the downtime stemmed from an unexpected fire on the 61st floor of the high-rise, which, thankfully, did not physically reach the servers or spread within the facility. However, the fire did result in an immediate, unforeseen power shutdown as mandated by the Los Angeles Fire Department affecting all systems, an issue that was beyond both our control and the facility’s.
Our highest priority has been to salvage and retain our customers' data and to carefully check it on a node-by-node basis. After a widespread, unexpected power outage like this, isolated issues with individual servers are not uncommon. In this case, while hundreds of physical nodes came back online without requiring manual intervention, about 30 nodes required individual attention to restore. A good number of these have since been recovered and brought back online -- some with minor repairs like PSU, RAID controller, or motherboard replacements, while others required more complex processes like RAID rebuilds.
Since the incident, our team has been working diligently through each affected node, with most of our staff members working 12-14+ hour shifts, prioritizing getting each node back online as quickly as possible. Some nodes encountered data corruption; when recovery was deemed impossible, we immediately reached out to affected customers and provisioned replacement services. Other nodes required RAID rebuilds (with no data loss), which, as you noted, is a lengthy process. Others just required minor repairs, such as a motherboard or RAID controller replacement. At the time of writing, we're working on the final four nodes remaining on our list that require individual attention. These last four are proving to be the most difficult and challenging, but we're not giving up until all possible attempts have been exhausted.
As of yesterday evening, I have also directed our team responsible for status updates to include specific node names, and we will continue to do so to ensure transparency.
If anyone is still offline and wishes to proceed with their disaster recovery plans by setting up a fresh VM to re-establish their environments, instead of waiting for our recovery efforts, please reach out to us via ticket, and we will expedite the process.
We sincerely appreciate your business and understanding as we work through this process.
I received the dreaded support ticket this morning - after all this time it turns out they were unable to recover the node. RackNerd have replaced my VPS with a fresh one and given a month's credit as compensation. Lost the IPv6 and rDNS settings, but I've submitted a ticket and it should be easy enough to correct.
I'm not really angry about this - with such a cheap service expectations aren't super high. But RackNerd definitely has some work to do - this was essentially just a simple power outage and that should not be the cause of such a major incident. If they use the same hardware globally then any node is susceptible to the same issue.
Hi @dzzzzz -- Thank you for your patience throughout this process. Please do submit a ticket if you haven't already, and we'll prioritize taking care of your IPv6 and rDNS settings.
While power outages can affect any provider (or any environment, for that matter) regardless of size or tier, we understand the impact this has had on your service. In LA DC-02, the majority of our infrastructure came back online without being affected, while some nodes required additional intervention. Our hypervisors utilize an 8x SSD RAID-10 configuration for redundancy, which typically provides excellent protection against drive failures. However, RAID-10 can only withstand up to two drive failures within the same span, and sudden power loss events can, in a worst-case scenario, trigger multiple simultaneous drive failures that exceed this threshold. We also know that other customers/tenants of Multacom, with different environments/setups, were also impacted, so just for clarification, it's not confined to any particular type of setup or specification.
In your specific case, despite our recovery efforts, we weren't able to recover the node, so we moved forward to reprovision your instance accordingly. While we acknowledge this is a budget-friendly service as you pointed out, we still applied great attention to detail, and we truly tried our very best here.
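For anyone wondering how much simultaneous failure an 8-drive RAID-10 array can absorb, here is a minimal sketch. It assumes the array is built from two-drive mirror pairs striped together, and that every drive is equally likely to fail. The thread does not confirm the actual controller layout, so `drives_per_mirror` and the function name are illustrative assumptions. The array is counted as lost as soon as every drive in any one mirror has failed.

```python
from fractions import Fraction
from itertools import combinations


def loss_probability(total_drives: int, failures: int,
                     drives_per_mirror: int = 2) -> Fraction:
    """Probability that `failures` simultaneous, equally likely drive
    failures wipe out at least one whole mirror (and so the array)."""
    if total_drives % drives_per_mirror != 0:
        raise ValueError("drives must divide evenly into mirrors")
    if not 0 <= failures <= total_drives:
        raise ValueError("failures must be between 0 and total_drives")
    mirrors = total_drives // drives_per_mirror
    outcomes = list(combinations(range(total_drives), failures))
    lost = 0
    for failed in outcomes:
        # Drive d is assumed to belong to mirror d // drives_per_mirror.
        per_mirror = [0] * mirrors
        for drive in failed:
            per_mirror[drive // drives_per_mirror] += 1
        if any(count == drives_per_mirror for count in per_mirror):
            lost += 1
    return Fraction(lost, len(outcomes))


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, "failed drives -> P(array lost) =", loss_probability(8, k))
```

Under these assumptions a single failure is always survivable, two simultaneous failures lose the array in 1 of 7 cases, three in 3 of 7, and five or more always do. Real failures after a power cut are unlikely to be independent, so treat this as an illustration of the layout rather than a prediction.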
Thank you for all your updates here @dustinc, and for your service anyway. I share the same thoughts and views as @dzzzzz, but it's 7+ hours later and there are zero updates on the status page :-) It's good that you shared here that it's still possible for a node to be recovered if I haven't yet received a ticket about a VPS replacement.
I still hope that my node, at least, can be booted back online, and that I won't have to request a reinstall of everything in a hurry and worry about data loss.
About RAID 10... maybe it's not a good choice? RAID 10 is intended to protect clients from data loss through redundancy, but does it in fact lead to data loss with high probability during a sudden power outage?
You should not trust a company that does not make external backups.
Neither one that sells lifetime deals
RAID-10 and power resilience are different topics. RAID-10 gives a provider some extra time to replace a disk without data loss. As for power resilience, it depends on software configuration, and if we are talking about RAID, a hardware controller with a battery is the solution for this.
But you should not expect high availability from LET. Always do backups, and if you deem your service critical, implement a highly available cluster yourself.
Any progress?
> @tentor said:
I think you're thinking with your "host rep" hat on, and not actually seeing what he's asking. His question (I believe) is whether relying on RAID 10 is sufficient for purpose, if a power outage has led to such a catastrophic failure on so many nodes.
I notice that many providers have now shifted to Ceph, which distributes data across nodes as well as disks, and also has the advantage that storage is no longer tied to any specific host, so VMs can be migrated very easily to other hosts, for instance if a host motherboard is fried in a power outage. It'd be interesting to hear their experiences if any of them have experienced a similar wide-scale outage with many devices failing at the same time, and how they recovered from it.
Yeah, 72 hours into the downtime I got "the email" and a blank VM.
You're lucky... I kept hoping that everything would be restored, so I waited 5 days, and in the end got what was predicted in this thread: complete loss of data and a reinstallation.
We moved our whole CEPH infrastructure back in 2020 from firstcolo to maincubes, everything came perfectly back up again after booting nodes.
Likewise, on a CEPH cluster that I had my hands on, a fatal power loss killed all nodes, and after power was restored nothing had to be done apart from running fsck.ext4 on some VMs.
This will always be the case because of the nature of how Ceph works (PG quorums, synchronous acks, etc.)
How am I lucky? That's exactly the same!
Except, to be fair, it wasn't 5 days as it was still less than 96 hours since the outage when you replied, so you must have got your replacement VM within 4 days.
It was annoying having to spend my Saturday re-installing though when it could have been done a couple of days earlier.
I too got the blank details email, I've now been sent login credentials for a new node.
Any early Black Friday deals for those of us who have lost all our data??
If I'm going to have to go through the effort of starting from scratch and spend a day re-installing, I'd be keen for a deal to upgrade to a better-specced VPS. (Maybe Ryzen with a decent amount of RAM and storage?)