Comments
I guess it's the same batch of hardware coming to the end of its life?
I was told on another forum that Amsterdam was 'new design' - Ryzen.
This was unrelated to the failures we've had in the recent past that had a common issue that we're eliminating.
This was a more "simple" but fatal error. We had a failed HDD that was scheduled for replacement yesterday, and the replacement was completed. Yesterday evening our monitoring system alerted us to 60-70% iowait on the node, and we saw several VMs in kernel panic. The RAID controller logs started filling up with unrecoverable medium errors from a second drive in the same RAID5 group as the drive that had just been replaced. This forced a cold reboot of the node, and that second drive failed completely sometime afterwards. With two failed drives, the entire array went offline.
We tried to recover it to at least a read-only state, but too much parity data was lost for any kind of successful recovery, and we didn't want another situation where we spend days recovering parts of the data. After a few hours we instead started redeploying the affected VMs to another node that was recently put into production. That node has two independent, smaller RAID60 arrays, which makes it more reliable without sacrificing too much rebuild time (which is also an important factor). This design does have a higher capacity overhead, but we started deploying it everywhere a while back, together with our other improvements, because failures like these cost us far more in time, money, and reputation than the overhead we might save from a few drives per node.
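To make the trade-off above concrete, here is a minimal sketch of the capacity overhead versus fault tolerance of a single RAID5 group compared with RAID60 (striped RAID6 groups). The drive counts are purely illustrative assumptions; HostHatch's actual node layouts are not public.

```python
# Hypothetical drive counts for illustration only; the provider's real
# array sizes are not stated in the thread.

def usable_fraction_raid5(drives_per_group: int) -> float:
    """RAID5 loses one drive's worth of capacity per group
    and survives only a single drive failure."""
    return (drives_per_group - 1) / drives_per_group

def usable_fraction_raid60(groups: int, drives_per_group: int) -> float:
    """RAID60 stripes across RAID6 groups; each group loses two
    drives' worth of capacity but survives two failures per group."""
    total = groups * drives_per_group
    return (total - 2 * groups) / total

# Example: 12 drives as one RAID5 group vs. two 6-drive RAID6 groups.
print(f"RAID5  usable: {usable_fraction_raid5(12):.0%}, tolerates 1 failure total")
print(f"RAID60 usable: {usable_fraction_raid60(2, 6):.0%}, tolerates 2 failures per group")
```

With these assumed numbers, the RAID60 layout gives up roughly a quarter of the raw capacity relative to RAID5, in exchange for surviving exactly the double-drive-failure scenario described above.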
RE: our incident response -
We’ve made several efforts to communicate better in times of crisis, as can be seen from the response to this incident for affected customers (including adding an announcement feature in the panel, so affected people can see if there is an ongoing problem). We’re also going to add an email feature: when the issue warrants it, the announcement will be emailed out along with the service name and any updates made to that announcement.
Are you sure about this? I just double-checked our outgoing email log and I see that it was sent to you, with no failed delivery message back to us.
The message that the OP posted is the same email.
https://www.reddit.com/r/msp/comments/thdtzz/vultr_node_failure_with_customer_data_loss/
https://www.reddit.com/r/webhosting/comments/g4owpt/vultrcom_lost_my_server_they_lost_all_my_files/
https://www.reddit.com/r/sysadmin/comments/thdpfw/vultr_node_failure_with_customer_data_loss/
These are just three different cases; there are tons more.
Not discounting our failure here, but this does happen, even with the largest providers. I hope this will not turn into another thread with a bunch of people trolling and adding extreme misinformation (such as the claim that we use RAID0 - that was a funny one in the last thread, because you can always recover large parts of the data from RAID0, right?). I generally try to stay away from these threads for my own mental health, and to be able to focus on the work that needs to be done, once I've added all the important information here.
If any affected customer has genuine questions or concerns, they are welcome to reach out to me - [email protected].
Just checked my spam folder, looks like it ended up there. Really odd, I guess Gmail flagged you as spam. Anyways, backups are restored, thanks for the extra service time!
Nice. And glad to see that your life has been saved - because backups of backups of your backups are back there!
You should also consider - and I suggest - doing backups of backups of backups of your backups.
Well... if one has backups, what is the point of storage with @hosthatch ?
Irresistible force paradox: What happens when an irresistible force meets an immovable object?
HostHatch paradox: What happens when an extremely important backup meets storage prone to failure?
100TB Data Lost thank you.
The fact that you only get 3 months' compensation makes me wonder. With problems like these, they should extend the customers' service free of charge for at least 10 years.
lol!
Yeah and they should also pay you your salary for the rest of your life.
Would you like them to pay off your bills as well?
Sir, the fact that they lost data is surely always bad, but IMHO the person responsible for emergency backups of important data is you, not HH (at least on unmanaged and low-end services without paid backups). You can't blame them for it now - if you do not have backups of important data, that is your failure.

Also, when a large storage node is built on just RAID 5/6(0), the possibility and probability of this is notably higher: once the HDDs have some age, a breaking point can come where they can no longer survive a parity array rebuild and begin dying one by one, like dominoes. (I still think this is why Fran and Liteserver decided on plain RAID10 for their large-capacity storage - going a bit more expensive but also safer is pretty justified.) I am very wary and distrustful of RAID 5/6 storage, including its striped derivatives. Frankly, IMO placing 10, 20, 50, or 100 TB of important data on RAID60 as the only copy, without a backup, is pretty stupid and reckless.
@hosthatch can you confirm that any customer actually has 100TB on VPSes with you? This sounds like invention and exaggeration - at that amount of space it is surely cheaper to pay for a dedicated large storage server (e.g. at Hetzner) than $300+/month for it on VPSes.
We do have customers with 100+ TB, but they deploy it across different locations/nodes.
We do not have any customer with a single 100TB VM, as we would likely recommend them to split it into multiple servers at that point.
Like I said earlier - I generally try to stay away from these threads after adding all the relevant information because of all the trolling that goes on, and I am certain that the "100TB data lost" was a poor attempt at just that.
I restored from the backup of the backup of the backup of the backup actually
going back to ultravps.....
on that - can you confirm that all storage sites are using the new storage backend design or if not - which sites/plans/packages/nodes are still using older design?
Back TF up
Definitely not. If they take backups for you, they'll tell you, because that's a premium feature. Nobody should expect this unless it is explicitly covered, either at a price premium or factored into the price.
That avatar is just so wrong with that post.
Elementary school: dog ate my homework.
Grad school: HostHatch lost my dissertation.
Don't all schools come with Google or Microsoft education cloud accounts these days?
Who the f.. is HostHatch 😂😂
At that point, doesn't it make much more sense to store your data on a managed service such as S3?
So many complaints about this provider recently. Looks like their hosts are dying without hatching
And the provider should also offer redemption for entrance into heaven, due to all the twists and turns in life caused by such a service.
I do that as well, but only as cold storage. I like running things myself, and I enjoy not paying for egress. I pull these files often and my data is georeplicated across LAX, Amsterdam, Stockholm with HH and Canada with Servarica. Works out a lot cheaper than S3 when you factor in the 15TB of egress I use every month.
Fair enough, have you taken a look at Backblaze or Cloudflare's object storages if you're primarily concerned about egress pricing?
Don't get me wrong, I like running stuff myself too, but at some point it just makes more sense to dump it onto a managed provider instead of doing things yourself, especially with incidents like these.
Yep! I take great advantage of the 10-20G burst I can do with HH; Backblaze doesn't have that, and R2 is very expensive. With all of my HH VPSes + Servarica I am paying $595/year, or $49.25 per month, on storage, and I am storing 20TB and using 15TB of egress. This is the least expensive way to
A : Store that much data
B : Be able to access it at over 10 gigabit
C : Have it be georedundant
D : Have fun with it
so it's what I currently do. I might end up dropping the Servarica VPS eventually, been very happy with HH. I also use them for compute in Zurich and the VM has been fast and stable, with a few odd network issues but I blame that on M247.
Is this egress only for georeplication, or do you actually egress that much data for general use? Do you mind sharing a bit more about what you run on these servers (e.g. NFS/MinIO/Borgbackup etc.)?
Interesting. If I run the numbers, assuming the 20TB is replicated data, which reduces the unique data to 5TB, you still get a price of $0.00985/GB, which happens to be the same as S3's One Zone-IA. Not bad, I'd say.
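Taking the thread's figures at face value, the per-GB arithmetic can be checked in a few lines. The replica count of 4 is an assumption based on the locations mentioned (three HostHatch sites plus Servarica); decimal units (1 TB = 1000 GB) are assumed as well.

```python
# Figures as quoted in the thread; replica count is an assumption.
monthly_cost = 49.25        # $/month across all storage VPSes
stored_tb = 20              # total TB stored, including replicas
replicas = 4                # assumed: LAX + Amsterdam + Stockholm + Canada

unique_tb = stored_tb / replicas            # 5 TB of unique data
unique_gb = unique_tb * 1000                # decimal TB -> GB
price_per_gb = monthly_cost / unique_gb

print(f"Unique data: {unique_tb:.0f} TB")
print(f"Effective price: ${price_per_gb:.5f}/GB-month")
```

This reproduces the ~$0.00985/GB figure, which is indeed in the same range as S3's One Zone-IA list price of about $0.01/GB-month (before request and egress charges, which is where the self-hosted setup pulls further ahead).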