Comments
I guess it's the same batch of hardware coming to the end of its life?
I was told on another forum that Amsterdam was 'new design' - Ryzen.
This was unrelated to the failures we've had in the recent past that had a common issue that we're eliminating.
This was a more "simple" but fatal error. We had a failed HDD that was scheduled for replacement yesterday, and the replacement was completed. Yesterday evening our monitoring system alerted us to 60-70% iowait on the node, and we saw several VMs in kernel panic. The RAID controller logs started filling up with unrecoverable medium errors from a second drive in the same RAID5 group as the drive that had just been replaced. This forced a cold reboot of the node, and that second drive failed completely sometime afterwards. With two failed drives, the entire array went offline.
We tried to recover it to at least a read-only state, but too much parity data was lost for any kind of successful recovery, and we didn't want another situation where we spend days recovering parts of the data. After a few hours we instead started redeploying the affected VMs to another node that was recently put into production. That node has two independent, smaller RAID60 arrays, which makes it more reliable without sacrificing too much rebuild time (which is also an important factor). This design does have a higher capacity overhead, but we started deploying it everywhere a while back, together with our other improvements, because failures like these cost us far more in time, money, and reputation than the overhead we might save from a few drives per node.
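To make the trade-off above concrete, here is a minimal sketch of the capacity overhead versus fault tolerance of a single RAID5 group compared with RAID60 (striped RAID6 groups). The drive counts are purely illustrative assumptions; HostHatch's actual node layouts are not public.

```python
# Hypothetical drive counts for illustration only; the provider's real
# array sizes are not stated in the thread.

def usable_fraction_raid5(drives_per_group: int) -> float:
    """RAID5 loses one drive's worth of capacity per group
    and survives only a single drive failure."""
    return (drives_per_group - 1) / drives_per_group

def usable_fraction_raid60(groups: int, drives_per_group: int) -> float:
    """RAID60 stripes across RAID6 groups; each group loses two
    drives' worth of capacity but survives two failures per group."""
    total = groups * drives_per_group
    return (total - 2 * groups) / total

# Example: 12 drives as one RAID5 group vs. two 6-drive RAID6 groups.
print(f"RAID5  usable: {usable_fraction_raid5(12):.0%}, tolerates 1 failure total")
print(f"RAID60 usable: {usable_fraction_raid60(2, 6):.0%}, tolerates 2 failures per group")
```

With these assumed numbers, the RAID60 layout gives up roughly a quarter of the raw capacity relative to RAID5, in exchange for surviving exactly the double-drive-failure scenario described above.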
RE: our incident response -
We’ve made several efforts to communicate better in times of crisis, as can be seen from the response to this incident for affected customers (including adding an announcement feature in the panel, so affected people can see if there is an ongoing problem). We’re also going to add an email feature: when the issue warrants it, the announcement will be emailed out along with the service name and any updates made to that announcement.
Are you sure about this? I just double-checked our outgoing email log and I see that it was sent to you, with no failed delivery message back to us.
The message that the OP posted is the same email.
https://www.reddit.com/r/msp/comments/thdtzz/vultr_node_failure_with_customer_data_loss/
https://www.reddit.com/r/webhosting/comments/g4owpt/vultrcom_lost_my_server_they_lost_all_my_files/
https://www.reddit.com/r/sysadmin/comments/thdpfw/vultr_node_failure_with_customer_data_loss/
These are just three different cases; there are tons more.
Not discounting our failure here, but this does happen, even with the largest providers. I hope this will not turn into another thread with a bunch of people trolling and adding extreme misinformation (such as the claim that we use RAID0 - that was a funny one in the last thread, because you can always recover large parts of the data from RAID0, right?). I generally try to stay away from these threads for my own mental health, and to be able to focus on the work that needs to be done, once I've added all the important information here.
If any affected customer has genuine questions or concerns, they are welcome to reach out to me - [email protected].
Just checked my spam folder, looks like it ended up there. Really odd, I guess Gmail flagged you as spam. Anyways, backups are restored, thanks for the extra service time!
Nice. And glad to see that your life has been saved - because backups of backups of your backups are back there!
You should also consider - and I suggest - doing backups of backups of backups of your backups.
Well... if one has backups, what is the point of storage with @hosthatch ?
Irresistible force paradox: What happens when an irresistible force meets an immovable object?
HostHatch paradox: What happens when an extremely important backup meets storage prone to failure?
100TB Data Lost thank you.
The fact that you only get 3 months' compensation makes me wonder. With problems like these, they should extend the customers' service free of charge for at least 10 years.
lol!
Yeah and they should also pay you your salary for the rest of your life.
Would you like them to pay off your bills as well?
Sir, the fact that they lost data is surely always bad, but IMHO the person responsible for emergency backups of important data is you, not HH (at least on unmanaged and low-end services without paid backups). You can't blame them for it now - if you do not have backups of important data, that is your failure.

Also, when a large storage node is built on just RAID 5/6(0), the possibility and probability of this is notably higher: once the HDDs have some age, a breaking point can come where they can no longer survive a parity array rebuild and begin dying one by one, like dominoes. (I still think this is why Fran and Liteserver decided on plain RAID10 for their large-capacity storage - going a bit more expensive but also safer is pretty justified.) I am very wary and distrustful of RAID 5/6 storage, including its striped derivatives. Frankly, IMO placing 10, 20, 50, or 100 TB of important data on RAID60 as the only copy, without a backup, is pretty stupid and reckless.
@hosthatch can you confirm that any customer actually has 100TB on VPSes with you? This sounds like invention and exaggeration - at that amount of space it is surely cheaper to pay for a dedicated large storage server (e.g. at Hetzner) than $300+/month for it on VPSes.
We do have customers with 100+ TB, but they deploy it across different locations/nodes.
We do not have any customer with a single 100TB VM, as we would likely recommend them to split it into multiple servers at that point.
Like I said earlier - I generally try to stay away from these threads after adding all the relevant information because of all the trolling that goes on, and I am certain that the "100TB data lost" was a poor attempt at just that.
I restored from the backup of the backup of the backup of the backup actually
going back to ultravps.....
on that - can you confirm that all storage sites are using the new storage backend design or if not - which sites/plans/packages/nodes are still using older design?
Back TF up
Definitely not. If they take backups for you, they'll tell you, because that's a premium feature. Nobody should expect this unless it is explicitly covered, either at a price premium or factored into the price.
That avatar is just so wrong with that post.
Elementary school: dog ate my homework.
Grad school: HostHatch lost my dissertation.
Don't all schools come with Google or Microsoft education cloud accounts these days?
Who the f.. is HostHatch 😂😂
At that point, doesn't it make much more sense to store your data on a managed service such as S3?
So many complaints about this provider recently. Looks like their hosts are dying without hatching
And the provider should also offer redemption for entrance into heaven, due to all the twists and turns in life caused by such a service.
I do that as well, but only as cold storage. I like running things myself, and I enjoy not paying for egress. I pull these files often and my data is georeplicated across LAX, Amsterdam, Stockholm with HH and Canada with Servarica. Works out a lot cheaper than S3 when you factor in the 15TB of egress I use every month.
Fair enough, have you taken a look at Backblaze or Cloudflare's object storages if you're primarily concerned about egress pricing?
Don't get me wrong, I like running stuff myself too, but at some point it just makes more sense to dump it onto a managed provider instead of doing things yourself, especially with incidents like these.
Yep! I take great advantage of the 10-20G burst I can do with HH; Backblaze doesn't have that, and R2 is very expensive. With all of my HH VPSes + Servarica I am paying $595/year, or $49.25 per month, on storage, and I am storing 20TB and using 15TB of egress. This is the least expensive way to
A : Store that much data
B : Be able to access it at over 10 gigabit
C : Have it be georedundant
D : Have fun with it
so it's what I currently do. I might end up dropping the Servarica VPS eventually, been very happy with HH. I also use them for compute in Zurich and the VM has been fast and stable, with a few odd network issues but I blame that on M247.
Is this egress only for georeplication, or do you actually egress that much data for general use? Do you mind sharing a bit more about what you run on these servers (e.g. NFS/MinIO/Borgbackup etc.)?
Interesting. If I run the numbers, assuming the 20TB is replicated data, which reduces the unique data to 5TB, you still get a price of $0.00985/GB, which happens to be the same as S3's One Zone-IA. Not bad, I'd say.
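Taking the thread's figures at face value, the per-GB arithmetic can be checked in a few lines. The replica count of 4 is an assumption based on the locations mentioned (three HostHatch sites plus Servarica); decimal units (1 TB = 1000 GB) are assumed as well.

```python
# Figures as quoted in the thread; replica count is an assumption.
monthly_cost = 49.25        # $/month across all storage VPSes
stored_tb = 20              # total TB stored, including replicas
replicas = 4                # assumed: LAX + Amsterdam + Stockholm + Canada

unique_tb = stored_tb / replicas            # 5 TB of unique data
unique_gb = unique_tb * 1000                # decimal TB -> GB
price_per_gb = monthly_cost / unique_gb

print(f"Unique data: {unique_tb:.0f} TB")
print(f"Effective price: ${price_per_gb:.5f}/GB-month")
```

This reproduces the ~$0.00985/GB figure, which is indeed in the same range as S3's One Zone-IA list price of about $0.01/GB-month (before request and egress charges, which is where the self-hosted setup pulls further ahead).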