
HostHatch Los Angeles storage data corruption (Was: HostHatch Los Angeles storage down)


Comments

  • skorous Member

    @risharde said:
    TRIMMED

    @DP I didn't say that - so either my account here is hacked or you messed up the quoting. Please confirm or point me to the post where I said this.

    Yeah, he messed up quoting @sidewinder.

    Thanked by: risharde
  • skorous Member

    @ralf said:

    From the sounds of it though, people's disk images are corrupt and it's being blamed on a known RAID controller issue. With no real confidence that the disk corruption won't happen again, it seems less about people asking for support and more about being hosted on a server that actually works.

    To be fair, was it really a known issue until Chicago happened? They had one occurrence on one node of many in LA without a definitive cause. Two instances start to indicate a pattern, and as they said, they're going to accelerate plans to migrate everybody to new hardware as a result.

    As with everybody before me, I'm not defending their going dark. That part is fairly inexcusable.

  • I've booted into recovery mode on CentOS. Any Hail Marys I can throw at this ghetto VPS to fix the corruption?

    What parameters should I run with fsck? (See the sketch below.)

    Thanks for all the emotional support.

    They 100% should have nuked the entire node and made everyone start over. I don't think anyone's container came back?
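
    A minimal fsck sketch, assuming the guest's root filesystem is ext4 on /dev/vda1 (the device name and filesystem type are assumptions; check with lsblk first, and only run repairs on an unmounted filesystem):

    # Identify the root partition and filesystem type first
    lsblk -f
    # Dry run: report problems without changing anything
    fsck -n /dev/vda1
    # Repair pass: -f forces a full check, -y auto-answers yes to fixes
    fsck -fy /dev/vda1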

  • Is their cloud on different infrastructure?

  • I'm curious whether everyone is actually on the same node. What CPU shows up when you run 'cat /proc/cpuinfo'? (See the sketch below.)
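
    A quick way to compare, assuming a Linux guest (the output shown is hypothetical):

    # Print the CPU model string the host exposes to the guest
    grep -m1 'model name' /proc/cpuinfo
    # model name : Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz   (hypothetical)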

  • sidewinder Member
    edited May 2022

    https://ibb.co/b62Cf12

    @digitalwicked said:
    I'm curious whether everyone is actually on the same node. What CPU shows up when you run 'cat /proc/cpuinfo'?

    Thanked by: digitalwicked
  • tetech Member

    @skorous said: To be fair, was it really a known issue until Chicago happened? They had one occurrence on one node of many in LA without a definitive cause. Two instances start to indicate a pattern, and as they said, they're going to accelerate plans to migrate everybody to new hardware as a result.

    Seems to be the same as last December in LA. Below from December 2021.

    Hello,

    As you may be aware, there have been multiple outages that have affected your active storage VM in Los Angeles, hosted on the node "STOR4.LAX"

    From our troubleshooting so far, the RAID card in the server appears to be failing and kicking out healthy drives. This is an extremely rare situation, but it is happening at the moment.

    Unfortunately, we cannot guarantee the integrity of the data on the array and are working on moving VMs away from this node. We have the following two options available for you:

    1) We create a new VM for you. You move over the data yourself or restore from your backups. We remove the old VM and move over your IP address (if that is needed by you).
    2) We migrate your VM to another (healthy) node. However, depending on the size of your storage VM, it may take a long time to migrate during which your VM will remain offline.

    Please reply to this email and let us know what you prefer out of these two options.

    We would also like to ask you to take a fresh backup of your important data on the VM as soon as possible, in case we have to deal with the worst-case scenario of complete data loss.

    Apologies for the inconvenience and we are doing the best on our end to get this resolved ASAP.

    Kindest Regards,
    Your HostHatch team

    Thanked by: risharde
  • Daniel15 Veteran
    edited May 2022

    @tetech Wow, I was unaware of that. Sounds like STOR2.LAX, STOR4.LAX, and whichever node in Chicago it was, all failed with the same problem...

  • This place is a sinking ship. Can someone recommend something in the same price range with a little better hardware?

  • Daniel15 Veteran
    edited May 2022

    @sidewinder said: something in the same price range

    This is very hard to find. Their prices are extremely competitive. For storage you could try Servarica or VirMach instead, but the prices aren't quite as good.

    @sidewinder said: with a little better hardware?

    AFAIK their new hardware is a lot better than the old hardware. Not sure about the RAID hardware though. It'd be useful for them to tell us the model numbers of their old RAID cards vs the RAID cards used in the newer servers.
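
    For reference, a minimal host-side check for the controller model (only the provider could run this; lspci is from the standard pciutils package):

    # List PCI devices and filter for RAID controllers (run on the host, not in a VM)
    lspci | grep -i raid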

    Thanked by: ariq01
  • @Daniel15 said: Servarica

    Very solid alternative.

  • xetsys Member
    edited May 2022

    @tetech said:

    Seems to be the same as last December in LA. [December 2021 email quoted in full above; trimmed]

    I wonder if there have been any incidents that involved disk failure and successful data recovery, or have all of them been "RAID card failure"? This feels more and more like a RAID 0 configuration. A VM can't tell whether the host storage is RAID 6 or RAID 0; that information probably isn't exposed to the virtual environment. (See the sketch below.)
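
    A minimal illustration of that point, assuming a KVM guest with a virtio disk (device name and output are hypothetical):

    # Inside the VM the disk appears as a single virtual device;
    # nothing here reveals the host's RAID level or member drives.
    lsblk -o NAME,MODEL,SIZE
    # NAME  MODEL          SIZE    (hypothetical output)
    # vda   QEMU HARDDISK  10T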

  • Sounds like my LAX instance all over again...

  • default Veteran
    edited May 2022

    @Daniel15 said:
    Wow, I was unaware of that. Sounds like STOR2.LAX, STOR4.LAX, and whichever node in Chicago it was, all failed with the same problem...

    So... is it safe to assume this might be another involucration?

  • @default said: default

    What's this, penis sect?

  • default Veteran
    edited May 2022

    @stevewatson301 said:

    @default said: default

    What's this, penis sect?


  • @xetsys said:

    [nested quotes trimmed]

    This feels more and more like a RAID 0 configuration. A VM can't tell whether the host storage is RAID 6 or RAID 0; that information probably isn't exposed to the virtual environment.

    The symptoms don't really match a RAID 0 failure.

    Thanked by: xetsys
  • Daniel15 Veteran

    In case it's useful to someone, I took @joelby's idea of using dpkg -V to verify packages using their checksums, and ran it across all packages returned by apt list --installed:

    apt list --installed | cut -d/ -f1 | xargs -i sh -c 'echo {} && dpkg -V {}' | tee /tmp/dpkg-verify-output.txt
    

    Differences in config files are expected if you've modified them, but differences in binaries likely mean those files are corrupted.

    You can then use apt install --reinstall to reinstall the packages that are corrupted, as sketched below.
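
    A minimal sketch of that reinstall step (the package name here is a hypothetical example; substitute whatever dpkg -V flagged):

    # Re-download and reinstall a package whose files failed verification
    apt install --reinstall coreutils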

    Thanked by: bdl, skorous, xetsys, ariq01
  • I wasn't able to fix the booting issue, but I managed to uninstall the OS via this tool: https://sourceforge.net/p/boot-repair-cd/home/Home/

    After removing the OS, it kept my old files under a 'deleted_os' folder. After that I mounted a Debian 10 ISO and installed a new OS, choosing not to format the drive.

    Now I can boot the server and access my old files, and hopefully transfer them to a new server soon. (See the sketch below.)
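
    A minimal sketch for that transfer, assuming the recovered files live under /deleted_os and you have SSH access to the new server (paths and hostname are hypothetical):

    # Copy the recovered files to the new server; -P shows progress and allows resuming
    rsync -avP /deleted_os/ user@new-server:/backup/old-vps/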

    Thanked by: xetsys
  • @darkimmortal said:

    [nested quotes trimmed]

    The symptoms don't really match a RAID 0 failure.

    Great argument! Oh wait...

  • @TimboJones said:

    [nested quotes trimmed]

    Great argument! Oh wait...

    RAID failure? No.
    Involucration? Yes.

  • alvin Member

    Em..
    Chicago Node Status Down :*

  • @alvin said:
    Em..
    Chicago Node Status Down :*

    I think this is all of Chicago, as my NVMe services went down at the same time.

    I'm still moving all my data from Chicago to LA to recover from the recent LA data loss.

    I have storage services in Chicago, LA, and London. Both LA services and my 10TB Chicago service have had to be migrated because of this issue. Both the 10TB in LA and the one in Chicago resulted in data loss. I wonder how long it will be until London suffers the same fate.

  • kheng86 Member

    @alvin said:
    Em..
    Chicago Node Status Down :*

    Mine is down too.

  • Yes, my Chicago storage VM is down too.
    This is the node they newly provisioned after the major Chicago storage outage.
    HostHatch is no longer reliable...

  • epaslv Member

    Chicago node down also

  • My Chicago node is up; it looks like it just rebooted, with uptime showing 11 minutes. Also @cablepick, I don't think they offer storage in London, just LAX, Chicago, Amsterdam, and Stockholm. Maybe the DC lost power, or we were all on the same node? I'm on an E5 server there; if anyone on an Epyc node was hit too, it would confirm a larger issue.

  • zeli Member

    @LiliLabs said:
    Also @cablepick I don't think they offer storage in London

    They do, my storage VPS is in London.

  • @zeli said:

    @LiliLabs said:
    Also @cablepick I don't think they offer storage in London

    They do, my storage VPS is in London.

    Super interesting, it's not listed on their site at all! I'll have to see if I can still order one.

  • zeli Member

    @LiliLabs said:

    [nested quotes trimmed]

    Super interesting, it's not listed on their site at all! I'll have to see if I can still order one.

    It was a BF promo deal though, so maybe that's why.

    Thanked by: fluffernutter