
[Emergency Migration] Your service on NYHDD10GKVM2

jason5545 Member
edited July 2022 in Outages

Just received this from virmach. I guess it's my turn.

Dear VirMach Customers,

Your storage service was located on a server that is unfortunately having hardware issues. We apologize for the delayed email in this case, but we have been working hard on deploying a suitable replacement server and noticed the situation escalate today. Services were powered off to reduce the chance of further hardware issues and to speed up the migration, since this is a storage server.

While preparing for migrations to a new node as part of our Ryzen upgrade program, we performed routine backups and noticed that two disks on the server's RAID10 controller malfunctioned during the backup process. Luckily, this was caught in time, and since we were already planning the hardware refresh, it will now be prioritized: your data is being moved to a brand-new server with a much newer processor, hard disks, and RAID controller. Since we were backing this server up in preparation for the final upcoming migration, we have pushed the migration forward, and we also luckily have backups of most customers' data from the last week in case of emergency. We will attempt to migrate the bulk of services in a manner that ensures the quickest restoration, but please understand that this is a large storage node, so the outage may be longer than for regular services.

We would rather not risk data loss by asking the datacenter to replace two disks (to avoid the chance of human error) and then waiting through a very long RAID rebuild process.

-VirMach

Thanked by: FrankZ

Comments

  • risharde Patron Provider, Veteran

    Confirming receipt. Hopefully they do what they say and it doesn't take weeks to get back up and running.

    Thanked by: jason5545
  • Update

    Hello,

    We sent a previous email describing the process we're going through for your storage service. Unfortunately, the server being in a degraded state is slowing down the process and as a result, we are going to begin loading in disaster recovery backups for older services to the closest available location, in Los Angeles.

    We will have two queues going simultaneously:

    Newest to oldest services (meaning you activated your service more recently) will go from the degraded server to NYCB004S (new NYC storage server).
    Oldest to newest services (meaning you've had the service for a while) will go from a backup node to LAXA004S (new LAX storage server).
    If your service ends up in LAX, or if the backup data is too old, you can contact us in a ticket to have us attempt a new sync from the degraded node to NYCB004S instead. Please wait until the information is updated, then create a ticket in the priority department called "LAXA004S to NYCB004S" if you need a location change, or "NYCB004S Resync" if you need your data moved again. For people between these two queues, it is possible you will end up with two copies, one on LAXA004S and one on NYCB004S, and one or the other will be assigned to you.

    This is in an effort to speed up migrations since we have port speed limitations and disk speed limitations due to RAID degradation.

    Thank you.

    Looks like it's not going too well.

  • Disaster recovery plans will be activated.

    Thanked by: jason5545, FrankZ, taizi
  • nvme Member

    People will lose millions.

    Thanked by: jason5545, FrankZ
  • vyas11 Member

    PMs will be in overdrive

    Thanked by: jason5545, FrankZ
  • VirMach Member, Patron Provider

    @risharde said:
    Confirming receipt. Hopefully they do what they say and it doesn't take weeks to get back up and running.

    We will be limited by transfer speeds in the end no matter what, but luckily the backups we do have are already compressed, which means that if someone has a 1TB plan and only uses 100GB, the transfer would be approximately 10 times faster, minus the decompression time.

    These are large nodes and we're doing all we can to reach physical maximums.

    They're being loaded in at 4Gbps right now on one end (NYCB004), and that doesn't include any benefits from compression. On the other endpoint (LAXA004) it's only going at 1Gbps, but that's because the backups are already highly compressed and the current bottleneck is decompression; we're at maximum I/O usage there as a result. The first one completed was 7GB compressed out of 465GB, so these should still go a lot faster, but of course there will also be ones that are 100% full at 4TB.

  • ralf Member

    @nvme said:
    People will lose millions.

    Is that Mi or M?

  • yoursunny Member, IPv6 Advocate

    Tickets will be merged.

    Thanked by: jason5545
  • Tickets will be answered only between 1pm and 2pm on Mondays.

    Thanked by: gzz
  • VirMach Member, Patron Provider

    @yoursunny said:
    Tickets will be merged.

    Good reminder, thank you

    Thanked by: ralf
  • risharde Patron Provider, Veteran

    Thanks @VirMach appreciate the direct response!

  • Late update: my VPS is back online, and performance is much better than on the old Xeon node.

    Thanked by: Not_Oles, BBTN
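VirMach's compression estimate in the thread (a 1TB plan with only 100GB actually used moves roughly 10x faster) can be checked with some back-of-the-envelope arithmetic. The helper below is a hypothetical illustration under simple assumptions (ideal line rate, no decompression time or protocol overhead), not VirMach's actual tooling:

```python
# Rough migration-time estimate for a compressed backup over a given link.
# Hypothetical helper; assumes ideal line rate and ignores decompression.

def estimate_transfer_hours(compressed_gb: float, link_gbps: float) -> float:
    """Hours to move compressed_gb gigabytes over a link_gbps link."""
    gigabits = compressed_gb * 8        # 1 GB = 8 Gb
    seconds = gigabits / link_gbps      # ideal line-rate transfer
    return seconds / 3600

# A full 1 TB image vs. the ~100 GB actually in use, both at 4 Gbps:
full = estimate_transfer_hours(1000, 4)    # whole 1 TB plan
light = estimate_transfer_hours(100, 4)    # only the used ~100 GB
print(round(full, 2), round(light, 2))     # → 0.56 0.06
```

Transferring only the used data is exactly 10x faster here, which matches the comment's estimate before decompression overhead is added back in.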