
[Emergency Migration] Your service on NYHDD10GKVM2

jason5545 Member
edited July 2022 in Outages

Just received this from virmach. I guess it's my turn.

Dear VirMach Customers,

Your storage service was located on a server that is unfortunately having hardware issues. We apologize for the delayed email in this case, but we have been working hard on deploying a suitable replacement server and noticed the situation escalate today. Services were powered off to reduce the chance of further hardware issues and to speed up the migration, since this is a storage server.

While preparing for migrations to a new node as part of our Ryzen upgrade program, we performed routine backups and noticed that two disks on the server's RAID10 controller malfunctioned during the backup process. Luckily, this was caught in time, and since we were already planning the hardware refresh, it will now be prioritized: your data is being moved to a brand-new server with a much newer processor, hard disks, and RAID controller. Since we were backing this server up in preparation for the final upcoming migration, we have pushed the migration forward, and we also luckily have backups of most customers' data from the last week in case of emergency. We will attempt to migrate the bulk of services in a manner that ensures the quickest restoration, but please understand that this is a large storage node, so the outage may be longer than for regular services.

We would rather not risk data loss by asking the datacenter to replace two disks (to avoid the chance of human error) and then waiting through a very long RAID rebuild process.

-VirMach

Thanked by: FrankZ

Comments

  • risharde Patron Provider, Veteran

    Confirming receipt. Hopefully they do what they say and it doesn't take weeks to get back up and running.

    Thanked by: jason5545
  • Update

    Hello,

    We sent a previous email describing the process we're going through for your storage service. Unfortunately, the server being in a degraded state is slowing down the process and as a result, we are going to begin loading in disaster recovery backups for older services to the closest available location, in Los Angeles.

    We will have two queues going simultaneously:

    Newest to oldest services (meaning you activated your service more recently) will go from the degraded server to NYCB004S (new NYC storage server).
    Oldest to newest services (meaning you've had the service for a while) will go from a backup node to LAXA004S (new LAX storage server).
    If your service ends up in LAX, or if the backup data is too old, you can contact us in a ticket to have us attempt a new sync from the degraded node to NYCB004S instead. Please wait until the information is updated, then create a ticket in the priority department called "LAXA004S to NYCB004S" if you need a location change, or "NYCB004S Resync" if you need your data moved again. For people between these two queues, it is possible you will end up with two copies, one on LAXA004S and one on NYCB004S, and one or the other will be assigned to you.

    This is in an effort to speed up migrations since we have port speed limitations and disk speed limitations due to RAID degradation.

    Thank you.

    Looks like it's not going too well.

  • Disaster recovery plans will be activated.

    Thanked by: jason5545, FrankZ, taizi
  • nvme Member

    People will lose millions.

    Thanked by: jason5545, FrankZ
  • vyas11 Member

    PMs will be in overdrive

    Thanked by: jason5545, FrankZ
  • VirMach Member, Patron Provider

    @risharde said:
    Confirming receipt. Hopefully they do what they say and it doesn't take weeks to get back up and running.

    We will be limited by transfer speeds in the end no matter what, but luckily the backups we do have are already compressed, which means that if someone has a 1TB plan and only uses 100GB, the transfer would be approximately 10 times faster, minus the decompression time.

    These are large nodes and we're doing all we can to reach physical maximums.

    They're being loaded in at 4Gbps right now on one end (NYCB004), and that doesn't include any benefits from compression. On the other endpoint (LAXA004) it's only going at 1Gbps, but that's because the backups are already highly compressed and the current bottleneck is decompression; we're at maximum I/O usage there as a result. The first one completed was 7GB compressed out of 465GB, so these should still go a lot faster, but of course there will also be ones that are 100% full at 4TB.

  • ralf Member

    @nvme said:
    People will lose millions.

    Is that Mi or M?

  • yoursunny Member, IPv6 Advocate

    Tickets will be merged.

    Thanked by: jason5545
  • Tickets will be answered only between 1pm and 2pm on Mondays.

    Thanked by: gzz
  • VirMach Member, Patron Provider

    @yoursunny said:
    Tickets will be merged.

    Good reminder, thank you

    Thanked by: ralf
  • risharde Patron Provider, Veteran

    Thanks @VirMach appreciate the direct response!

  • Late update: my VPS is back online, and performance is much better than on the old Xeon node.

    Thanked by: Not_Oles, BBTN
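VirMach's compression estimate in the thread (a 1TB plan with only 100GB actually used moves roughly 10x faster) can be checked with some back-of-the-envelope arithmetic. The helper below is a hypothetical illustration under simple assumptions (ideal line rate, no decompression time or protocol overhead), not VirMach's actual tooling:

```python
# Rough migration-time estimate for a compressed backup over a given link.
# Hypothetical helper; assumes ideal line rate and ignores decompression.

def estimate_transfer_hours(compressed_gb: float, link_gbps: float) -> float:
    """Hours to move compressed_gb gigabytes over a link_gbps link."""
    gigabits = compressed_gb * 8        # 1 GB = 8 Gb
    seconds = gigabits / link_gbps      # ideal line-rate transfer
    return seconds / 3600

# A full 1 TB image vs. the ~100 GB actually in use, both at 4 Gbps:
full = estimate_transfer_hours(1000, 4)    # whole 1 TB plan
light = estimate_transfer_hours(100, 4)    # only the used ~100 GB
print(round(full, 2), round(light, 2))     # → 0.56 0.06
```

Transferring only the used data is exactly 10x faster here, which matches the comment's estimate before decompression overhead is added back in.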