Crunchbits VDS node outage

My VDS node went down at 3:29 AM PDT. I opened a technical ticket at 5:15 AM PDT. They responded at 12:53 PM and restored my server at 3:37 PM.
This resulted in a total of ~12 hours of downtime.
I know this is not their normal response time, given the notice at https://get.crunchbits.com/announcements/17/NOTICE-Support-times-over-the-next-48-hours.html.
Is anyone here on the same node as me? Am I the only one having this issue?
The final message I got from support was

Hello,

Your vps should be up right now.

Thanks [Redacted]

I was hoping for a bit more of an explanation for the outage though.

Comments

  • yoursunny Member, IPv6 Advocate

    Your service is up right now, so stop crying.

    Thanked by 2Hakim DeadlyChemist
  • Which VDS plan is yours? My 7GB Xeon VDS has been running fine.

  • @nick_ said:
    Which VDS plan is yours? My 7GB Xeon VDS has been running fine.

    For me it's a 16 GB Ryzen VDS.

    Thanked by 1nick_
  • @yoursunny said:
    Your service is up right now, so stop crying.

    very helpful thanks for your contribution to the discussion.

  • @rjbl said:

    @yoursunny said:
    Your service is up right now, so stop crying.

    very helpful thanks for your contribution to the discussion.

    I mean

    One server was obviously down, what do you gain from knowing exactly who on LET was on that one server? Especially now that it’s up?

  • @emgh said:

    @rjbl said:

    @yoursunny said:
    Your service is up right now, so stop crying.

    very helpful thanks for your contribution to the discussion.

    I mean

    One server was obviously down, what do you gain from knowing exactly who on LET was on that one server? Especially now that it’s up?

    I am hoping that it wasn't my configuration that caused my own outage.

    Thanked by 1emgh
  • @rjbl said:

    @emgh said:

    @rjbl said:

    @yoursunny said:
    Your service is up right now, so stop crying.

    very helpful thanks for your contribution to the discussion.

    I mean

    One server was obviously down, what do you gain from knowing exactly who on LET was on that one server? Especially now that it’s up?

    I am hoping that it wasn't my configuration that caused my own outage.

    Well what did they respond with?

    If it was unclear, just reply back and ask if they had an issue or if it was only your VPS having issues

    Thanked by 1rjbl
  • Ok, I received this message:

    Hey [Redacted],

    We do, and I apologize for the delay and downtime on this node. Our Ryzen products all have quad-9s SLA (99.99%) and we will process service credits once we close the case out on our side and are confident no additional work/downtime is needed. There was a power supply issue which required us to unrack and complete emergency repair work (and now we're monitoring).

    To give you a full transparent answer: we also had a notification issue where we found our alerts for this server (and another provisioned on the same day) were not correctly tagged and thus did not trigger the emergency alert. It caused a delay on our part in responding to the downed server which would otherwise be a sub-5 minute response. It is completely on us, but just thought you were owed a full explanation.

    I've assigned your ticket and added a note about SLA which we will process 9/4 to 9/5 assuming we are confident in the repairs and close out the ticket status. Additionally, we're pushing out a private status page for customers to see all of our node statuses in real-time which will also allow us to add notes and inform each of you better from a central source. I believe from the point of view of a customer, just seeing overall facility and network status was not enough.

    Best,
    [Redacted]
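
To put rough numbers on the quad-9s figure mentioned in that reply, here is a minimal sketch of the arithmetic. It assumes a 30-day billing period (the reply does not say how the SLA window is measured), and the calendar date is a placeholder; only the clock times come from the first post.

```python
from datetime import datetime

SLA = 0.9999  # the quad-9s figure quoted in the support reply


def allowed_downtime_minutes(sla: float, days: int = 30) -> float:
    """Downtime budget in minutes for a period of `days` days (30 is an assumption)."""
    return (1.0 - sla) * days * 24 * 60


def availability(downtime_minutes: float, days: int = 30) -> float:
    """Fraction of the period during which the service was up."""
    return 1.0 - downtime_minutes / (days * 24 * 60)


# Clock times from the first post; the date itself is a placeholder.
down = datetime(2023, 9, 1, 3, 29)
up = datetime(2023, 9, 1, 15, 37)
outage_minutes = (up - down).total_seconds() / 60

budget = allowed_downtime_minutes(SLA)
print(f"budget: {budget:.2f} min, outage: {outage_minutes:.0f} min")
print(f"availability over 30 days: {availability(outage_minutes):.4%}")
```

Under those assumptions the budget works out to about 4.3 minutes per 30 days, while the first outage alone was 728 minutes, which puts the month at roughly 98.3% availability.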

  • @rjbl said:
    Ok, I received this message:

    Hey [Redacted],

    We do, and I apologize for the delay and downtime on this node. Our Ryzen products all have quad-9s SLA (99.99%) and we will process service credits once we close the case out on our side and are confident no additional work/downtime is needed. There was a power supply issue which required us to unrack and complete emergency repair work (and now we're monitoring).

    To give you a full transparent answer: we also had a notification issue where we found our alerts for this server (and another provisioned on the same day) were not correctly tagged and thus did not trigger the emergency alert. It caused a delay on our part in responding to the downed server which would otherwise be a sub-5 minute response. It is completely on us, but just thought you were owed a full explanation.

    I've assigned your ticket and added a note about SLA which we will process 9/4 to 9/5 assuming we are confident in the repairs and close out the ticket status. Additionally, we're pushing out a private status page for customers to see all of our node statuses in real-time which will also allow us to add notes and inform each of you better from a central source. I believe from the point of view of a customer, just seeing overall facility and network status was not enough.

    Best,
    [Redacted]

    This is an excellent answer, good job @crunchbits on the transparency!

  • crunchbits Member, Patron Provider, Top Host
    edited September 2023

    @rjbl said:
    I was hoping for a bit more of an explanation for the outage though.

    Just now had this linked to me but you should already have a reply from me earlier. Admins and staff had already fixed the issue but it required a proper reply from myself directly. If you wish you're welcome to share it.

    edit: see it shared above before I refreshed

    @emgh said:
    Well what did they respond with?

    If it was unclear, just reply back and ask if they had an issue or if it was only your VPS having issues

    It was a node-specific hardware issue. We were admittedly delayed in responding to it.
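
The mis-tagged alert described in the support reply is the kind of thing a periodic inventory audit can catch. The sketch below is purely illustrative and is not Crunchbits' actual monitoring setup: the `Node` class, the `emergency` tag name, and the sample inventory are all hypothetical.

```python
from dataclasses import dataclass, field

REQUIRED_TAG = "emergency"  # hypothetical tag name that routes alerts to on-call


@dataclass
class Node:
    name: str
    tags: set[str] = field(default_factory=set)


def untagged_nodes(nodes: list[Node], required: str = REQUIRED_TAG) -> list[str]:
    """Return the names of nodes that lack the tag needed to trigger emergency alerts."""
    return [n.name for n in nodes if required not in n.tags]


# Made-up inventory: two nodes were provisioned without the tag.
inventory = [
    Node("node-a", {"emergency", "ryzen"}),
    Node("node-b", {"ryzen"}),
    Node("node-c", set()),
]
missing = untagged_nodes(inventory)
print(missing)
```

Running a check like this after every provisioning batch would flag servers whose outages would otherwise go unnoticed until a customer opens a ticket.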

  • My node has been down again for the past few minutes.

  • @rjbl said:
    My node has been down again for the past few minutes.

    I just ate a burger, but I'm finished now

    Thanked by 1fluffernutter
  • I really like their hardware and network though. Do their dedicated servers have better uptime?

  • Don_Keedic Member
    edited September 2023

    @rjbl said:
    I really like their hardware and network though. Do their dedicated servers have better uptime?

    I've had maybe 2-4 minutes of downtime on the storage box I've had with them since Feb/March and that was due to them changing an IP range.

    I've got a 4gb VPS I've had for about a month and a half and a VDS (7950x) I picked up last week - zero issues, no downtime.

    Thanked by 1rjbl
  • yoursunny Member, IPv6 Advocate

    @emgh said:

    @rjbl said:
    My node has been down again for the past few minutes.

    I just ate a burger, but I'm finished now

    I just ate the avocado bacon burger from Shake Shack.
    I'm in a food coma now.


  • @Don_Keedic said:

    @rjbl said:
    I really like their hardware and network though. Do their dedicated servers have better uptime?

    I've had maybe 2-4 minutes of downtime on the storage box I've had with them since Feb/March and that was due to them changing an IP range.

    I've got a 4gb VPS I've had for about a month and a half and a VDS (7950x) I picked up last week - zero issues, no downtime.

    Maybe I am on a problematic node. I bought them from the LES sale, so around a month and a half too. I logged the downtime at 12:47 PM PST.

  • Is it back up? @rjbl

  • @emgh said:
    Is it back up? @rjbl

    Not yet

    Thanked by 1emgh
  • @yoursunny said: I just ate the avocado bacon burger from Shake Shack.

    I'm in a food coma now.

    Lucky they don't have them where I live.

    Thanked by 1emgh
  • It's up now. I am waiting for node migration.
    12 + 2 hours is a new record of unplanned downtime from a provider for me.

  • bethp Member, Host Rep

    I mean, better than deadpool ... shit sometimes happens, and they seem to be handling this in a very professional way. Nothing can ever be perfect no matter how hard we try; if they honour the SLA they offer, then in my books you have a good host.

    Advice for the future: if it's critical to be online, have a backup VM with a direct copy and load balance :)

    Thanked by 2yoursunny rjbl
  • Don_Keedic Member
    edited September 2023

    @rjbl said:
    It's up now. I am waiting for node migration.
    12 + 2 hours is a new record of unplanned downtime from a provider for me.

    Well sorry to hear that. I'm certain they'll get everything back up and running for you, even on a holiday!

    Thanked by 1rjbl
  • Sigh, losing many hours of work is not fun.

  • Offsite backups are important.

    4 days ago there was a 13 hour 46 minute outage in Atlanta (USA) from a known provider. I waited the first hour, then with the offsite copy (5 websites and 5 databases), I migrated to another provider also with instant activation and in 45 minutes I was online ⚡ :)

    Thanked by 2Not_Oles rjbl
  • yoursunny Member, IPv6 Advocate

    @edrebe said:
    Offsite backups are important.

    4 days ago there was a 13 hour 46 minute outage in Atlanta (USA) from a known provider. I waited the first hour, then with the offsite copy (5 websites and 5 databases), I migrated to another provider also with instant activation and in 45 minutes I was online ⚡ :)

    Same happened to my website in 2021:
    yoursunny.com Disaster Recovery Plan: 104 Minutes Downtime, No Tears
    Recovery time would be much longer if I'm sleeping or geocaching though.

    Thanked by 3Not_Oles rjbl ariq01
  • @rjbl said:
    Sigh, losing many hours of work is not fun.

    you should go with high availability vps.

  • I did duplicate data after what happened yesterday, but recent data was inaccessible until the VDS came up again.

    @cybertech said:

    @rjbl said:
    Sigh, losing many hours of work is not fun.

    you should go with high availability vps.

    Unless you set up an HA cluster, I am not sure what you mean by a high availability VPS?

  • @rjbl said:
    I did duplicate data after what happened yesterday, but recent data was inaccessible until the VDS came up again.

    @cybertech said:

    @rjbl said:
    Sigh, losing many hours of work is not fun.

    you should go with high availability vps.

    Unless you set up an HA cluster, I am not sure what you mean by a high availability VPS?

    https://www.clouvider.com/cloud-vps/

    https://upcloud.com/products/cloud-servers

    Thanked by 1rjbl
  • Not_Oles Moderator, Patron Provider

    @rjbl said: I did duplicate data

    Hi @rjbl!

    Just duplicating your data might not be enough.

    It seems too crazy, but maybe you want to copy all your data to multiple, independent locations. Maybe you want to use different backup media and formats at the multiple locations. Then maybe you want to double check that you actually can restore your backed up data and that the restored data matches the original.

    For example, maybe back up to a local hard drive, to Google Drive, and to a server far away across the ocean.

    Maybe you might be interested to see this article: https://lowendbox.com/blog/an-incredibly-amazing-co-incidence-of-doubled-double-disk-failures/

    Best wishes!

    Tom
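
The "double check that you actually can restore" step above can be sketched as a checksum comparison between the original directory and a test restore. This is a minimal illustration, not a complete backup tool; the function names and the example paths in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to `root`) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files do not need to fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare_trees(original: Path, restored: Path) -> dict[str, list[str]]:
    """Report files missing from, extra in, or changed in the restored copy."""
    a, b = tree_digests(original), tree_digests(restored)
    return {
        "missing": sorted(set(a) - set(b)),
        "extra": sorted(set(b) - set(a)),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Something like `compare_trees(Path("/srv/data"), Path("/tmp/restore-test"))` (both paths made up) returns three empty lists when the restored data matches the original. It only compares file contents, so permissions, ownership, and database consistency would need separate checks.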

    Thanked by 2rjbl amaeva080
  • rjbl Member
    edited September 2023

    @cybertech said:

    @rjbl said:
    I did duplicate data after what happened yesterday, but recent data was inaccessible until the VDS came up again.

    @cybertech said:

    @rjbl said:
    Sigh, losing many hours of work is not fun.

    you should go with high availability vps.

    Unless you set up an HA cluster, I am not sure what you mean by a high availability VPS?

    https://www.clouvider.com/cloud-vps/

    https://upcloud.com/products/cloud-servers

    I did not know that these existed, thanks! I am a bit skeptical though. Are they using something like Ceph to duplicate live data?

    @Not_Oles said:

    @rjbl said: I did duplicate data

    Hi @rjbl!

    Just duplicating your data might not be enough.

    It seems too crazy, but maybe you want to copy all your data to multiple, independent locations. Maybe you want to use different backup media and formats at the multiple locations. Then maybe you want to double check that you actually can restore your backed up data and that the restored data matches the original.

    For example, maybe back up to a local hard drive, to Google Drive, and to a server far away across the ocean.

    Maybe you might be interested to see this article: https://lowendbox.com/blog/an-incredibly-amazing-co-incidence-of-doubled-double-disk-failures/

    Best wishes!

    Tom

    Hi Tom, thanks for the advice.
    I already have my backups on E2 object storage, though I am hoping the claimed eleven-nines durability won't be undone by an OVH-style incident, given its price.

    Thanked by 1Not_Oles