[RESOLVED] Lunanode Toronto down
Currently an outage is affecting Lunanode (lunanode.com website and VMs hosted in Toronto).
Update: https://status.lunanode.com/
Update 10 (06 July 2020 14:20 EDT): all three hypervisors and the volume storage system remain offline. We continue to work to resolve the issues but do not currently have an ETA for resolution.
Update 9 (06 July 2020 14:00 EDT): three hypervisors remain offline due to hardware failure after the power surge: ceac64db9351, cbec3404b692, and b1fda546d1e3. The volume storage system remains impacted, and operations on some volumes may time out. If your service is not on one of those hypervisors and does not use volumes, but remains offline, please open a support ticket.
Update 8 (06 July 2020 13:50 EDT): if your VM shows as online in VNC but is not reachable, please try restarting it, and if that does not work, please open a ticket.
Update 7 (06 July 2020 13:16 EDT): we have resolved the issue on one of the three failed storage nodes and it is coming online now. We are working on the other two nodes, but the first one should be sufficient to bring the entire storage system online.
Update 6 (06 July 2020 12:48 EDT): most services are back online, but three storage nodes are offline. This means the distributed storage system is offline, so VMs with volumes cannot be booted until we resolve this issue.
Update 5 (06 July 2020 11:50 EDT): controller node is booted.
Update 4 (06 July 2020 11:40 EDT): no replacement was needed after removing the serial port connection. ETA is 10 minutes until most services, except three hypervisors, are online.
Update 3 (06 July 2020 11:35 EDT): some servers were damaged by the power surge. We need to replace them with backup servers.
Update 2 (06 July 2020 11:05 EDT): a PDU failed due to a datacenter power surge. It has been replaced and services should be coming back online soon.
Update 1 (06 July 2020 10:20 EDT): this appears to be a power issue. Our technicians will arrive on-site shortly.
Many services in Toronto are down. We are investigating.
Comments
Waiting for Toronto to come back
Ouch, they got hit pretty hard by that power surge. Thanks to @perennate for the hard work and immediate response.
VMs are online but outside network is disconnected.
So no data loss, apparently. That's the good part.
I wonder if their routers got damaged too? Or some other piece of networking equipment at the datacenter outside of Lunanode's control?
@perennate It appears that a router or something is still offline, because my VM is up but still not accessible except via the VNC console. lunanode.com is also still inaccessible.
Thanks for your fast response.
@perennate Something is definitely wrong with the connectivity; my VM is sporadically going online and offline now.
Replying to this thread constantly isn't going to help. Take note of 'most services are back online', which straight away indicates there are still some problems, which is to be expected. Give them a chance to resolve the issue properly and keep an eye on the status updates; that's all you can do.
This thread is helpful, as people can hold off on switching back to Toronto and keep operating from their backup nodes for now. My Toronto nodes are still yellow and not connected to the internet.
https://dynamic.lunanode.com/panel/ is working
https://status.lunanode.com/
For a while my VM was up but not accessible via IPv4, although IPv6 was working. Now both IPv4 and IPv6 are working correctly, and lunanode.com is back up.
The panel never went down; it must be on redundant infrastructure. Their ticketing system and some other functions inside the panel were broken, though. Currently snapshots and volumes are still not working, as explained.
Not so lucky with b1fda546d1e3.
Bummer!!! I hope your data is still OK. Would a power surge normally fry an SSD?
Websites are working fine from a backup node on another provider, except for some very fresh data. Power surges can do many nasty things to motherboards and other components.
All is back! Checking that everything works before switching websites back to Toronto. Thanks!
What a day for them
No kidding. They handled it super well though, kudos to Lunanode.
Email updates and a credit offered!
Brilliantly done.
Thanks @perennate
In contrast:
Yesterday another provider had a hardware upgrade, and the announced 1-hour downtime stretched to 4-plus hours. When I raised a ticket, I was told to check the status page (which only showed red/amber/green colours).
Now I know which provider to renew with!
Kudos @perennate for the way that was handled! :-)
100%
Damnit surge protector, you had one job to do!
It’s not fixed yet. I’m getting downtime alerts multiple times every hour for the whole day ☹️
@sayem314 the outage yesterday was resolved and only affected Toronto. If you have an ongoing issue then please open a support ticket. We are not seeing packet loss in Toronto, Montreal, or Roubaix at this time.
It was actually an issue with my application (a memory leak causing the app to crash and restart again after a few minutes) 😪 Opened a ticket and Lunanode was very fast and helpful. Appreciate the support 🙏