Holy Smokes Linode/Akamai
This is why you don't put your failover PoP with Linode/Akamai if your primary is there too, even if they are in different data centers on opposite sides of the world.
Impact to LKE services has been confirmed to have also extended to our data centers in Dallas, Fremont, Sydney, Tokyo 2, Toronto and Washington due to the interaction with our data center in Newark. We continue to work bringing our services back online, and we will provide an update as soon as progress is made.
https://status.linode.com/incidents/6yw88b0ft94g (5 hours and counting)
Fortunately we run backup services for critical elements (e.g. networking, mitigation analyzers, etc.) with 2 other providers (3 regions). Didn't expect that from such a big player. Feeling very glad right about now that we went with "Deploy less but work with more vendors" as opposed to "Deploy an exact copy in a different region with the same vendor", which were both options when the work was being scoped.
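For anyone who wants to encode the "more vendors" preference in a failover script, here is a minimal Python sketch. Everything in it is hypothetical (the `Endpoint` type, the `pick_failover` helper, the vendor and region names), and it assumes health results are gathered elsewhere and passed in as a plain dict; it is not how our setup actually works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    # Hypothetical description of one deployment location.
    name: str
    vendor: str
    region: str


def pick_failover(
    primary: Endpoint,
    candidates: list[Endpoint],
    healthy: dict[str, bool],
) -> Optional[Endpoint]:
    """Pick a healthy backup, preferring a different vendor than the primary."""
    # Keep only candidates that are not the primary and are reported healthy.
    alive = [c for c in candidates if c != primary and healthy.get(c.name, False)]
    # Stable sort: endpoints on another vendor (key False) come first.
    alive.sort(key=lambda c: c.vendor == primary.vendor)
    return alive[0] if alive else None
```

The same-vendor endpoint is still used as a last resort, so a second region with the same provider is not wasted, it just is not the first choice.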


Comments
In a way, this is no different from outages at Gcloud, AWS, or Azure. The big clouds have them just as well, and when they do, it seems to affect more than one place.
As you may know, according to the status page, there are a lot of issues affecting a lot of regions.
I'm considering moving away from them.
@JoshR I am aware it's not unique to Akamai. But they do have an irritatingly sparing style of status page updates and a way of ignoring a single ticket (I raised the issue prior to the incident, and it's still open 19 hours later with no response).
Glad I decided to do 3-hour sleeps between checks last night; it was no quick issue.
@Ariang I'm definitely considering it too (it won't be an overnight thing).
I happen to know there's a pretty big unpatched bug in their LKE implementation too. Acknowledged by them. Patch developed. Got months of monthly updates letting me know that it's not being put into the next release, but maybe the one after. And it's the kind of issue that anyone running a moderately sized LKE cluster may hit (especially if you run many smaller nodes with a decent number of pods).
The clusters I manage are not small accounts either. One account could easily pay an Australian salary.
We are about half up (half the LKE nodes) currently. The other half appear up but without internal networking. Not sure if it's a split-brain situation (authentication systems are on one side).
EWR has been down for > 12 hours, I wonder what's going on. Based on their status updates the initial power outage was fixed, but it seems like their cooling systems are still dead?
I guess it'll take much longer than 12 hours to fix cooling...
Quite surprising for a company of that size and revenue ($4 billion in 2024). Seems like some lower-end commercial datacenters are set up better in terms of redundancy and backup hardware/facilities. But maybe it's something insane. Would be cool to read about if they were to write a blog post afterwards like Cloudflare does after downtime events.
The fact that they blame cooling and power is interesting considering the servers I do have access to have not been rebooted.
It will be 24 hours offline shortly, I expect.
To avoid issues like the recent Linode outage affecting multiple regions, you can use Caasify, a centralized platform that lets you deploy VPS instances across 81+ data centers from providers like Linode, DigitalOcean, Vultr, Hetzner, and more, all through a single account. This way, you can easily build a multi-vendor, multi-region infrastructure without the hassle of managing separate accounts on each platform, helping you reduce the risk of vendor-wide outages affecting your services.
I'm sure OP wants to use some random no-name reseller. What a shameless ad plug.
Linode was only recently acquired by Akamai, and Newark is one of their oldest DCs, so it dates back to way before Akamai became involved in the picture.
That doesn't sit well with a big player like Linode/Akamai. Left the ticket unanswered for 19 hours? Wow. At least some LET providers are better when it comes to support.
I'm aware, but they've owned them for a few years now.
Most people's issue with redundancy is not managing a couple of accounts, it's:
Wow,
The last 3 updates are all the same; that's 3 times in a row over the past 7 hours.
Just the usual corporate, PR-friendly kind of update
Also, it's not mentioned on the status page, but someone else I know who manages a cluster on Linode got a data loss email.
So at least some people have lost data. It might be a good idea to keep your backups at hand and take the time to manually verify them. I'm definitely pulling my (remote) backups (even if it's paranoia).
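If you want something quicker than eyeballing files, a rough Python sketch of a checksum check is below. It assumes you already keep a manifest of SHA-256 digests recorded when the backups were taken; the `verify_backups` helper and the manifest layout (relative path mapped to hex digest) are my own hypothetical choices, not anything Linode provides.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backups(backup_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest differs."""
    problems = []
    for relative_path, expected in manifest.items():
        candidate = backup_dir / relative_path
        # A missing file and a corrupted file are both reported as problems.
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(relative_path)
    return problems
```

An empty list means every file in the manifest exists and matches. It says nothing about whether the backup actually restores, so a test restore is still worth doing.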
It's impacting multiple services. It seems they make an update, then something goes wrong.
Good news! 4/12 of our Kubernetes nodes are functional.
Maybe we should spin up replacement nodes? Nope, new nodes are non-functional from boot.
But don't worry, the issue is considered resolved by Akamai. Celebrate.
Good news: 12/12 Kubernetes nodes are up, it seems.
No ticket update, let's not touch anything.
But, but! ... who could have known that propagating configs throughout a global network would also propagate mistakes and errors?!
The culprits obviously are the evil fibers and routers who just don't care about what's in the packets they transport! Shame on them!
Akamai is completely totally innocent (as multi-billion corporations usually are) as their PR uhm, I mean statements (as well as absence thereof) clearly demonstrate.
(in a 3pt Arial footnote, light grey on white: "A few customers might have experienced some minor issues, which our CEO will personally investigate and fix. Out of abundance of caution we do reject any and all responsibility for what may or may not have happened. We appreciate your understanding")