Holy Smokes Linode/Akamai

SplitIceSplitIce Member, Host Rep
edited July 2025 in Outages

This is why you don't put your failover PoP with Linode/Akamai if your primary is too. Even if they are in different Datacenters on other sides of the world.

Impact to LKE services has been confirmed to have also extended to our data centers in Dallas, Fremont, Sydney, Tokyo 2, Toronto and Washington due to the interaction with our data center in Newark. We continue to work bringing our services back online, and we will provide an update as soon as progress is made.

https://status.linode.com/incidents/6yw88b0ft94g (5 hours and counting)

Fortunately we run backup services for critical elements (i.e. networking, mitigation analyzers, etc.) with 2 other providers (3 regions). Didn't expect that from such a big player. Feeling very glad right about now that we went with "Deploy less but work with more vendors" as opposed to "Deploy an exact copy in a different region with the same vendor", which were both options when the work was being scoped.
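
A minimal sketch of the "more vendors" idea, assuming each PoP already has a health result from some existing check. The `PoP` type, the `pick_failover` helper, and the PoP and vendor names are all hypothetical, made up for illustration; the only rule encoded is to prefer a healthy PoP at a different vendor and to treat same-vendor, different-region as a last resort.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PoP:
    name: str      # hypothetical label for a point of presence
    vendor: str    # provider the PoP is hosted with
    region: str    # datacenter / region identifier
    healthy: bool  # outcome of whatever health check is already in place


def pick_failover(primary: PoP, candidates: list[PoP]) -> PoP | None:
    """Return a healthy failover for `primary`, or None if there is none.

    Preference: a different vendor first, then the same vendor in a
    different region as a last resort.
    """
    healthy = [p for p in candidates if p.healthy and p.name != primary.name]
    other_vendor = [p for p in healthy if p.vendor != primary.vendor]
    if other_vendor:
        return other_vendor[0]
    other_region = [p for p in healthy if p.region != primary.region]
    return other_region[0] if other_region else None


# Illustrative inventory only; names and vendors are invented.
pops = [
    PoP("ewr-a", "linode", "newark", healthy=False),
    PoP("syd-a", "linode", "sydney", healthy=True),
    PoP("fra-b", "vendor-b", "frankfurt", healthy=True),
]
choice = pick_failover(pops[0], pops[1:])
print(choice.name if choice else "no failover available")
```

With the sample inventory this selects the PoP at the second vendor even though a same-vendor PoP in another region reports healthy, which is the case that went wrong in this incident.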

Comments

  • JoshRJoshR Member, Patron Provider

    In a way no different than outages from GCloud, AWS, Azure. The big clouds have them just as well. And it seems like when the big clouds have them it affects more than one place.

  • As you may know, according to the status page, there are a lot of issues affecting a lot of regions.
    I'm considering moving out from them.

  • SplitIceSplitIce Member, Host Rep

    @JoshR I am aware it's not unique to Akamai. But they do have an irritatingly sparing style of status page updates and a way of ignoring a single ticket (I raised the issue prior to the incident, it's still open 19 hours later with no response).

    Glad I decided to do 3 hour sleeps between checks last night, it was no quick issue.

    @Ariang I'm definitely considering it too (it won't be an overnight thing).

    I happen to know there's a pretty big unpatched bug in their LKE implementation too. Acknowledged by them. Patch developed. Got months of monthly updates letting me know that it's not being put into the next release, but maybe the one after. And it's the kind of issue that anyone running a moderately sized LKE cluster may hit (especially if you run many smaller nodes with a decent number of pods).

    The clusters I manage are not small accounts either. One account could easily pay an Australian salary.

    We are about half up (half the LKE nodes) currently. The other half appear up but without internal networking. Not sure if it's a split brain situation (authentication systems are on one side).

  • EWR has been down for > 12 hours, I wonder what's going on. Based on their status updates the initial power outage was fixed, but it seems like their cooling systems are still dead?

    I guess it'll take much longer than 12 hours to fix cooling...

    Thanked by 1: slmdr
  • MikeAMikeA Member, Patron Provider

    Quite surprising for a company of that size and revenue ($4 billion in 2024). Seems like some lower end commercial datacenters are set up better in terms of redundancy and backup hardware/facilities. But maybe it's something insane. Would be cool to read about if they were to write a blog post afterwards like Cloudflare does after downtime events.

    Thanked by 3: oloke, mrTom, tentor
  • SplitIceSplitIce Member, Host Rep

    The fact that they blame cooling and power is interesting considering the servers I do have access to have not been rebooted.

    It will be 24 hours offline shortly I expect.

  • caasifycaasify 🚩 Patron Provider Tag Suspended

    @SplitIce said:
    This is why you don't put your failover PoP with Linode/Akamai if your primary is too. Even if they are in different Datacenters on other sides of the world.

    Impact to LKE services has been confirmed to have also extended to our data centers in Dallas, Fremont, Sydney, Tokyo 2, Toronto and Washington due to the interaction with our data center in Newark. We continue to work bringing our services back online, and we will provide an update as soon as progress is made.

    https://status.linode.com/incidents/6yw88b0ft94g (5 hours and counting)

    Fortunately we run backup services for critical elements (i.e. networking, mitigation analyzers, etc.) with 2 other providers (3 regions). Didn't expect that from such a big player. Feeling very glad right about now that we went with "Deploy less but work with more vendors" as opposed to "Deploy an exact copy in a different region with the same vendor", which were both options when the work was being scoped.

    To avoid issues like the recent Linode outage affecting multiple regions, you can use Caasify, a centralized platform that lets you deploy VPS instances across 81+ data centers from providers like Linode, DigitalOcean, Vultr, Hetzner, and more, all through a single account. This way, you can easily build a multi-vendor, multi-region infrastructure without the hassle of managing separate accounts on each platform, helping you reduce the risk of vendor-wide outages affecting your services.

  • RubbenRubben Member

    @caasify said:

    @SplitIce said:
    This is why you don't put your failover PoP with Linode/Akamai if your primary is too. Even if they are in different Datacenters on other sides of the world.

    Impact to LKE services has been confirmed to have also extended to our data centers in Dallas, Fremont, Sydney, Tokyo 2, Toronto and Washington due to the interaction with our data center in Newark. We continue to work bringing our services back online, and we will provide an update as soon as progress is made.

    https://status.linode.com/incidents/6yw88b0ft94g (5 hours and counting)

    Fortunately we run backup services for critical elements (i.e. networking, mitigation analyzers, etc.) with 2 other providers (3 regions). Didn't expect that from such a big player. Feeling very glad right about now that we went with "Deploy less but work with more vendors" as opposed to "Deploy an exact copy in a different region with the same vendor", which were both options when the work was being scoped.

    To avoid issues like the recent Linode outage affecting multiple regions, you can use Caasify, a centralized platform that lets you deploy VPS instances across 81+ data centers from providers like Linode, DigitalOcean, Vultr, Hetzner, and more, all through a single account. This way, you can easily build a multi-vendor, multi-region infrastructure without the hassle of managing separate accounts on each platform, helping you reduce the risk of vendor-wide outages affecting your services.

    I'm sure OP wants to use some random noname reseller 😆 what a shameless ad plug

  • @MikeA said:
    Quite surprising for a company of that size and revenue ($4 billion in 2024). Seems like some lower end commercial datacenters are set up better in terms of redundancy and backup hardware/facilities. But maybe it's something insane. Would be cool to read about if they were to write a blog post afterwards like Cloudflare does after downtime events.

    Linode was only recently acquired by Akamai, and Newark is one of their oldest DCs, so it dates back to way before Akamai became involved in the picture.

  • SocheatSocheat Member
    edited July 2025

    @SplitIce said:
    @JoshR I am aware it's not unique to Akamai. But they do have an irritatingly sparing style of status page updates and a way of ignoring a single ticket (I raised the issue prior to the incident, it's still open 19 hours later with no response).

    That doesn't sit well with a big player like Linode/Akamai. They left the ticket unanswered for 19 hours? Wow. At least some LET providers are better when it comes to support.

  • MikeAMikeA Member, Patron Provider

    @ehhthing said:

    @MikeA said:
    Quite surprising for a company of that size and revenue ($4 billion in 2024). Seems like some lower end commercial datacenters are set up better in terms of redundancy and backup hardware/facilities. But maybe it's something insane. Would be cool to read about if they were to write a blog post afterwards like Cloudflare does after downtime events.

    Linode was only recently acquired by Akamai, and Newark is one of their oldest DCs, so it dates back to way before Akamai became involved in the picture.

    I'm aware, but they've owned them for a few years now.

  • SplitIceSplitIce Member, Host Rep

    @caasify said:
    To avoid issues like the recent Linode outage affecting multiple regions, you can use Caasify, a centralized platform that lets you deploy VPS instances across 81+ data centers from providers like Linode, DigitalOcean, Vultr, Hetzner, and more, all through a single account. This way, you can easily build a multi-vendor, multi-region infrastructure without the hassle of managing separate accounts on each platform, helping you reduce the risk of vendor-wide outages affecting your services.

    Most people's issue with redundancy is not managing a couple of accounts, it's:

    1. configuring software and designing architecture to support global deployment (e.g. database servers)
    2. cost, a global / multi-vendor deployment just costs more
    Thanked by 2: BasToTheMax, lmonaro
  • SplitIceSplitIce Member, Host Rep

    Wow,

    Mitigation efforts are continuing with our subject matter experts actively working to restore the remaining services. We will continue to provide updates as progress is made.

    Last 3 updates are all the same, that's 3 times in a row over the past 7 hours.

  • @SplitIce said:
    Wow,

    Mitigation efforts are continuing with our subject matter experts actively working to restore the remaining services. We will continue to provide updates as progress is made.

    Last 3 updates are all the same, that's 3 times in a row over the past 7 hours.

    Just the usual corporate, PR-friendly kind of update :)

  • SplitIceSplitIce Member, Host Rep

    Also it's not mentioned on the status page, but someone else I know who manages a cluster on Linode got a data loss email.

    So at least some people have lost data. It might be a good idea for people to keep their backups at hand and take the time to manually verify them. I'm definitely pulling my (remote) backups (even if it's paranoia).
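
    One way to do a first-pass check on pulled backups is to compare each file against a checksum recorded when the backup was taken. This is only a sketch under that assumption: the `verify_backups` helper and the manifest format (relative file name mapped to a SHA-256 hex digest) are hypothetical, not something any provider ships.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backups(backup_dir: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Map each manifest entry to 'ok', 'missing', or 'mismatch'.

    `manifest` is assumed to map a path relative to `backup_dir` to the
    SHA-256 hex digest recorded at backup time.
    """
    results: dict[str, str] = {}
    for rel_name, expected in manifest.items():
        path = backup_dir / rel_name
        if not path.is_file():
            results[rel_name] = "missing"
        elif sha256_of(path) != expected.lower():
            results[rel_name] = "mismatch"
        else:
            results[rel_name] = "ok"
    return results
```

    A matching checksum only shows the file is byte-identical to what was written at backup time. It does not prove the backup restores, so a test restore is still worth doing.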

  • It's impacting multiple services. It seems they make an update, then something goes wrong.

  • SplitIceSplitIce Member, Host Rep

    Good news! 4/12 of our Kubernetes nodes are functional.

    Maybe we should spin up replacement nodes? Nope, new nodes are non-functional from boot.

    But don't worry, the issue is considered resolved by Akamai. Celebrate.

    Thanked by 1: cu_olly
  • SplitIceSplitIce Member, Host Rep

    Good news, 12/12 Kubernetes nodes are up it seems.

    No ticket update, let's not touch anything.

  • jsgjsg Member, Resident Benchmarker

    But, but! ... who could have known that propagating configs throughout a global network would also propagate mistakes and errors?!

    The culprits obviously are the evil fibers and routers who just don't care about what's in the packets they transport! Shame on them!

    Akamai is completely totally innocent (as multi-billion corporations usually are) as their PR uhm, I mean statements (as well as absence thereof) clearly demonstrate.

    (in a 3pt Arial footnote, light grey on white: "A few customers might have experienced some minor issues, which our CEO will personally investigate and fix. Out of abundance of caution we do reject any and all responsibility for what may or may not have happened. We appreciate your understanding")

  • NeoonNeoon Community Contributor, Veteran
