CloudFlare down for 20-25mins just now

deank · July 2019

Ha, even Steam is using them, it seems.

BlaZe · July 2019

OMG I just lost $69 million! Damn!

SirFoxy · July 2019

IRAN NUMBER THREE

jsg · July 2019

@ITLabs said:

@eLohkCalb said:
https://blog.cloudflare.com/cloudflare-outage/

Someone deployed something worldwide. And there goes WW3.

bad software deploy = php5.2 selecter ok?

Stop painting CF so negatively. They of course use js selecter - and a high-end customi [sorry, broken by CF js selecter]

ITLabs · July 2019

@SirFoxy said:
IRAN NUMBER THREE

Kazakhstan?

donli · July 2019

@ITLabs said:

@SirFoxy said:
IRAN NUMBER THREE

Kazakhstan?

Best Potassium.

datanoise · July 2019

IMO this shows two things:

PHP SELECTER > JS SELECTER. OK?
CF centralization consumes way too much Potassium.

datanoise · July 2019

sin said: It's pretty crazy how many websites I was trying to visit that were down

Yeah. Even where the sites themselves weren't down, their static assets or the media elements embedded in their pages were. That shows CF's spying is much more efficient (and harder to block or limit) than Google's, and that most people ignore it entirely. Strange times.

SirFoxy · July 2019

@ITLabs said:

@SirFoxy said:
IRAN NUMBER THREE

Kazakhstan?

NO BORAT NEVER ALLOW BORAT KING

NORTH KOREA NUMBER FOUR

sin · July 2019

Just remember they have an NTP server now too: time.cloudflare.com

donli · July 2019

@sin said:
Just remember they have an NTP server now too: time.cloudflare.com

And we have: pool.ntp.org

willie · July 2019

See cest pit: it was a software deployment error.

https://blog.cloudflare.com/cloudflare-outage/

"This was not an attack (as some have speculated) and we are incredibly sorry that this incident occurred. Internal teams are meeting as I write performing a full post-mortem to understand how this occurred and how we prevent this from ever occurring again."

jsg · July 2019

@willie

And stupid me thought that a major global corporation had some procedures (incl. review, simulation, testing, ...) in place instead of allowing interns to play with millions of web sites ...

Well, I'll confess it: I should have known that that expectation was stupid in the case of CloudF_ckup.

sureiam · July 2019

Earlier today, a bad software deploy of the Cloudflare WAF caused a CPU spike globally and 502 errors for our customers. The bad deploy has been rolled back and customer traffic is flowing normally.

This is just outrageous. I could understand a deployment mucking up a regional area with low traffic but to deploy something on a global scale from the get go was just real dumb...
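
To make the point concrete, here is a minimal sketch of a staged rollout that starts small and only widens while things look healthy. Everything in it is an assumption for illustration: the stage and region names, the `deploy_to`, `rollback`, and `error_rate` callables, and the 5% threshold are hypothetical and are not taken from anything Cloudflare has described.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    stages: Sequence[Sequence[str]],
    deploy_to: Callable[[str], None],
    rollback: Callable[[str], None],
    error_rate: Callable[[str], float],
    max_error_rate: float = 0.05,  # hypothetical health threshold
) -> bool:
    """Deploy one stage at a time; undo everything if a stage looks unhealthy.

    Returns True if every stage was deployed and passed its health check,
    False if the rollout was aborted and rolled back.
    """
    deployed: List[str] = []
    for stage in stages:
        # Push the change to every region in the current stage.
        for region in stage:
            deploy_to(region)
            deployed.append(region)
        # Check health before widening the blast radius.
        if any(error_rate(region) > max_error_rate for region in stage):
            # Roll back every region touched so far, newest first.
            for region in reversed(deployed):
                rollback(region)
            return False
    return True
```

Called with something like `[["canary"], ["low-traffic-region"], ["everything-else"]]`, a bad deploy would be caught at the first unhealthy stage, and the later stages would never receive it.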

jsg · July 2019

OBVIOUSLY CloudF_ckup does not even have rollback functionality in place.

willie · July 2019

They did roll it back, according to their blog post.

Neoon · July 2019

@willie said:
They did roll it back, according to their blog post.

A rollback should not take 25 minutes. You should notice issues and push that button after a few minutes, and things should normalize again.

donli · July 2019

@Neoon said:

@willie said:
They did roll it back, according to their blog post.

A rollback should not take 25 minutes. You should notice issues and push that button after a few minutes, and things should normalize again.

That's what the guy who hit the AZ5 button thought would happen.

jsg · July 2019

@willie said:
They did roll it back, according to their blog post.

Kind of. A real rollback mechanism allows you to "click a button" and be done. What they did was done by hand, after going and sieving through lots of potential trouble candidates. That is not what I was talking about.

Neoon · July 2019

@jsg said:
And stupid me thought that a major global corporation had some procedures (incl. review, simulation, testing, ...) in place instead of allowing interns to play with millions of web sites ...

Well, when I worked mostly as an intern, I had full access to the production database most of the time.

Welcome to reality: even testing code, using continuous integration, and having code reviews do not protect you from losing money.

We had one case where parts of the project were still not in git in 2018.
We had CI and code reviews, and we also tested everything that we COULD test locally.

There was one thing we could not test locally, because the test environment had been in maintenance for xx weeks, so they gave us access to their production environment to run tests on it, yes...

They also refused to whitelist our IP, so we were even forced to run these tests from our production environment.

Additionally, the code was not in git, which made code reviews complicated.
Guess what: a single line was not changed after deployment, and the company could have lost about 50k, or 100k+ if it had gone worse.

Luckily, we found it before we lost money.

dahartigan · July 2019

@donli said:

@ITLabs said:

@SirFoxy said:
IRAN NUMBER THREE

Kazakhstan?

Best Potassium.

Confirmed.

Cloudflare is inferior potassium, not in great nation Kazakhstan.

vpsGOD · July 2019

http://prntscr.com/o9sr6u

Dear Cloudflare Customer,

Today at approximately 13:42 UTC we experienced a global service disruption that affected most Cloudflare traffic for 27 minutes.

The issue was triggered by a bug in a software deploy of the Cloudflare Web Application Firewall (WAF) which resulted in a CPU usage spike globally, and 502 errors for our customers. To restore global traffic we temporarily disabled certain WAF capabilities, removed the underlying software bug, then verified and re-enabled all WAF services.

We’re deeply sorry about how this disruption has impacted your services. Our engineering teams continue to investigate this issue and we will be sharing detailed incident report(s) on the Cloudflare blog.

~The Cloudflare Team

sirluis · July 2019

There's no cheap alternative

cybertech · July 2019

Faith in cloudflare restored

yongsiklee · July 2019

@armandorg said:
Even LET was down...

Cloudflare is the internet; if Cloudflare is down, the internet is down.

I decided not to use Cloudflare when everyone seemed to be using it.

Levi · July 2019

They provided a pathetic post-mortem of the incident. They did not include that nasty regex.
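
For anyone wondering how a single regex can spike a CPU, here is a small sketch of catastrophic backtracking. The pattern below is a textbook example chosen for illustration and is an assumption on my part; it is not the regex from the incident, which is not shown in this thread. The helper name `time_failed_match` is likewise hypothetical.

```python
import re
import time

# Textbook pathological pattern (NOT the Cloudflare regex): the nested
# quantifiers give a backtracking engine exponentially many ways to split
# a run of "a" characters before it can conclude there is no match.
PATTERN = re.compile(r"(a+)+$")


def time_failed_match(n: int) -> float:
    """Return the seconds spent failing to match n 'a's followed by a 'b'."""
    text = "a" * n + "b"
    start = time.perf_counter()
    result = PATTERN.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing "b" can never satisfy "$"
    return elapsed


if __name__ == "__main__":
    # Keep n small: the time grows very quickly with each extra character.
    for n in range(14, 23, 2):
        print(f"n={n:2d}  {time_failed_match(n):.4f}s")
```

On a backtracking engine such as Python's `re`, the time should grow roughly exponentially with `n`, so an input only a few dozen characters long can keep a core busy for a very long time.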

ricardo · July 2019

Turns out Kim Kardashian just needed a regex and not her arse to #breaktheinternet

geek2009 · July 2019

@sureiam said:
https://downdetector.com/status/cloudflare
https://www.cloudflarestatus.com/incidents/tx4pgxs6zxdr

So CloudFlare had another hiccup; it seems to be limited to just their caching service. These types of issues seem to be becoming more common, IMO. For those using them for their DNS, CDN, and registrar, how do you sleep at night?

My favorite part? HetrixTools, downdetector.com, etc. are all running on Cloudflare, so you can't even access the typical sites to confirm something is up.

No big surprise. OVH was down before.

tsoft · July 2019

Now VMHaus is down.... For what sins....🧟‍♂️

Neoon · July 2019

@tsoft said:
Now VMHaus is down.... For what sins....🧟‍♂️

Nah, they had issues yesterday but not today.
