Azure US East outage due to fiber cut

joepie91 Member, Patron Provider

So, Azure's US East location has been having issues for the past 9 hours or so, reporting 'reduced capacity' due to a fiber cut. I've seen reports of people's services being down there entirely.

You'd think that with the $87/TB they charge for network traffic, they'd at least provide some real redundancy... but apparently a single line being cut is enough to fuck shit up.

Comments

  • Neoon Community Contributor, Veteran

    If you sell something that enough people still buy because it says CLOUD or has Azure in the name, why bother putting up an additional fibre uplink?

    Let them pay $87/TB if they are stupid enough.

    The good thing about this is that I'm sure some of them will switch and learn. Hopefully.

  • jsg Member, Resident Benchmarker
    edited August 2018

    @joepie91

    Yes, ridiculous.

    But my guess is that they DO have multiple lines, but a single feed-in. Cheap, but really bad practice. A high-class DC has dual feed-ins with a sufficiently large distance between them (typically on opposite sides of the building) AND different physical routes.

    With power feeds one can be a bit sloppier if one has good enough backup infrastructure. But with fibers, cut is cut, and the only proper solution is redundancy over genuinely different physical routes.

    At $87/TB there is no excuse for cutting corners.

  • MikeA Member, Patron Provider
    edited August 2018

    One thing Microsoft Azure and OVH have in common. One is 90% cheaper.

  • Clouvider Member, Patron Provider
    edited August 2018

    Two diverse fibres going out via opposite sides of the building that never overlap is the minimum standard I consider ‘professional’.

  • Yura Member

    Single-homed, duh. They need that sweet CC blend.

  • I am certain those companies that need real redundancy would have implemented a high availability architecture for their databases and virtual machines to be able to scale up and route around the affected region. Of course, that wrecks the ability of people to offer snide comments.
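A minimal sketch of the "route around the affected region" idea: try regions in preference order and use the first one whose health check passes. The region names, endpoints and the is_healthy callback are hypothetical placeholders, not Azure APIs, and a real setup also needs the data already replicated to the standby region.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Region:
    name: str      # hypothetical region label
    endpoint: str  # hypothetical service URL in that region


def pick_region(
    regions: Sequence[Region], is_healthy: Callable[[Region], bool]
) -> Optional[Region]:
    """Return the first region, in preference order, whose health check passes."""
    for region in regions:
        if is_healthy(region):
            return region
    return None  # every region is down: nothing left to fail over to


# Usage: simulate the primary region being unreachable.
regions = [
    Region("us-east", "https://app.us-east.example.com"),
    Region("us-west", "https://app.us-west.example.com"),
]
down = {"us-east"}
active = pick_region(regions, lambda r: r.name not in down)
print(active.name if active else "no healthy region")  # prints "us-west"
```

The health check is injected as a callback so the failover order can be tested without any network access.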

  • Clouvider Member, Patron Provider

    @OmgpleaseRead said:
    I am certain those companies that need real redundancy would have implemented a high availability architecture for their databases and virtual machines to be able to scale up and route around the affected region. Of course, that wrecks the ability of people to offer snide comments.

    It doesn’t excuse the extremely poor network design though.

  • joepie91 Member, Patron Provider

    @OmgpleaseRead said:
    I am certain those companies that need real redundancy would have implemented a high availability architecture for their databases and virtual machines to be able to scale up and route around the affected region. Of course, that wrecks the ability of people to offer snide comments.

    Quite possible. Doesn't change the fact that Azure are charging $87/TB for network infrastructure that's less redundant than some of the 'one-man shows' on here.

  • Perhaps their clients will ask for more details in private conversations once the crisis is mitigated, as some companies here have done in the past.

  • JohnMiller92 Member
    edited August 2018

    @MikeA said:
    One thing Microsoft Azure and OVH have in common. One is 90% cheaper.

    LOL love this

  • I'd hazard a guess that most of the people using Azure are probably running it off the $150 per month of free Azure credits that people get through their university or their MSDN subscription.

    Or startups who participate in their BizSpark program.

    Then again, most of the big spenders I know who use Azure don't rely on a single location for their workload. Also, to be fair, most of the services people are using aren't necessarily compute workloads.

  • jsg Member, Resident Benchmarker

    @OmgpleaseRead said:
    I am certain those companies that need real redundancy would have implemented a high availability architecture for their databases and virtual machines to be able to scale up and route around the affected region. Of course, that wrecks the ability of people to offer snide comments.

    Or maybe those companies chose an expensive service ("Azure") because, so they thought, it has some real redundancy.

  • $87/TB? Is this for real?

  • @jsg said:

    @OmgpleaseRead said:
    I am certain those companies that need real redundancy would have implemented a high availability architecture for their databases and virtual machines to be able to scale up and route around the affected region. Of course, that wrecks the ability of people to offer snide comments.

    Yep, and when you create the storage for your compute resources, they literally give you the option to make it either GeoRedundant or datacenter-level redundant.

  • mrtz Member

    "This issue was attributed to a fiber cut caused by construction approximately 5 km from Microsoft data centers. This resulted in multiple line breaks impacting separate splicing enclosures that reduced capacity between 2 Azure regional data centers."

    "multiple line breaks" and "reduced capacity between 2 azure regional data centers". So as far as I can tell, it's not "we lost all connectivity due to a single line break".

  • Infinity Member, Host Rep
    edited August 2018

    @mrtz said:
    "This issue was attributed to a fiber cut caused by construction approximately 5 km from Microsoft data centers. This resulted in multiple line breaks impacting separate splicing enclosures that reduced capacity between 2 Azure regional data centers."

    "multiple line breaks" and "reduced capacity between 2 azure regional data centers". So as far as I can tell, it's not "we lost all connectivity due to a single line break".

    Well, I think they've still missed the idea of a diverse feed then; it should be diverse right from their core routers, including the datacentre cross connects. Unless two separate digger teams happened to dig up each of the diverse routes at the same time, I'm not buying it.

    Not saying it isn't possible for two diverse feeds to go offline; coincidences can happen, but their explanation seems off.

  • To all of the providers saying "they should have known" or "they should have done better", blah blah blah: can you provide documents detailing your fiber route maps, internal and external to your building, not just to the edge of the lot but for miles and miles away from the DC, so we can see you've truly documented that your network is so robustly designed that there are no convergent paths? (Yep, that means 2 sea cables, not just one, if you are in the EU and servicing the US.) My point is that it's easy to point a finger based on one paragraph of info, but is your setup truly better and documented?

  • Regarding bandwidth: inbound is free (I've seen providers here that count it), and pricing does drop for high-volume users. Of course, well-structured web pages and data pulls will minimize costs, and sloppy code on a high-usage site would get severely penalized in bandwidth charges on Azure (or AWS or Google), but would be less impactful on providers with more "free" bandwidth.

  • Remember the time Azure in the Netherlands was completely shut down because of high humidity?

    I'm really starting to question what the money is going to here.

  • Yura Member

    @ehhthing said:
    I'm really starting to question what the money is going to here.

    Hookers, cocaine, lawyers.

  • @Clouvider said:
    Two diverse fibres going out via opposite sides of the building that never overlap is the minimum standard I consider ‘professional’.

    At ~$26000 per gigabit/mo, it's simply not feasible.
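The gigabit figure is easy to check. A rough sketch, assuming a 30-day month, decimal terabytes, a fully saturated link, and a flat $87/TB with no volume discounts: 1 Gbit/s moves about 324 TB a month, which comes to roughly $28,000. The ~$26,000 in the post corresponds to about 300 TB, the usual rule-of-thumb figure for a gigabit month.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def monthly_transfer_tb(rate_gbps: float, utilization: float = 1.0) -> float:
    """Decimal terabytes moved in a month at a sustained rate in Gbit/s."""
    bits = rate_gbps * 1e9 * utilization * SECONDS_PER_MONTH
    return bits / 8 / 1e12  # bits -> bytes -> TB


def monthly_cost(rate_gbps: float, price_per_tb: float, utilization: float = 1.0) -> float:
    """Egress bill at a flat per-TB price (assumption: no tiered pricing)."""
    return monthly_transfer_tb(rate_gbps, utilization) * price_per_tb


print(round(monthly_transfer_tb(1.0)))  # 324
print(round(monthly_cost(1.0, 87.0)))   # 28188
```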

  • Clouvider Member, Patron Provider

    @OmgpleaseRead said:
    To all of the providers saying "they should have known" or "they should have done better", blah blah blah: can you provide documents detailing your fiber route maps, internal and external to your building, not just to the edge of the lot but for miles and miles away from the DC, so we can see you've truly documented that your network is so robustly designed that there are no convergent paths? (Yep, that means 2 sea cables, not just one, if you are in the EU and servicing the US.) My point is that it's easy to point a finger based on one paragraph of info, but is your setup truly better and documented?

    Yes. We require a detailed fibre map of the proposed route to be part of the contract when we order fibre. We then make sure that the cross connects are run diversely to each leg of the diverse pair. It's quite a detailed planning process every time a new location is opened.
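Checking a diverse pair against its fibre maps boils down to a shared-risk check (often called SRLG, shared risk link group, analysis): list every physical element each leg passes through (building entry, manholes, ducts, splice enclosures, cross connects) and look for anything that appears on both. A minimal sketch; the element IDs are hypothetical labels you would transcribe from the maps.

```python
from typing import Iterable, Set


def shared_risks(
    leg_a: Iterable[str], leg_b: Iterable[str], allowed: Iterable[str] = ()
) -> Set[str]:
    """Physical elements present on both legs, minus those both legs are
    expected to share (e.g. the far-end PoP)."""
    return (set(leg_a) & set(leg_b)) - set(allowed)


# Hypothetical legs: opposite building entries, but both run through duct-7.
leg_a = ["DC1-north-entry", "MH-101", "duct-7", "splice-A", "PoP-X"]
leg_b = ["DC1-south-entry", "MH-205", "duct-7", "splice-B", "PoP-X"]
print(shared_risks(leg_a, leg_b, allowed={"PoP-X"}))  # {'duct-7'}
```

An empty result only means the legs are diverse as far as the maps go; anything the maps leave out is still a blind spot.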

  • Awesome! I guess with these things the devil is in the details, and there can be a lot of detail to go through.

  • Infinity Member, Host Rep

    @OmgpleaseRead said:
    To all of the providers saying "they should have known" or "they should have done better", blah blah blah: can you provide documents detailing your fiber route maps, internal and external to your building, not just to the edge of the lot but for miles and miles away from the DC, so we can see you've truly documented that your network is so robustly designed that there are no convergent paths? (Yep, that means 2 sea cables, not just one, if you are in the EU and servicing the US.) My point is that it's easy to point a finger based on one paragraph of info, but is your setup truly better and documented?

    Yes, all of our dark fibre from our various providers comes with detailed maps of the splicing points, amplification sites, routes, etc. The routes are also carefully planned in advance with the provider. Granted, we don't have the same maps for transit providers, but that's why transit is taken at several sites and the sites are interconnected diversely.

    Also worth noting: on all of our metro dark fibre, at least within the UK, we have to pay tax, so at that point it's absurd if you don't request a map of your route.

  • Azure (and in fact all of Microsoft) operates via resellers and CSPs. Once you're billing 1,500 to 2,000 bucks a month with them, they will make you a CSP, reseller, authorized channel partner, etc., whatever they call it. Then, if there is a customer case or project, it gets transferred to you and you get business through Microsoft.

    I have many dev teams and companies that work on this model. They know Azure is highly overpriced infra, yet they stick with it for the business that eventually pays for itself and earns them good money (or not, I don't know, but that's the model they follow).

    Azure support is mediocre to bad, and it is highly likely that at first you will hit incompetent third-party support staff; only when you yell at them do things get noticed. This also depends on the region: the US usually has better support, I never felt very good about the UK, and Asia is the worst.

  • joepie91 Member, Patron Provider

    @mrtz said:
    "This issue was attributed to a fiber cut caused by construction approximately 5 km from Microsoft data centers. This resulted in multiple line breaks impacting separate splicing enclosures that reduced capacity between 2 Azure regional data centers."

    "multiple line breaks" and "reduced capacity between 2 azure regional data centers". So as far as I can tell, it's not "we lost all connectivity due to a single line break".

    Multiple breaks on the same line, judging from the cause (because construction work would not simultaneously break two geographically redundant lines).

    And yes, I am aware they call it 'reduced capacity'. I've spoken to people who reported straight-up outages of their services. Apparently 'reduced capacity' means "some people's services are up, some people's services are not".

    @Chuck said:
    $87/TB? Is this for real?

    Yes.

    @OmgpleaseRead said:
    Regarding bandwidth: inbound is free (I've seen providers here that count it), and pricing does drop for high-volume users. Of course, well-structured web pages and data pulls will minimize costs, and sloppy code on a high-usage site would get severely penalized in bandwidth charges on Azure (or AWS or Google), but would be less impactful on providers with more "free" bandwidth.

    Even if you take the cheapest bulk tier and halve the price to compensate for the 'free inbound', it's still $25/TB; an order of magnitude more expensive than what providers here charge. I expect serious redundancy for that kind of cost difference.
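The halving works like this: if only outbound traffic is billed, the price per TB of total (in plus out) traffic is the outbound price times the outbound share of that traffic. The $25/TB figure implies a bulk tier of about $50/TB and assumes symmetric traffic (50% outbound); for an outbound-heavy workload such as web hosting, the "free inbound" discount shrinks. A small sketch:

```python
def effective_price_per_tb(out_price_per_tb: float, out_share: float) -> float:
    """Cost per TB of combined in+out traffic when only outbound is billed.

    out_share is the fraction of all traffic that is outbound (0..1).
    """
    if not 0.0 <= out_share <= 1.0:
        raise ValueError("out_share must be between 0 and 1")
    return out_price_per_tb * out_share


print(effective_price_per_tb(50.0, 0.5))  # 25.0, symmetric traffic
print(effective_price_per_tb(50.0, 0.9))  # 45.0, mostly outbound
```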

  • gestiondbi Member, Patron Provider

    @Infinity said:

    @OmgpleaseRead said:
    To all of the providers saying "they should have known" or "they should have done better", blah blah blah: can you provide documents detailing your fiber route maps, internal and external to your building, not just to the edge of the lot but for miles and miles away from the DC, so we can see you've truly documented that your network is so robustly designed that there are no convergent paths? (Yep, that means 2 sea cables, not just one, if you are in the EU and servicing the US.) My point is that it's easy to point a finger based on one paragraph of info, but is your setup truly better and documented?

    Yes, all of our dark fibre from our various providers comes with detailed maps of the splicing points, amplification sites, routes, etc. The routes are also carefully planned in advance with the provider. Granted, we don't have the same maps for transit providers, but that's why transit is taken at several sites and the sites are interconnected diversely.

    Also worth noting: on all of our metro dark fibre, at least within the UK, we have to pay tax, so at that point it's absurd if you don't request a map of your route.

    @Clouvider said:

    @OmgpleaseRead said:
    To all of the providers saying "they should have known" or "they should have done better", blah blah blah: can you provide documents detailing your fiber route maps, internal and external to your building, not just to the edge of the lot but for miles and miles away from the DC, so we can see you've truly documented that your network is so robustly designed that there are no convergent paths? (Yep, that means 2 sea cables, not just one, if you are in the EU and servicing the US.) My point is that it's easy to point a finger based on one paragraph of info, but is your setup truly better and documented?

    Yes. We require a detailed fibre map of the proposed route to be part of the contract when we order fibre. We then make sure that the cross connects are run diversely to each leg of the diverse pair. It's quite a detailed planning process every time a new location is opened.

    I can't speak for the EU, but in the US and Canada lots of providers refuse to provide network plans to customers, even to DCs like OVH. They say it's a security measure, which is not totally false when a single splice can carry your competitors but also government and other critical services. I can say this because I'm a network builder myself and have done fiber installations for some high-importance enterprises, and none of them have the plan of the network beyond their own floor and the first network pole/manhole (the connection point to the network). Lots of them have a "ring" setup that turns out not to be a ring for many kilometers due to network limitations, fiber/splice re-use, reversed setups, or even simply because the engineer didn't see a common point of failure on their plan (e.g. resold/leased fiber, outdoor elements, human interaction, etc.).

    Let's simply hope they will learn, contact their providers, check why the ring didn't kick in, and change what needs to be changed.

    Regards, David

  • Infinity Member, Host Rep

    @davidgestiondbi said:
    I can't speak for the EU, but in the US and Canada lots of providers refuse to provide network plans to customers, even to DCs like OVH. They say it's a security measure, which is not totally false when a single splice can carry your competitors but also government and other critical services. I can say this because I'm a network builder myself and have done fiber installations for some high-importance enterprises, and none of them have the plan of the network beyond their own floor and the first network pole/manhole (the connection point to the network). Lots of them have a "ring" setup that turns out not to be a ring for many kilometers due to network limitations, fiber/splice re-use, reversed setups, or even simply because the engineer didn't see a common point of failure on their plan (e.g. resold/leased fiber, outdoor elements, human interaction, etc.).

    Let's simply hope they will learn, contact their providers, check why the ring didn't kick in, and change what needs to be changed.

    Regards, David

    I've only ever dealt with dark fibre (and waves) in the EU, but our providers euNetworks and Zayo are both very detailed in the plans they provide before the circuit is provisioned, and they are also consistent about updating them when the route changes (which happens a fair amount on long-haul waves).

    Last-mile circuits from national telecoms are a different story. We have a good number of BT Openreach last-mile fibre circuits, and only on rare occasions have we managed to get plans out of them; however, they certainly do plan for diversity and keep it that way. We have had several fibre breaks on our last-mile circuits to clients, but not once in over 4 years have both legs of a diverse circuit gone down, and they always clearly state where the "pinch-points" are, e.g. a building without diverse entries from different manholes.

  • jsg Member, Resident Benchmarker
    edited August 2018

    Pretty much all backbones (i.e. not last mile) are built as rings (well, as quite stretched ellipses) and often have bidirectional fiber pairs. So it's usually not particularly painful or troublesome to have proper dual feeds through different building feed-ins.
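Why a ring helps, and where it stops helping: any single cut still leaves a path the other way round, but two cuts split it. If both directions of the ring leave the building through the same duct, one excavator does the job of two cuts. A small sketch with hypothetical node names, checking reachability with a breadth-first search:

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set

Link = FrozenSet[str]


def ring_links(nodes: List[str]) -> Set[Link]:
    """Links of a ring: each node to the next, and the last back to the first.
    Assumes at least 3 distinct nodes."""
    return {frozenset((nodes[i], nodes[(i + 1) % len(nodes)])) for i in range(len(nodes))}


def reachable(links: Set[Link], src: str, dst: str) -> bool:
    """Breadth-first search over the surviving links."""
    adj: Dict[str, Set[str]] = {}
    for link in links:
        a, b = tuple(link)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


ring = ring_links(["DC", "A", "B", "C"])
one_cut = ring - {frozenset(("A", "B"))}
two_cuts = one_cut - {frozenset(("C", "DC"))}
print(reachable(one_cut, "DC", "B"))   # True: traffic goes DC -> C -> B
print(reachable(two_cuts, "DC", "B"))  # False: the ring is split
```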

    As for maps, my experience has been mixed. Some (including, btw, very large US carriers) do provide at least "stepped maps", where you get quite precise maps for some miles around your DC and less precise maps for anything beyond that. And some just refuse completely, usually citing security concerns.

    And, frankly, it's not even an absolute must-have, because in the end it's the contract that matters. In contracts you virtually always CAN get something like "the 2 fibers, except for [PoPs, DCs, ...], are at least xyz miles apart from each other", with more specifics regarding PoPs, DCs, landings, etc. (which themselves often are not exactly geo-specified). Typically those specs are in the contract annex. Plus, of course, you have your SLA. Putting those side by side, you have a pretty good basis for judging both the quality and the risks of your feed(s).
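When you do have route coordinates, a separation clause like the one above can be checked directly. A rough sketch, assuming each route is a list of (lat, lon) points sampled densely along the path; it compares points rather than line segments, so sparse sampling will overstate the separation. The coordinates and the 1 km exemption radius are made up for illustration.

```python
import math
from typing import Iterable, List, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0088  # mean Earth radius
KM_PER_MILE = 1.609344


def haversine_km(p: LatLon, q: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def min_separation_miles(
    route_a: Iterable[LatLon], route_b: Iterable[LatLon],
    exempt: Iterable[LatLon] = (), exempt_radius_km: float = 1.0,
) -> float:
    """Smallest point-to-point distance between two routes, ignoring points
    within exempt_radius_km of an exempt site (e.g. a shared PoP or DC)."""
    sites = list(exempt)

    def keep(p: LatLon) -> bool:
        return all(haversine_km(p, s) > exempt_radius_km for s in sites)

    a: List[LatLon] = [p for p in route_a if keep(p)]
    b: List[LatLon] = [p for p in route_b if keep(p)]
    if not a or not b:
        return math.inf
    return min(haversine_km(p, q) for p in a for q in b) / KM_PER_MILE


dc = (0.0, 0.0)
route_a = [dc, (0.0, 0.5), (0.0, 1.0)]
route_b = [dc, (0.5, 0.0), (1.0, 0.0)]
print(min_separation_miles(route_a, route_b))  # 0.0, both routes start at the DC
print(round(min_separation_miles(route_a, route_b, exempt=[dc])))  # about 49
```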

    My personal take is that Microsoft purchased dark fiber or waves and simply didn't care too much about the details. Typical very-large-player attitude. And then it just so happened that a critical portion of their fibers ended up in the same duct. The "partial failure" part is probably because the excavator ripped into the duct and some fibers snapped while others didn't (but were quite probably stretched and/or bent). The really ugly part is that today's fibers aren't "binary"; it's not "works 100% or not at all" but a bunch of factors/grades (to name an important one, attenuation per wavelength).
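The "not binary" point is easiest to see in a simple optical power budget: received power is launch power minus every loss along the way, and the link works only while that stays above the receiver's sensitivity. A fiber that is stretched or bent but not snapped adds loss and eats into the margin without necessarily taking the link down. All numbers below are illustrative assumptions, not figures from the incident, and real attenuation varies with wavelength, which this sketch collapses into a single per-km value.

```python
def received_power_dbm(
    launch_dbm: float,
    length_km: float,
    atten_db_per_km: float,
    splices: int,
    splice_loss_db: float,
    connectors: int,
    connector_loss_db: float,
    extra_loss_db: float = 0.0,  # e.g. damage from a stretched or bent fiber
) -> float:
    """Launch power minus fiber, splice, connector and any extra losses (dB)."""
    total_loss = (length_km * atten_db_per_km
                  + splices * splice_loss_db
                  + connectors * connector_loss_db
                  + extra_loss_db)
    return launch_dbm - total_loss


def margin_db(received_dbm: float, rx_sensitivity_dbm: float) -> float:
    """Positive margin: the link should still come up. Negative: it won't."""
    return received_dbm - rx_sensitivity_dbm


# Assumed span: 0 dBm launch, 40 km at 0.25 dB/km, 10 splices, 2 connectors,
# receiver sensitivity -18 dBm.
span = dict(launch_dbm=0.0, length_km=40.0, atten_db_per_km=0.25,
            splices=10, splice_loss_db=0.1, connectors=2, connector_loss_db=0.5)
for extra in (0.0, 5.0, 7.0):
    rx = received_power_dbm(**span, extra_loss_db=extra)
    print(extra, round(margin_db(rx, -18.0), 2))
# prints: 0.0 6.0  (healthy)
#         5.0 1.0  (degraded but up)
#         7.0 -1.0 (down)
```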
