Considerable outage - Clouvider so far, anyone else?

hackerman Member
edited February 2021 in Outages

According to my alerts and Fing (https://app.fing.com/internet/outage/DROP:GB-England--Clouvider@2021-02-20-1755-00000), there is a considerable outage in London.

Best wishes to the team in resolving this issue as efficiently as possible.

Edit: status pages, as usual, are useless and show no outage: https://www.clouviderstatus.net/

Comments

  • Neoon Community Contributor, Veteran

    Looks like 30 minutes and counting.

  • @Neoon said:
    Looks like 30 minutes and counting.

    Thereabouts.
    I'm hoping there'll be an email notification from Clouvider to acknowledge the issue soon.

  • SplitIce Member, Host Rep

    Have you opened a ticket?

  • hackerman Member
    edited February 2021

    @SplitIce said:
    Have you opened a ticket?

    Yes of course :)
    I got a reply within 2 minutes stating they are looking into it - which was nice.

  • Neoon Community Contributor, Veteran
    edited February 2021

    @hackerman said:

    @Neoon said:
    Looks like 30 minutes and counting.

    Thereabouts.
    I'm hoping there'll be an email notification from Clouvider to acknowledge the issue soon.

    It's undead: it replies to pings, but the rest is dead for some reason.
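
The "replies to pings but the rest is dead" state is easy to miss with a ping-only monitor. Below is a minimal sketch of a check that combines an ICMP result with a TCP connect test. The function names `tcp_port_open` and `classify` and the "undead" label are made up for illustration, and the ping result is taken as an input (for example from an existing ping monitor) because sending raw ICMP from Python usually needs elevated privileges.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def classify(ping_ok: bool, service_ok: bool) -> str:
    """Combine an ICMP result with a service-level result (hypothetical labels)."""
    if service_ok:
        # The service answers; ICMP may simply be filtered on the path.
        return "up"
    if ping_ok:
        # Answers pings, but the actual service is unreachable.
        return "undead"
    return "down"
```

A monitor could call `classify(ping_ok, tcp_port_open(host, 22))` on each run and alert on anything other than "up"; port 22 here is only an example.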

  • Everything my alerts were notifying me about is up again.....

  • yoursunny Member, IPv6 Advocate

    30 minutes of downtime is nothing. You are entitled to cry after at least one hour of downtime.

    The provider will do push-ups to compensate for downtime. The going rate is one push-up per server per hour. No worries.

  • hackerman Member
    edited February 2021

    @yoursunny said:
    30 minutes of downtime is nothing. You are entitled to cry after at least one hour of downtime.

    The provider will do push-ups to compensate for downtime. The going rate is one push-up per server per hour. No worries.

    @Clouvider has been absolutely brilliant in terms of uptime. I have experienced zero downtime with Clouvider in the 3+ years I've leased products from them.

  • Razza Member
    edited February 2021

    Which data centre? The one-minute ping monitor for my Clouvider server in London showed zero downtime.

  • @Razza said:
    Which data centre? The one-minute ping monitor for my Clouvider server in London showed zero downtime.

    London.
    Looks like a partial outage. Let's see what the incident report says once it's published.

  • As far as I know, they host out of two locations: Enfield Virtus and Docklands Telehouse North Two.

    There might be more locations. My server is hosted at Telehouse, if I remember correctly.

  • AlexBarakov Patron Provider, Veteran

    @hackerman said:

    @Razza said:
    Which data centre? The one-minute ping monitor for my Clouvider server in London showed zero downtime.

    London.
    Looks like a partial outage. Let's see what the incident report says once it's published.

    Clouvider operates from multiple data centres in London. I can confirm that this issue does not affect their whole London network: we use their network for connectivity in Volta and have no active alarms.

    Tagging Dom in here @Clouvider

  • All good here as well, so it might just be affecting a certain DC. I believe they operate out of 2 or 3 in London.

  • Mr_Tom Member, Host Rep
    edited February 2021

    From what I can see, only Volta was affected (according to my monitoring). We've got stuff in all of the London DCs (THN2, Enfield), I think, and only Volta seemed to go down.

    Thanked by 1hackerman
  • Would love to hear more as well. To me it looks like a Telia-related outage again. We saw a couple of them from monitoring locations in Austria/Germany last week too.

  • Uptime just hard coded in?!

  • @corbpie said:
    Uptime just hard coded in?!

    Keeping it 100.

  • My server with them had no downtime, according to monitoring software and uptime itself

  • "40% of the routes normally processed by the 1 of the 3 Core routers in Volta impacted" - according to latest update.

  • Clouvider Member, Patron Provider

    We have several routers at Volta. It appears that, under a very specific condition, a link flap (or a few of them in a very short space of time) triggered a bug that caused JunOS to fail to program the PFE correctly. This in turn caused a blackhole for the MPLS traffic traversing one specific link between one specific router at THN2 and this specific router at Volta.

    In a normal situation, a hot stand-by route would kick in under the link protection mechanism, changing the route to a hot stand-by hop in under 5 ms. Unfortunately, this only happened partially. The inet.0 IPv4 routes changed, so protocols continued to communicate, leading the router to 'think' it was business as usual, but the MPLS route failed to program, causing the blackhole. This unfortunately prolonged the outage: we initially cleared this link because ICMP traffic between the routers was moving flawlessly over it and the protocols remained established. Once we identified the issue, we forced the MPLS routes on the affected device to be re-implemented in the PFE, which resolved the issue.

    Impact would have been limited to the Customers usually traversing that specific device, and further limited to those users of these Customers who were connecting through that specific link. That means that while users from some networks could connect to these specific services as if nothing had happened, others would reach only as far as the edge of our network, and the trace would have ended there during the incident.
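
To make the failure mode described above easier to picture, here is a deliberately simplified toy model in Python. It is not a description of JunOS or PFE internals: the `ToyRouter` class, its methods, the `programs_ok` flag and the destination label "volta" are all hypothetical, and "mpls.0" is used only as an illustrative table name next to the inet.0 table mentioned in the post. The sketch only shows how a control-plane view can look healthy while one forwarding table is left unprogrammed, and how re-pushing that table clears the blackhole.

```python
class ToyRouter:
    """Toy router with a control-plane view (rib) and a forwarding view (fib)."""

    def __init__(self):
        self.rib = {}  # control plane: (table, dest) -> next hop
        self.fib = {}  # forwarding plane: (table, dest) -> next hop, or None

    def update_route(self, table, dest, next_hop, programs_ok=True):
        """Change a route; programs_ok=False simulates a failed forwarding write."""
        self.rib[(table, dest)] = next_hop
        self.fib[(table, dest)] = next_hop if programs_ok else None

    def forward(self, table, dest):
        """Return the next hop actually used, or 'blackhole' if none is programmed."""
        return self.fib.get((table, dest)) or "blackhole"

    def reprogram(self, table):
        """Push every control-plane route of one table into the forwarding plane."""
        for (t, dest), hop in self.rib.items():
            if t == table:
                self.fib[(t, dest)] = hop


r = ToyRouter()
# Steady state: both tables use the primary link.
r.update_route("inet.0", "volta", "primary")
r.update_route("mpls.0", "volta", "primary")

# Link flap: both tables should move to the hot stand-by hop,
# but the MPLS write fails (the simulated bug).
r.update_route("inet.0", "volta", "standby")
r.update_route("mpls.0", "volta", "standby", programs_ok=False)

print(r.forward("inet.0", "volta"))   # standby: IP and ICMP look healthy
print(r.forward("mpls.0", "volta"))   # blackhole: MPLS traffic is dropped
print(r.rib[("mpls.0", "volta")])     # standby: the control plane looks correct

r.reprogram("mpls.0")                 # force the routes into the forwarding plane
print(r.forward("mpls.0", "volta"))   # standby
```

In this toy, a health check that looks only at the control plane or at inet.0 forwarding would report no problem, which matches why the link was initially cleared.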

    Link flaps happen from time to time and they are not usually as eventful - thanks to the routing protocols in place and the resiliency of our network. A link flap can happen for a number of reasons, from a bad optic through interference with the fibre to a fibre break. They do not normally lead to a situation as observed today.

    We have dispatched a more detailed version of this summary in the incident response ticket to our Customers.

    Thanks for all the tags and apologies for causing any inconvenience over the weekend!

    Thanked by 2Mr_Tom hackerman
  • yoh that's unlucky :|

  • @Clouvider just wondering if every affected customer got the RFO, or just the ones that reached out?

  • Clouvider Member, Patron Provider
    edited February 2021

    The RFO is sent to affected Customers who raised a case that was later merged into the incident response ticket.
    Not every service on that device was affected, and the degree to which a service was affected depends on many variables, such as the type of service, its configuration and the Customer's traffic.
