lu-shared02 down, who is the owner? - Page 3

Comments

  • No status page?

  • Up now :smiley:

  • Hello Edoardo,

    Over the last 24 hours we've had 2 major failures in our Luxembourg
    location. These issues have been resolved, but we wanted to give a full
    disclosure of what happened and what's been done to resolve them.

    August 4th:

    Earlier last week we replaced our old Linux-based router with a new Juniper
    MX204. The cutover went near painlessly with only a few issues with IPv6
    connectivity. During the morning (GMT -7) of August 4th this unit stopped
    forwarding TCP traffic, but was passing ICMP traffic just fine.

    Due to some foul ups on my part (phone muted from earlier in the day) I
    wasn't awoken by my alerts, nor by Anthony/Karen trying to call me.

    Once I awoke the services were restored. We weren't able to identify the
    exact cause of this issue, but we've made some changes to our router's
    configuration in hopes it doesn't happen again.

    August 5th:

    We were alerted to multiple nodes being down in Luxembourg. Initially we
    thought it was possibly a large DDoS attack leaking, or a return of the
    August 4th issue. We quickly realized it was an entire rack being down. We
    were able to get in contact with our support staff at ROOT.LU who helped
    identify the reason for the fault.

    The outage was caused by a PSU failing in one of our slice nodes. Instead of
    failing "gracefully", it ended up dead shorting, causing the breaker to pop
    and all nodes on this rack to go offline. Thankfully we only had to replace
    the PSU in KVM07 and have been able to bring all services, including KVM07,
    back online.

    ====

    We apologize for these issues and are making changes to address them where
    possible.

    For the network issue, we'll be working to make changes to how our
    monitoring & alerting operates.

    For the power issue, there isn't much we can do as the chassis we use have a
    single PSU. An ATS wouldn't have addressed this as the PDU would've been
    connected to that.

    I thank you for your patronage and patience.

    Francisco

    Sleep tight fran
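
The August 4th failure described above, where the router kept passing ICMP but stopped forwarding TCP, is the kind of fault a ping-only monitor would report as healthy. A minimal sketch of a check that completes a real TCP handshake instead, assuming a Python-based probe; the function name `tcp_reachable` is hypothetical and nothing here reflects the monitoring actually used by the host:

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Unlike an ICMP ping, this only succeeds when TCP traffic is actually
    being forwarded end to end. (Hypothetical helper, for illustration only.)
    """
    try:
        # create_connection performs the full TCP three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

A monitor could call something like `tcp_reachable("192.0.2.10", 22)` (a placeholder documentation address) alongside its ping check and alert when ping succeeds but the TCP probe fails.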

  • Francisco Top Host, Host Rep, Veteran

    edfox said: Sleep tight fran

    Haha :)

    It's been OK. A few quirks to iron out related to ISOs but overall went fine. KVM7 had to get a new PSU out of it.

    Francisco

  • @Francisco said:
    Haha :)

    It's been OK. A few quirks to iron out related to ISOs but overall went fine. KVM7 had to get a new PSU out of it.

    <3 Keep up the great work

  • Francisco Top Host, Host Rep, Veteran

    edfox said: Keep up the great work

    I'll try.

    For the alerting stuff I snagged an additional phone line and will just have an old phone of mine with a whitelist of numbers (and Pushover) sitting on a bookshelf. That'll keep the spam calls away and should hopefully bug me if something catches fire.

    Francisco

  • First-Root Member, Host Rep

    Last month we had 2 failing PSUs taking down one of the two power feeds as well, both from Supermicro.

  • @Francisco said:
    For the alerting stuff I snagged an additional phone line and will just have an old phone of mine with a whitelist of numbers (and Pushover) sitting on a bookshelf. That'll keep the spam calls away and should hopefully bug me if something catches fire.

    Francisco

    Fran's new Primary Alert System is now fully functional.

  • 1-800-STALLION

  • techturch Member
    edited September 2019

    Luxembourg network problem again? I cannot reach any Luxembourg servers.
    Edit: Up now

  • @techturch said:
    Luxembourg network problem again? I cannot reach any Luxembourg servers.
    Edit: Up now

    My monitor showed downtime for 22 mins, seems up again.

  • Francisco Top Host, Host Rep, Veteran

    Cleared up.

    Someone was beating up Cogent pretty bad so it caused some packet loss.

    Francisco
