Prometeus ? - Page 11
Comments

  • mi5h0 Member
    edited March 2015

    @bsdguy said:
    Oopsie. Stupid me. I assumed that the international phone system, email servers, twitter, Skype, and other means of modern communication were still available ;)

    That is exactly the point!
    There was no notice to customers on email, twitter, internal forum, website, or any other means...

    I came to this forum, registered and posted an inquiry. Only then did I get the first information from @Maounique...

  • netomx Moderator, Veteran

    Maounique said: 144 were cut through a passage

    whoa :/

  • Maounique Host Rep, Veteran
    edited March 2015

    Yes, I am the PR person. This does not mean no work is done, only that I cannot be available all the time.
    I was on a train, tired, with a lot of luggage after a day of skiing.
    The phone was in the backpack and I didn't hear it. What can I say, s**tty day is s**tty...
    Really sorry for this.

    Thanked by 1 netomx
  • @mi5h0

    I'm with you, man. Seems to be a widespread disease with providers. That's by no means specific to Prometeus.

    I guess it's about time WHMCS created a "panic info mail to all clients on node(s)" module **g

  • Maybe they are afraid they will lose potential customers if they advertise every problem through social media etc. A status page hosted somewhere else should be standard, though.

    By the way, 5 hours since I got the first Pingdom alert. Let's see...

  • mi5h0 Member

    @Maounique I understand, I'm not looking to blame.
    I just assumed that there was someone on duty 24/7 that could set up some type of notification to customers that would help us save time and nerves...

  • @Maounique said:
    Update:

    The fibers are rechecked. 144 were cut through a passage, many other people in the area are affected. It is possible that the repair was not done correctly, creating intermittent signal quality, even though the link says up.

    Those types of problem are always "fun" to deal with, and it's often very hard to give an accurate ETA. Particularly with the larger cables, it will depend on when the engineers get to your CCT and where it is in the cable.

  • mi5h0 Member

    @bsdguy Yes, that might be useful, except when the WHMCS database is out of service :)

  • Maounique Host Rep, Veteran
    edited March 2015

    We thought it was the switches, but that does not seem to be the case upon in situ inspection. Both failing at the same time is completely unlikely anyway.

    Let's be clear, the engineers already repaired all those 144 fibers and almost everyone else is up, but a few were seemingly botched probably due to the pressure, and now they are re-checking one by one.

  • @Maounique said:
    We thought it was the switches, but that does not seem to be the case upon in situ inspection. Both failing at the same time is completely unlikely anyway.

    Let's be clear, the engineers already repaired all those 144 fibers and almost everyone else is up, but a few were seemingly botched probably due to the pressure, and now they are re-checking one by one.

    They should be able to perform an End2End test and confirm to you whether the fibre is good or not. Of course they'll have to disconnect it to do it, but it's down already.

    Also check the RX/TX levels at each end.

  • Maounique Host Rep, Veteran

    dragon2611 said: They should be able to perform an End2End test and confirm to you whether the fibre is good or not. Of course they'll have to disconnect it to do it, but it's down already.

    end2end passed. Signals look acceptable. It was online twice. The link shows up.
    We are working in parallel to piggyback on some other fiber around. It involves some permissions from owners and the setup.
    This is a nightmare and I am really sorry :(

  • If there's a fibre engineer onsite, ask them to shove an OTDR on it; a bad splice should hopefully show up.

    That's about the limit of my fibre knowledge, I'm afraid. I know what an OTDR is and roughly what it's used for, but interpreting the results will need someone with more fibre training than me. I was trained to splice indoor fibre (so not the multi-core stuff) several years ago but never actually had to do any.

  • Maounique Host Rep, Veteran
    edited March 2015

    dragon2611 said: If there's a fibre engineer onsite

    I am sure Salvatore does all he can. He has been there for hours and knows the facility way better than me, since he helped build it.
    This looks like massive bad luck and might force us to take a separate carrier link to the second DC, to have at least some limited connectivity over a tunnel or something, as long as it comes via another route, not the same canal.

  • zeitgeist Member
    edited March 2015

    @mikho said:
    While the client area is down I suspect there aren't that many tickets to answer. :)

    Which is a good reminder that it wouldn't be a bad idea to have a redundant solution for client communications.

  • Maounique Host Rep, Veteran

    We have the forum and Twitter. Unfortunately, Twitter didn't work for me and I have to wait for uncle to fix it, and I was on a train when this happened.

    Update: WHMCS online.

  • @Maounique, that's all good, but I was thinking of a redundant solution for your regular client communication, i.e. the ticketing system. As you mentioned before, "setup redundancy across datacenters and providers, even countries, otherwise you will continue to be disappointed, no matter how much you pay." I think the same goes for a hoster's business site/ticketing system.

    Thanked by 1 Maounique
  • Maounique Host Rep, Veteran
    edited March 2015

    The connection is done through another cable for now. The old link still shows as up, but is not working.
    All the services are reconfigured to be working through this new route, essentially piggybacking on a working one.

    I think the same goes for a hoster's business site/ticketing system.

    Indeed, but an external copy of our database (in a DC we do not control) will send the already crazed privacy "specialists" into overdrive.
    We do have 2 frontends, but only one database.

  • Maounique said: The connection is done through another cable for now

    you mean all the stuff should work now? still can't ping my DC2 servers

  • Do you know if the commands we sent to the VMs through Cloudstack will be executed after the link is back online? (reset, stop etc)

  • Maounique Host Rep, Veteran

    There is a timeout of varying length, depending on the command.
    It will also affect snapshots.
    The rerouting is in progress. The nodes should come back shortly.

  • @afonic said:
    Do you know if the commands we sent to the VMs through Cloudstack will be executed after the link is back online? (reset, stop etc)

    If you got errors (like me), I would say no.
    Anyway, commands seem to work now.

  • @Maounique said:
    ... will send the already crazed privacy "specialists" into overdrive

    "Crazed" as in "I use a fake name in my support job and I use fake names as client, too"?

    Btw: Like zeitgeist I also thought of your redundancy advice but I didn't mention it here because it would have felt like kicking a man who is already on the floor anyway (meaning I respected that you were under stress with the current DC/fiber problem and didn't want to laugh at you).

    Thanked by 2 vimalware alexvolk
  • Maounique Host Rep, Veteran
    edited March 2015

    I know, and I would not have taken it that way. We did think of it, but we only control one datacenter. Replicating or giving access through servers we do not fully control would have introduced another avenue of attack on the private data. Whether you believe it or not, we are doing our best to keep it private.

    Due to rerouting, other services will go down briefly.

    Thanked by 1 zeitgeist
  • Now my OVZ server is down too :(

  • Crab Member

    My KVM server in IWStack just went down.

  • @Maounique:
    Hey there,
    my ovz on pm17 is showing a bit of weirdness. is it related?

    --- google.com ping statistics ---
    327 packets transmitted, 70 received, +21 errors, 78% packet loss, time 566767ms

  • Infinity Member, Host Rep
    edited March 2015

    @gsrdgrdghd said:
    Now my OVZ server is down too :(

    @Crab said:
    My KVM server in IWStack just went down.

    @fixidixi said:
    Maounique:
    Hey there,
    my ovz on pm17 is showing a bit of weirdness. is it related?

    --- google.com ping statistics ---
    327 packets transmitted, 70 received, +21 errors, 78% packet loss, time 566767ms

    I'm sure M will chime in with a more complete story, but rerouting and adding another temporary cable needed a short disconnect. Should be working fine on other services again now.

  • Maounique Host Rep, Veteran
    edited March 2015

    Maounique said: Due to rerouting, other services will go down briefly.

    We are reorganizing the network to be more resilient and to have a way to switch to a backup cable if needed. This required repurposing a big switch.

  • elbandido Member
    edited March 2015

    So dc2 will be up shortly?

  • Thanks for the updates!
