Comments
That is exactly the point!
There was no notice to customers by email, Twitter, the internal forum, the website, or any other means...
I came to this forum, registered, and posted an inquiry. Only then did I get the first information, from @Maounique...
whoa
Yes, I am the PR person; that does not mean no work is done, only that I cannot be available all the time.
I was on a train, tired, with a lot of luggage from a day of skiing.
The phone was in my backpack and I didn't hear it. What can I say, s**tty day is s**tty...
Really sorry for this.
@mi5h0
I'm with you, man. Seems to be a widespread disease with providers. That's by no means specific to Prometeus.
I guess it's about time WHMCS created a "panic info mail to all clients on node(s)" module *g*
Maybe they are afraid they will lose potential customers if they advertise every problem through social media etc. A status page hosted somewhere else should be standard, though.
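A minimal sketch of what such a "panic mail" sender could look like, independent of WHMCS itself: it assumes an SMTP relay outside the affected DC and a CSV export of the affected clients' addresses. Every hostname, file name, and credential below is a placeholder, not anything Prometeus actually runs.

```python
# Rough "panic mail" sketch: send a short incident notice to every client
# listed in a CSV export (columns: name,email), via an OFF-SITE SMTP relay
# so it still works when the main DC is unreachable. Placeholders throughout.
import csv
import smtplib
from email.message import EmailMessage

SMTP_HOST = "smtp.offsite-relay.example"   # relay hosted outside the affected DC
SUBJECT = "[Status] Network incident in DC2 - we are working on it"
BODY = ("We are aware of the connectivity problem affecting DC2.\n"
        "Updates will be posted on our status page and forum.\n")

def panic_mail(csv_path: str = "affected_clients.csv") -> None:
    with smtplib.SMTP(SMTP_HOST, 587) as smtp, open(csv_path, newline="") as fh:
        smtp.starttls()
        smtp.login("alerts@example.com", "app-password")   # placeholder credentials
        for row in csv.DictReader(fh):
            msg = EmailMessage()
            msg["From"] = "alerts@example.com"
            msg["To"] = row["email"]
            msg["Subject"] = SUBJECT
            msg.set_content(f"Hello {row['name']},\n\n{BODY}")
            smtp.send_message(msg)

if __name__ == "__main__":
    panic_mail()
```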
By the way, it's been 5 hours since I got the first Pingdom alert. Let's see...
@Maounique I understand, I'm not looking to blame.
I just assumed that there was someone on duty 24/7 who could set up some type of notification to customers, which would help us save time and nerves...
Those types of problems are always "fun" to deal with, and it's often very hard to give an accurate ETA. Particularly with the larger cables, it will depend on when the engineers get to your CCT and where it is in the cable.
@bsdguy Yes, that might be useful, except in cases when the WHMCS database itself is out of service.
We thought it was the switches, but that does not seem to be the case upon in situ inspection. Both failing at the same time is completely unlikely anyway.
Let's be clear: the engineers already repaired all those 144 fibers and almost everyone else is back up, but a few splices were seemingly botched, probably due to the pressure, and they are now re-checking them one by one.
They should be able to perform an end-to-end test and confirm to you whether the fibre is good or not; of course they'll have to disconnect it to do it, but it's down already.
Also check the RX/TX levels at each end.
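For what it's worth, if either end terminates on a Linux box with a fibre NIC, you can read the optics' RX/TX (DOM) values locally. A rough sketch, assuming ethtool is installed and the transceiver exposes diagnostics; the interface name is just an example, and switch ports would instead be polled via the vendor's own DOM/DDM SNMP OIDs:

```python
# Minimal sketch: dump SFP/transceiver diagnostic (DOM) values via ethtool.
# Assumes Linux, root privileges, and a module that supports diagnostics;
# "eth0" is a placeholder interface name.
import subprocess

def optical_levels(interface: str = "eth0") -> None:
    out = subprocess.run(
        ["ethtool", "-m", interface],        # dump transceiver module info
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        # The interesting lines mention laser output / receiver optical power.
        if "power" in line.lower():
            print(line.strip())

if __name__ == "__main__":
    optical_levels()
```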
The end-to-end test passed. Signals look acceptable. It was online twice. The link shows up.
We are working in parallel to piggyback on some other fiber in the area. It involves getting permissions from the owners and doing the setup.
This is a nightmare and I am really sorry
If there's a fibre engineer onsite, ask them to shove an OTDR on it; a bad splice should hopefully show up.
That's about the limit of my fibre knowledge, I'm afraid. I know what an OTDR is and roughly what it's used for, but interpreting the results will need someone with more fibre training than me; I was trained to splice indoor fibre (so not the multi-core stuff) several years ago but never actually had to do any.
I am sure Salvatore does all he can. He has been there for hours and knows the facility way better than me, since he helped build it.
This looks like massive bad luck, and it might force us to take a separate carrier link to the second DC to have at least some limited connectivity over a tunnel or something, as long as it comes via another route, not the same duct.
Which is a good reminder that it wouldn't be a bad idea to have a redundant solution for client communications.
We have the forum and Twitter. Unfortunately, Twitter didn't work for me and I have to wait for uncle to fix it, and I was on a train when this happened.
Update: WHMCS is online.
@Maounique, that's all good, but I was thinking of a redundant solution for your regular client communication, i.e. the ticketing system. As you mentioned before, "setup redundancy across datacenters and providers, even countries, otherwise you will continue to be disappointed, no matter how much you pay." I think the same goes for a hoster's business site/ticketing system.
The connection goes through another cable for now. The old link still shows as up, but it is not working.
All the services are reconfigured to work through this new route, essentially piggybacking on a working cable.
Indeed, but an external copy of our database (in a DC we do not control) would send the already crazed privacy "specialists" into overdrive.
We do have 2 frontends, but only one database.
You mean everything should work now? I still can't ping my DC2 servers.
Do you know if the commands we sent to the VMs through CloudStack will be executed after the link is back online? (reset, stop, etc.)
There is a timeout, of various lengths depending on the command.
It will also affect snapshots.
The rerouting is in progress. The nodes should come back shortly.
If you got errors (like me), I would say no.
Anyway, commands seem to work now.
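For anyone who wants to check rather than guess, the CloudStack API exposes the async job queue, so you can see whether a reset/stop you issued during the outage is still pending. A rough sketch of polling it; the endpoint and keys are placeholders, and the signing follows the standard CloudStack HMAC-SHA1 scheme:

```python
# Sketch: list CloudStack async jobs to see if queued VM commands are still
# pending (jobstatus 0), finished (1), or failed (2). Endpoint and keys below
# are placeholders for your own account credentials.
import base64
import hashlib
import hmac
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://cloud.example.invalid/client/api"   # placeholder endpoint
API_KEY = "your-api-key"                                # placeholder
SECRET_KEY = "your-secret-key"                          # placeholder

def signed_url(params: dict) -> str:
    """Build a signed CloudStack API URL (HMAC-SHA1 over the sorted, lowercased query)."""
    params = dict(params, apikey=API_KEY, response="json")
    query = "&".join(
        f"{k}={urllib.parse.quote(str(v), safe='')}"
        for k, v in sorted(params.items())
    )
    digest = hmac.new(SECRET_KEY.encode(), query.lower().encode(), hashlib.sha1).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode(), safe="")
    return f"{ENDPOINT}?{query}&signature={signature}"

with urllib.request.urlopen(signed_url({"command": "listAsyncJobs"})) as resp:
    payload = json.load(resp)["listasyncjobsresponse"]
    # The list key is usually "asyncjobs"; it may differ slightly by version.
    for job in payload.get("asyncjobs", []):
        print(job["jobid"], job.get("cmd"), "status:", job["jobstatus"])
```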
"Crazed" as in "I use a fake name in my support job and I use fake names as client, too"?
Btw: Like zeitgeist, I also thought of your redundancy advice, but I didn't mention it here because it would have felt like kicking a man who is already on the floor (meaning I respected that you were under stress with the current DC/fiber problem and didn't want to laugh at you).
I know, and I would not have taken it that way. We did think of it, but we only control one datacenter. Replicating, or giving access through servers we do not fully control, would have introduced another avenue of attack on the private data. Whether you believe it or not, we are doing our best to keep it private.
Due to rerouting, other services will go down briefly.
Now my OVZ server is down too
My KVM server in IWStack just went down.
@Maounique:
Hey there,
My OVZ on pm17 is showing a bit of weirdness. Is it related?
--- google.com ping statistics ---
327 packets transmitted, 70 received, +21 errors, 78% packet loss, time 566767ms
I'm sure M will chime in with a more complete story, but rerouting and adding another temporary cable needed a short disconnect. Other services should be working fine again now.
We are reorganizing the network to be more resilient and to have a way to switch to a backup cable if needed. This required repurposing a big switch.
So dc2 will be up shortly?
Thanks for the updates!