Prometeus ?

iwaswrongonce · February 2015

@bsdguy said:
No, Sir, something like that just does not happen in a well designed and run DC. Period.

Yeah, never ever ever...

Amazon EC2, the compute component of AWS, saw about 10 outages across multiple regions in 2014, lasting from 19 seconds to about 9 minutes, according to CloudHarmony. Google Compute Engine had about 60 that lasted from 10 seconds to 37 minutes. Azure Virtual Machines saw more than 100 outages, lasting from 10 seconds to about 12 hours on one occasion.

Azure’s big multi-region outages happened in August, when the provider reported full service interruptions across multiple regions, and in November, when about 20 services were interrupted in most of its availability zones. The 12-hour single-zone outage was on November 5 in the asia-east zone, which went down multiple times that day and once on the preceding day.

cosmicgate · February 2015

Manually booted my vps and it came up!

Maounique · February 2015

bsdguy said: Good luck, Prometeus!

Thank you, but let's agree to disagree about the quality of the DC. You probably did not have much direct experience with many, while, for me, everything I dealt with was far worse than Caldera Business Park in Milano.
As of now, more services are coming up.
When IWStack will come back is unknown for now; we have never had this issue before and must make sure all nodes are up before starting the master. I do not expect higher load, however, unless at least a couple of the nodes fail to start and we decide to start the master anyway, which would mean HA kicking in and restarting instances on other nodes. If all nodes are up, maybe only the interface will be a bit overloaded as many people log in to check status and see if the snapshots are in order.

bsdguy · February 2015

Classical mistake to confuse "good" and "big".

Moreover, I didn't say no problems can happen (in a good DC). What I said was that something like this, a complete outage/shut down for more than seconds or worst case some minutes, doesn't happen (in a good DC).

Maounique thanking people who say otherwise and arguing against what I say tells me something, too. But - an important but: Them being in deep trouble right now I will certainly not go deeper into that right now.

In this situation I prefer to wish them good luck in their struggle to bring back online everything.

alepore · February 2015

to be fair, just noticed a good status summary on prometeus client area

rds100 · February 2015

@bsdguy let's wait and see the RFO from the data center. Then we can comment. For now we can only guess what happened and why.

gbshouse · February 2015

@Maounique - sent you PM

kontam · February 2015

@rds100 said:
jonchun it is HA, which doesn't help if the electricity in the whole datacenter is out. HA guards against single server failure, not against datacenter-wide power outages.

I thought diesel generators are standard these days, but I guess not. We have one in both of our datacentres because UPS can only last for a short while and is only meant to save you from glitches and not work as mains replacement. Genny can go on for days.

Amitz · February 2015

All my stuff is back online. 2 hours of downtime in 3 whole years. That is absolutely nothing! May everything work out fine in the end, I keep my fingers crossed for you guys!
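
For scale, here is a rough sketch of what "2 hours in 3 years" works out to as an availability figure. It assumes exactly 2 hours of downtime and three 365-day years, both rounded from the post above, and the function name is only illustrative:

```python
def availability_percent(downtime_hours: float, period_hours: float) -> float:
    """Share of the period during which the service was up, in percent."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


# Assumption: 3 years of 365 days each, ignoring leap days.
period_hours = 3 * 365 * 24

# Assumption: the 2 hours quoted above is the only downtime in that period.
pct = availability_percent(2, period_hours)
print(f"{pct:.4f}% available over {period_hours} hours")
```

Under those assumptions the result lands just short of "four nines" (99.99%).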

William · February 2015

The gensets DID work, but the UPS apparently did not.

K4Y5 · February 2015

The OverZold servers are still down. Would really appreciate an update on the ETA.

Maounique · February 2015

alepore said: which is probably the only real source of info right now

Not really, emails are going out to customers, but there are 6k+... You might be right, we will need a status page outside of our own premises, just in case.

alepore said: and thank god my MilanoDC2 services were restored in about an hour

Those we recommend for critical stuff, even better, a redundant setup across the world and/or providers.

K4Y5 said: The OverZold servers are still down. Would really appreciate an update on the ETA.

Only those which use the SAN as storage, the SAN will give priority to IWStack KVM zone, then we will restore the budget services.

squibs · February 2015

My iwstack servers are still down and I can't even access the iwstack control panel. Are they posting status updates anywhere else? Nothing on prometeus forum.

Maounique · February 2015

@gbshouse said:
Maounique - sent you PM

Unfortunately, I cannot do anything about it; Salvatore, on the premises, will have the best judgement on what to do first, probably based on the degree of damage that the server suffered.

K4Y5 · February 2015

@Maounique said:

Only those which use the SAN as storage, the SAN will give priority to IWStack KVM zone, then we will restore the budget services.

Understood. Thanks for that piece of information.

I have sent you a PM with one last question. Would really appreciate if you could answer that, as this outage really has me stressed out.

Thanks again, @Maounique

spazzo · February 2015

Well, all our VPS are still down; it's gone on for much longer than expected.

techsys · February 2015

No restoration after 8 hours, iwstack client access still down, no response to the ticket I raised. I expected that. I guess you get what you pay for; I don't see it that way myself, as I still pay and it is not free. Plus, what is the point of a ticket system you have no intention to answer, or only answer when it suits?

alepore · February 2015

Maounique said: Those we recommend for critical stuff, even better, a redundant setup across the world and/or providers.

yeah, it's time i learn something about database replication...

TinyTunnel_Tom · February 2015

@techsys said:
No restoration after 8 hours, iwstack client access still down, no response to the ticket I raised. I expected that. I guess you get what you pay for; I don't see it that way myself, as I still pay and it is not free. Plus, what is the point of a ticket system you have no intention to answer, or only answer when it suits?

Does he guarantee a response to tickets? Nope. You get what you pay for.

gsrdgrdghd · February 2015

techsys said: plus what is the point of a ticket system you have no intention to answer or only answer when it suits.

Would you prefer that they fixed the issue or that they spend their time copying the "we are working on it" reply in the answer field of your ticket?

the_angry_cunt · February 2015

Glad to see that iwstack are being congratulated for their fantastic customer relations. (We got an email about 4 hours after the crash. Awesome. Hi fives dude!!) It's now been offline for over 7 hours. I thought this was a professional outfit that would have failover and all the other dweeb shit we hear about. My bad. Now perhaps we could have a F*****G UPDATE?

TinyTunnel_Tom · February 2015

@the_angry_cunt said:
Glad to see that iwstack are being congratulated for their fantastic customer relations. (We got an email about 4 hours after the crash. Awesome. Hi fives dude!!) It's now been offline for over 7 hours. I thought this was a professional outfit that would have failover and all the other dweeb shit we hear about. My bad. Now perhaps we could have a F*****G UPDATE?

COULD YOU STOP BEING AN ANGRY C*NT and be patient?

gpapadopg · February 2015

Just saw this announcement: https://www.prometeus.net/billing/announcements.php?id=14

Maounique · February 2015

spazzo said: Well, all our VPS are still down; it's gone on for much longer than expected.

IWStack KVM nodes are mostly up now. Also Biz Xen which depends on SAN storage, so I guess the SAN finished checks.
Based on those signs, I expect IWStack to come back in an hour if everything works OK. Restarting the master might still run into trouble, hopefully Salvatore will manage with his usual skill. I can only wish him luck for now.

techsys · February 2015

@gsrdgrdghd said:
Would you prefer that they fixed the issue or that they spend their time copying the "we are working on it" reply in the answer field of your ticket?

Fair point. I have just looked, and according to the site prometeus.com there are still no server or network issues listed. I can't seem to add an image, so you will need to take my word for it, sorry.

vimalware · February 2015

Multi-master + some kind of failover should be part of any revenue-critical design today.

But now, with the Milano incident, I realize that even one's 'orchestrator' node (ideally placed in a high-availability cloud) should also have a hot spare on a different campus.
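
A minimal sketch of the hot-spare idea, assuming a hypothetical ordered list of orchestrator nodes and a health check that is simply passed in as a function. The node names and the `pick_active` helper are made up for illustration, and a real setup would probe the network rather than consult a set:

```python
from typing import Callable, Optional, Sequence


def pick_active(nodes: Sequence[str],
                is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first node (in priority order) that passes the health check."""
    for node in nodes:
        if is_healthy(node):
            return node
    return None  # nothing is reachable, so there is nothing to fail over to


# Hypothetical names: primary in one campus, hot spare in another.
nodes = ["orchestrator-milano", "orchestrator-spare"]

# Simulate the primary campus losing power.
down = {"orchestrator-milano"}

active = pick_active(nodes, lambda n: n not in down)
print(active)
```

The point of keeping the spare on a different campus is that a single power event cannot put both names into the `down` set at once.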

Just keep calm and order iwstack.

You know that Uncle won't close up shop because of any panicky customers leaving in the short term.

Also, this isn't my first good and knowledgeable VPS vendor whose colo's generators (edit: or UPS in this case) were the villain.

A little OT (philosophy for sysadmins): https://sivers.org/book/StoicJoy

MaxMiller · February 2015

My Biz Xen is still down.

rm_ · February 2015

techsys said: can't seem to add an image

I can:

0 Open, 0 Scheduled, 2 Resolved issues -- and those are from back in May-2014.

This part of WHMCS is soooo unloved and underused by providers, sometimes I wonder why it's even there (and one would think it's the first place to check in case of any issues)...

techsys · February 2015

Something positive I think - from Prometeus (Operator John Deer)

Hello!

I am sorry for the lack of replies; it is probably a case where I thought Salvatore was handling your ticket and he thought I was.
The power failed at about 7:30 this morning in Italy, meaning 6:30 in the UK.
It lasted 80 minutes or so, at least 75.
After that, some 25% of the services came back online and we quickly restored some 25% more, but the rest would not start without on-site maintenance, so we have called all our tech guys on site to help. Since then, some 25% more services were restored, but one SAN is still recovering its data, and the rest of the services that are down more or less depend on it.
This is the first such issue, so we do not know how long it will take to recover, but there are signs it may not take much longer, perhaps a couple of hours.
We have an announcement about this and have been sending emails for a few hours. However, 6000+ emails take time to be delivered; we are sorry if you did not receive yours for this or another reason.

Thank you!

John

PortCTL · February 2015

@Maounique said:

Well, I was reading this thread, and it has been handled very well (the downtime), regardless of the people who are upset, it isn't your business's fault for the outage. Once everything is restored, I'll be making a purchase for sure.
