Prometeus ?

elbandido · March 2015

@Infinity said:
DC2 Milano is affected, there seems to be some broken links between both DC's in Milano, affecting DC2. It's possible that it's a fibre cut. Uncle and the engineers are on it of course.

thx.. but anyway...an official word from @Maounique is always welcomed

gsrdgrdghd · March 2015

IIRC @Infinity also works (worked?) for Prometeus

alepore · March 2015

aaaand we're back online
EDIT: not really...

mi5h0 · March 2015

iwstack instance at Milano/DC2 and also client area (database error) down for me too.
Any info what's going on ?

elbandido · March 2015

DC2 was working for 10-15 minutes, then down again... and right now it is still down.

mi5h0 · March 2015

Worst of all is the lack of information. If I knew the downtime would last a while, I could start transferring to an alternative host.

alepore · March 2015

imho they really underestimate how important customer support is.

elbandido · March 2015

strange, usually @Maounique is always on the communication side, and I was expecting her brilliant customer support to help us understand what is going on...

alepore · March 2015

this is an example of how a small company can handle support during a catastrophic outage:
http://dnsimplestatus.com/incidents/v0x4h75gxf7x

mi5h0 · March 2015

@alepore said:
this is an example of how a small company can handle support during a catastrophic outage:
http://dnsimplestatus.com/incidents/v0x4h75gxf7x

Yes, that's perfect.

I watch Twitter, the official forum, website news, e-mail... but there is nothing. I do not understand why it is a problem to publish at least basic information: "Yes, we are aware of the problem. We're working on it..."

dragon2611 · March 2015

Given how good @Maounique was at giving information regarding the last major outage, I'm willing to give Prometeus the benefit of the doubt in this instance. Maybe there's something that's preventing him/her from posting an update.

netomx · March 2015

Or he's working on it!

Maounique · March 2015

I am on a train now, soon on a bus. The fiber cut was repaired, then the switches failed. Uncle is there to figure out what Murphy is up to this time.

mi5h0 · March 2015

@Maounique Thanks for the info.
Do you have any ETA? I have to think about directing domains to an alternative server if there is no timeframe. My clients will 'kill' me if they do not have e-mails by tomorrow morning ...

elbandido · March 2015

me too @Maounique... a client already killed me 2h ago... reminding me that 2 weeks ago I asked them to switch from Milano to DC2, and there was also downtime to move them...
thank you very much

AnthonySmith · March 2015

Maounique said: The myth that HDDs are more reliable than SSDs was disproved a while ago. It matches our experience too: for example, we have only had a couple of SSDs fail so far IIRC, perhaps 3, but many more than 10 mechanical drives. Even top of the range SAS short stroke ones fail more often than SSDs.

Agree with this, but only when using decent SSDs (Intel etc.), and then you don't get the same performance. The gimmick super-fast SSDs die faster than fruit flies when used 24x7, in my experience.

Then you have just pure luck: I still have a desktop upstairs running 2 x 80GB IDE drives in software RAID 0 that has been running for about 10 years, and both drives are fine.

Bottom line: don't pick any host based on drive longevity. They all fail at some point, and completely at random.

Maounique · March 2015

The facility is new and so is the hardware. It is not known what happened, and especially why the switches, including the backups, failed shortly after the fiber was repaired, while I was trying to write a mail in hard conditions on the train. Our database is in the new facility, so it is unreachable. The instances should be up; only connectivity failed. Since we do not know why the switches failed, we can only give an estimate of about 2 hours, which is what it would take to replace them. If a repair is possible, it should be up sooner. This is puzzling everyone; only the people in the field can know more, and they are trying hard to find the issue and solve it fast. Sorry for the clumsy typing, I will hopefully be home in half an hour to type an RFO.

bsdguy · March 2015

My recent trouble with Maounique aside, I'm a little concerned that Prometeus got struck for the second time in just a week or two.

Let's hope for Prometeus that it's the last problem for a long time. I wish them good luck.

Maounique · March 2015

bsdguy said: Prometeus got struck for the second time in just a week or two.

Only 2 days ago Uncle finally managed to nail the problem of instances shutting down on pm63 and was enjoying a day with the family, while I was finally on the slopes enjoying a day of skiing.
This struck unexpectedly; we are probably way behind on our quota of human sacrifices to Murphy.

Fortunately, hardware damage is unlikely, as it was confirmed (during the short time connectivity was back) that all the instances and servers are up and running. So no follow-up issues are expected, unlike on the 17th, when the power was cut to both facilities.

It is a deep mystery why the switches failed. That it happened shortly after the fiber cut was repaired is hardly a coincidence, yet this kind of problem should not affect them in any way.

bsdguy · March 2015

@Maounique

I had second thoughts, too. After all Prometeus, from everything I know, is well managed and operated. It's just not the kind of provider where outages are likely. Unless, that is, someone outside has dirty hands ... (or gross incompetence, e.g. at the DC?)

Whatever it may be, I hope that "uncle" can soon be back with his family and that Prometeus is soon humming along again.

May the byte Gods be with you!

afonic · March 2015

Hi,

I registered here to post this; I have been reading the site for a while. I have been using Cloudstack, Xen for about 3 months now.

I have tried Twitter, nothing. The ticket system is down. I've emailed both their addresses, nothing. On their board, again nothing. I called the number on their website, and I got a fax machine.

It's over 4 hours now and not a SINGLE reply. Thank god I thought to check here or I'd go crazy.

Downtime can happen, but not having people on standby less than an hour away, and having nobody on the support staff available in the middle of a working day, is amateurish. When I signed up, after testing iwStack for a while, I emailed them asking for their honest opinion on whether I should host critical stuff. Now I have clients calling and I can't even give them an ETA, because NOBODY HAS REPLIED, YET.

So my opinion of iwStack? Great for a hobbyist to check out Cloudstack, and probably to host a few non-critical servers. For business, look elsewhere.

With ~80 euros per month I could get better support elsewhere, and I probably will.

bsdguy · March 2015

@afonic

Come on, give them another hour or two, hmm.

I guess they react slowly for two reasons: a) a flood of questions/support requests (they have 1000s of clients) and b) they concentrate on getting it to work again rather than answering emails. Which is a sensible thing to do, because that's the one thing that gets everyone happy again.

But you are right in that it would be strongly desirable to have better communication in such cases. *sigh*

alepore · March 2015

http://board.prometeus.net/viewtopic.php?f=15&t=1425

mikho · March 2015

@bsdguy said:

I guess they react slowly for two reasons: a) a flood of questions/support requests (they have 1000s of clients)

While the client area is down I suspect there aren't that many tickets to answer.

Maounique · March 2015

afonic said: Downtime can happen, but not having people on standby less than an hour away

The fiber cut was repaired within an hour, but the staff there were unable to determine why the switches are not passing any packets, in spite of being powered on with the links established, and after having worked briefly.
Our highest level technician is in place with spares to replace them if the repair fails.

afonic said: For business, look elsewhere.

Or rather set up redundancy across datacenters and providers, even countries; otherwise you will continue to be disappointed, no matter how much you pay.

elbandido · March 2015

nice move @alepore! thx

afonic · March 2015

As I said in my earlier comment, my biggest issue is lack of communication, not the downtime.

Even though this is the second time I have had downtime in under 30 days, that I can understand. Machines break and humans make mistakes, after all. Failing to provide a single line of information for 3 hours, 50 minutes is unacceptable.

bsdguy · March 2015

@mikho said:
While the client area is down I suspect there aren't that many tickets to answer.

Oopsie. Stupid me. I assumed that the international phone system, email servers, Twitter, Skype, and other means of modern communication were still available.

Maounique · March 2015

So, a recap of what I can tell so far without being in the field, only from a brief phone chat:
1. Fiber cut of an as yet unknown cause (TBD).
2. Repaired within an hour or so, briefly online, then down, then back online, then down.
3. The recurring problem was caused by the switches, for an unknown reason.
4. Repair or replacement should not take more than a couple of hours.

Maounique · March 2015

Update:

The fibers are being rechecked. 144 were cut through a passage, and many other people in the area are affected. It is possible that the repair was not done correctly, creating intermittent signal quality even though the link reports as up.
