New on LowEndTalk? Please Register and read our Community Rules.
All new Registrations are manually reviewed and approved, so a short delay after registration may occur before your account becomes active.
Comments
@qps maybe you should let them know that their web site is down due to PHP not working.
Probably the internal network is up but the external one is not ready yet?
fiberhub.com No route to host :P
Still not up; I can access the server management area, but I cannot boot my machine.
The only nodes down at this point are a half dozen KVMs. 90%+ of our OVZs were up within 45 minutes (the outage itself was 20 minutes or so), and a few nodes required an fsck afterwards.
Storage nodes required a little extra work since mdadm didn't save the RAID50. For those that don't remember, we do two hardware RAID 5s, then RAID 0 them together in the OS.
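For anyone curious what "two hardware RAID 5s, RAID 0 in the OS" looks like in practice, here's a minimal sketch. The device names (/dev/sda, /dev/sdb) and array name (/dev/md0) are illustrative assumptions, not BuyVM's actual layout:

```shell
# /dev/sda and /dev/sdb are each a hardware RAID 5 volume exposed by
# the controller; mdadm stripes them into a software RAID 0, giving
# RAID 50 overall (create-time setup, run once):
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda /dev/sdb

# After an unclean power loss the array may not auto-assemble.
# "Reassembling arrays" then means rebuilding from the existing
# superblocks -- never --create again over live data:
mdadm --assemble /dev/md0 /dev/sda /dev/sdb

# Check array state before remounting filesystems:
mdadm --detail /dev/md0
cat /proc/mdstat
```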
I'm working with FH right now to see what's up with the last of the KVM's.
Here is the 'basic' RFO that Natalie gave me:
Thank you for your patience, we will have a tech work on your equipment shortly.
Our facility experienced a partial utility power failure this afternoon.
Because the failure was only partial and not a complete loss of utility power,
our ATS system failed to automatically switch us over from battery to generator
power. We subsequently were able to locate the issue and transferred power to
our backup generator manually, but unfortunately our UPS system had run out of
capacity before we could complete manual transfer. Our electrical contractors
are on site now and are working to determine when it will be safe for us to
return to utility power. This has also caused us to have some routing issues
which we have our network engineer working to resolve currently.
We appreciate your patience, and are working diligently to restore 100% service.
An RFO will be released as soon as available.
Thank you,
Francisco
BuyVM is still down? Jeez I feel bad for them and everyone else impacted.
Did you read Fran's post?
I hadn't refreshed :-p
Nodes were up quick as I said (< 45 minutes for the OVZ's). Storage took me a bit longer since I had to reassemble arrays.
Right now the only thing pending is a few KVM nodes.
The "positive", I guess, is that 2.6.32 got rolled out on all nodes. We had been planning an OVZ-wide upgrade in the coming weeks, but I guess we won't need that now :P
Enjoy the vswap,
Francisco
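For those asking what vswap actually gets you: on the 2.6.32-era OpenVZ kernels, container memory is set as plain RAM plus virtual swap instead of the old beancounter page juggling. A hedged sketch (the CTID 101 and the sizes are made-up values, not anyone's actual plan):

```shell
# vswap-style limits on an OpenVZ 2.6.32 (042stab) kernel:
# physical memory plus "virtual" swap, replacing the old
# privvmpages/vmguarpages beancounter tuning.
vzctl set 101 --ram 512M --swap 512M --save

# Inside the container, free(1) should then report ~512M of RAM
# and ~512M of swap, which most applications interpret sanely.
```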
Boss, is .32 stable already based on your testing?
So June is probably a downtime month. Lots of servers going down from hacks, electrical failures, etc. Whewww...
WOW! vswap, be quick, I need it!
Looks like mine is one of those half a dozen KVMs. Down for 6 hours now.
Me too; I have two KVMs, on lv-storage04 and lv-kvm11.
Has anyone's node gone live since the outage? Mine's been down all day.
vpsboard is offline too. Maybe it's time to stop the CC rants. If I compare the uptimes of Buffalo and Las Vegas, Fiberhub does not shine.
Fiberhub does indeed seem worse than the SJ location they had before. Routing issues are one thing, but this definitely isn't the first power outage at Fiberhub...
https://my.frantech.ca/announcements.php?id=155
3rd in 4 months
https://my.frantech.ca/announcements.php?id=140
https://my.frantech.ca/announcements.php?id=141
One would think they'd set it up so that in the event that the UPS power levels begin draining, the backup generators would be kicked on.
All but a half dozen KVMs were back quickly; only those were down for this extended time.
You're right, FH has been getting on our nerves with the power stuff. We figured Feb was the end of it and have been quite happy since. Today's episode was resolved quickly for most people, but still, it shouldn't have happened.
They've been working slowly on getting B feeds in place, like we had been planning since Feb. I'm fairly sure that if the B feeds were in, we would have felt a short network blip and gotten past it. Power was the #1 thing we brought up when we were checking them out, and they assured us all would be well. Prior to these last few months, where they've been growing their setup, things were quite solid.
I don't doubt they'll address it all, I just wish it wouldn't keep kicking us in the groin :P
Francisco
@Francisco hope everything ok with you and my best luck.
Such a shame. Seems like someone will have to move out rather soon; otherwise I imagine more power trouble...
No, we're simply getting A+B feeds in place like we were planning this whole time. It's not an easy task to set up, so it has taken a lot of time for them to tie it up. Rob has some other fancy stuff on the way; he said he'll document it all in his RFO, due by the weekend I hope.
All is well otherwise. Stallion 2 is pending a single import script and node upgrades are coming in August. 2.6.32 finally rolled out to all of LV (granted not in the way we originally planned). When we go in August we'll install the ATS units then and get all of our power things addressed in one go.
The setup going in for the B side is likely the same size as A, but the number of clients on the B side is going to be quite low, since most people won't want to fork out for the cost of A+B feeds.
It's frustrating, but it's also hard to find reasonably priced DCs that will actually resolve the issues they have. Portland was a level of fucked I won't bother describing. FMT had lots of power issues, and HE never documented to anyone what they changed. Did they fix FMT1? Who knows. We had tons of long network outages because HE had all of China DDoSing them. The DDoS was a monthly thing.
Coresite twice had us in the position of having no generators for weeks, though thankfully they weren't needed. Top that off with the constant network problems that lasted until the day we left. 8 months of watching EGI chase their own tail.
FH has a very stable network, the only issue being that our own router has finally topped out (we hit almost 2x the peak usage in FH that we did in SJ on the same hardware). As I said, I have no doubt Rob & Don will plan out a really solid setup. Top that off with staff that actually care about what's offered. Every time we need anything done (reboots, builds, network tweaks, etc.) they handle it quickly. I think they've just had a large surge of sales (I don't think they have any more cages available) and have had to scale what they have.
With the feature set Stallion 2 will bring, A+B feeds are going to be a requirement. A power outage with load balancers, anycast, etc. all involved could turn into the most brutal mess to clean up.
Francisco
@Francisco Node:lv-kvm11 still down for now? I can't get my VPS online
@Jack I did it several times:
27/06/2013 03:06 AM Boot x.x.x.x Complete
27/06/2013 03:06 AM Hard Power Off x.x.x.x Complete
27/06/2013 12:10 AM Boot x.x.x.x Complete
27/06/2013 12:10 AM Hard Power Off x.x.x.x Complete
26/06/2013 11:16 PM Boot x.x.x.x Complete
26/06/2013 11:16 PM Hard Power Off x.x.x.x Complete
26/06/2013 06:13 PM Boot x.x.x.x Complete
Log a ticket and we'll check it out
Francisco
My VPS is showing "3.2.0-042stab076.8"... Is that actually 2.6.32 and not 3.2.0?
My KVM VPS is back online.
A high-power circuit does not work this way. You can't blindly transfer large loads from one system to another: if something goes wrong, bad things happen. ATS systems have safety interlocks that gate the transfer, and those can have issues: if something does not match, the sequence is aborted and manual intervention is needed. A good ATS is anything but trivial, both at design time and at field installation, and is therefore expensive and needs to be constantly maintained. Any "shortcut" taken to save money leads to a missing response on some fault path, such as not recognizing a single-phase fault on a three-phase system, or having a fuse "in the wrong place".

When I worked for a generator control systems manufacturer, the failures end customers most often complained about were due to maintenance neglect: the diesel motor starting too late, or not at all, because it was not exercised according to the manufacturer's specification; and power switch failures due to overload (you can overload the ATS transfer switch and the load will keep running, but the electrical contacts will "fuse" together and will not operate when needed). This is not the case with generators rated for life support, because there is an infrastructure in place to force checks and audits; in commercial applications, however, the cheapest route is usually taken, often behind the end user's back.