New on LowEndTalk? Please Register and read our Community Rules.
All new Registrations are manually reviewed and approved, so a short delay after registration may occur before your account becomes active.
Comments
@qps maybe you should let them know that their web site is down due to PHP not working.
Probably the internal network is up but the external one is not ready yet?
fiberhub.com No route to host :P
Still not up; I can access the server management area, but I cannot boot my machine.
The only nodes down at this point are a half dozen KVMs. 90%+ of our OVZs were up within 45 minutes (the outage itself was 20 minutes or so), and a few nodes required an fsck afterwards.
Storage nodes required a little extra work since mdadm didn't save the RAID50. For those that don't remember, we do two hardware RAID 5s, then RAID 0 them together in the OS.
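For anyone curious what "two hardware RAID 5s, RAID 0 in the OS" looks like in practice, here's a minimal sketch. The device names (/dev/sda, /dev/sdb) and array name (/dev/md0) are illustrative assumptions, not BuyVM's actual layout:

```shell
# /dev/sda and /dev/sdb are each a hardware RAID 5 volume exposed by
# the controller; mdadm stripes them into a software RAID 0, giving
# RAID 50 overall (create-time setup, run once):
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda /dev/sdb

# After an unclean power loss the array may not auto-assemble.
# "Reassembling arrays" then means rebuilding from the existing
# superblocks -- never --create again over live data:
mdadm --assemble /dev/md0 /dev/sda /dev/sdb

# Check array state before remounting filesystems:
mdadm --detail /dev/md0
cat /proc/mdstat
```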
I'm working with FH right now to see what's up with the last of the KVM's.
Here is the 'basic' RFO that Natalie gave me:
Thank you for your patience, we will have a tech work on your equipment shortly.
Our facility experienced a partial utility power failure this afternoon.
Because the failure was only partial and not a complete loss of utility power,
our ATS system failed to automatically switch us over from battery to generator
power. We subsequently were able to locate the issue and transferred power to
our backup generator manually, but unfortunately our UPS system had run out of
capacity before we could complete manual transfer. Our electrical contractors
are on site now and are working to determine when it will be safe for us to
return to utility power. This has also caused us to have some routing issues
which we have our network engineer working to resolve currently.
We appreciate your patience, and are working diligently to restore 100% service.
An RFO will be released as soon as available.
Thank you,
Francisco
BuyVM is still down? Jeez I feel bad for them and everyone else impacted.
Did you read Fran's post?
I hadn't refreshed :-p
Nodes were up quick as I said (< 45 minutes for the OVZ's). Storage took me a bit longer since I had to reassemble arrays.
Right now the only thing pending is a few KVM nodes.
The "positive", I guess, is that 2.6.32 got rolled out on all nodes. We had been planning an OVZ-wide upgrade in the coming weeks, but I guess we won't need that now :P
Enjoy the vswap,
Francisco
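For those asking what vswap actually gets you: on the 2.6.32-era OpenVZ kernels, container memory is set as plain RAM plus virtual swap instead of the old beancounter page juggling. A hedged sketch (the CTID 101 and the sizes are made-up values, not anyone's actual plan):

```shell
# vswap-style limits on an OpenVZ 2.6.32 (042stab) kernel:
# physical memory plus "virtual" swap, replacing the old
# privvmpages/vmguarpages beancounter tuning.
vzctl set 101 --ram 512M --swap 512M --save

# Inside the container, free(1) should then report ~512M of RAM
# and ~512M of swap, which most applications interpret sanely.
```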
Boss, is .32 stable already based on your testing?
So June is probably a downtime month. Lots of servers going down from hacks, electrical failures, etc. Whewww...
WOW! vswap, be quick, I need it!
Looks like mine is one of those half a dozen KVMs. Down for 6 hours now.
Me too; I have two KVMs, on lv-storage04 and lv-kvm11.
Has anyone's node gone live since the outage? Mine's been down all day.
vpsboard is offline too. Maybe it's time to stop the CC rants. If I compare the uptimes of Buffalo and Las Vegas, Fiberhub does not shine.
Fiberhub does indeed seem worse than the SJ location they had before. Routing issues are one thing, but this definitely isn't the first power outage at Fiberhub...
https://my.frantech.ca/announcements.php?id=155
3rd in 4 months
https://my.frantech.ca/announcements.php?id=140
https://my.frantech.ca/announcements.php?id=141
One would think they'd set it up so that in the event that the UPS power levels begin draining, the backup generators would be kicked on.
All but a half dozen KVMs were back quickly; only those were down for this extended time.
You're right, FH has been getting on our nerves with the power stuff. We figured Feb was the end of it and have been quite happy since. Today's episode was resolved quickly for most people, but still, it shouldn't have happened.
They've been working slowly on getting B feeds in place, like we had been planning since Feb. I'm fairly sure that if the B feeds were in, we would have felt a short network blip and gotten past it. Power was the #1 thing we brought up when we were checking them out, and they assured us all would be well. Prior to these last few months, where they've been growing their setup, things were quite solid.
I don't doubt they'll address it all, I just wish it wouldn't keep kicking us in the groin :P
Francisco
@Francisco hope everything ok with you and my best luck.
Such a shame. Seems like someone will have to move out rather soon; otherwise I imagine more power trouble...
No, we're simply getting A+B feeds in place like we were planning this whole time. It's not an easy task to set up, so it has taken a lot of time for them to tie it up. Rob has some other fancy stuff on the way; he said he'll document it all in his RFO, due by the weekend I hope.
All is well otherwise. Stallion 2 is pending a single import script and node upgrades are coming in August. 2.6.32 finally rolled out to all of LV (granted not in the way we originally planned). When we go in August we'll install the ATS units then and get all of our power things addressed in one go.
The setup going in for the B side is likely the same size as A, but the number of clients on the B side is going to be quite low, since most people won't want to fork out for the cost of A+B feeds.
It's frustrating, but it's also hard to find reasonably priced DCs that will actually resolve the issues they have. Portland was a level of fucked I won't bother describing. FMT had lots of power issues, and HE never documented to anyone what they changed. Did they fix FMT1? Who knows. We had tons of long network outages because HE had all of China DDoSing them. The DDoS was a monthly thing.
Coresite twice had us in the position of having no generators for weeks, though thankfully they weren't needed. Top that off with the constant network problems that lasted until the day we left. 8 months of watching EGI chase their own tail.
FH has a very stable network, the only issue being that our own router has finally topped out (we hit almost 2x the peak usage in FH that we did in SJ on the same hardware). As I said, I have no doubt Rob & Don will plan out a really solid setup. Top that off with staff that actually care about what's offered. Every time we need anything done (reboots, builds, network tweaks, etc.) they handle it quickly. I think they've just had a large surge of sales (I don't think they have any more cages available) and have had to scale what they have.
With the feature set Stallion 2 will bring, A+B feeds are going to be a requirement. A power outage with load balancers, anycast, etc. all involved could turn into the most brutal mess to clean up.
Francisco
@Francisco Node:lv-kvm11 still down for now? I can't get my VPS online
@Jack I did it several times:
27/06/2013 03:06 AM Boot x.x.x.x Complete
27/06/2013 03:06 AM Hard Power Off x.x.x.x Complete
27/06/2013 12:10 AM Boot x.x.x.x Complete
27/06/2013 12:10 AM Hard Power Off x.x.x.x Complete
26/06/2013 11:16 PM Boot x.x.x.x Complete
26/06/2013 11:16 PM Hard Power Off x.x.x.x Complete
26/06/2013 06:13 PM Boot x.x.x.x Complete
Log a ticket and we'll check it out
Francisco
My VPS is showing "3.2.0-042stab076.8"... Is that actually 2.6.32 and not 3.2.0?
My KVM VPS is back online.
A high-power circuit does not work this way. You can't blindly transfer large loads from one system to another: if something goes wrong, bad things happen. ATS systems have safety interlocks that gate the transfer, and those can have issues: if something does not match, the sequence is aborted and manual intervention is needed. A good ATS is anything but trivial, both at design time and at field installation, and is therefore expensive and needs to be constantly maintained. Any "shortcut" taken to save money leads to a missing response on some fault path, such as not recognizing a single-phase fault on a three-phase system, or having a fuse "in the wrong place".

When I worked for a generator control systems manufacturer, the failures end customers most often complained about were due to maintenance neglect: the diesel motor starting too late, or not at all, because it was not exercised according to the manufacturer's specification; and power switch failures due to overload (you can overload the ATS transfer switch and the load will keep running, but the electrical contacts will "fuse" together and will not operate when needed). This is not the case with generators rated for life support, because there is an infrastructure in place to force checks and audits; in commercial applications, however, the cheapest route is usually taken, often behind the end user's back.