BuyShared lu-shared02 down, does anyone know why?

jetchirag · April 2017

Hi,
BuyShared has been down for almost 2 hours. I opened a ticket but haven't gotten a response. Does anyone here know why, or have any other details?

alexnjh · April 2017

For Las Vegas I received this email, maybe related?

Hello,

Earlier today we had a failure on our Brocade SuperX, our switching/layer 2 fabric in our Las Vegas facility. Our Vegas-based technical staff were able to quickly (given the 8-car pile-up on the freeway) get into the facility and address the failed piece of equipment.

We're actively speaking with vendors to replace our entire Brocade deployment in Las Vegas, as we've had 3 failures in 2017 alone (2 dead optics and a failing blade in our MLX core router) and 4-5 failures in the last 14 months. The MLX failure, while not a complete outage, caused internal routing issues where many users couldn't reach other hosts within the same network, or connectivity was spotty.

We're currently favoring a Cisco 6880-X for our core and Cisco 4948s for switching. We will then be able to run redundant fiber to each rack instead of having a single central 'mega switch' like we have with the SuperX right now.

We apologize for all of this. If you wish to request SLA credit, please log a ticket with billing and we'll get that sorted for you.

jetchirag · April 2017

@masterqqq said:
For Las Vegas I received this email, maybe related?

Thanks for the quick follow-up. All my services are in Luxembourg, though. Is Luxembourg affected too?

budi1413 · April 2017

@Francisco will be using Cisco in the end. Suits the name.

lukehebb · April 2017

I've got a reseller on lu-shared02. It's been down for the past two hours.

My VPS in Lux isn't down, though. I opened a ticket around 12:00 (it's currently 12:46) with no reply yet.

Once it's back, I'm grabbing the data and moving it elsewhere. I'll probably keep my VPS with them for now, though.

Francisco · April 2017

Hello,

Sorry, we don't have a nighttime person right now.

LU's fine, it wasn't affected by anything like that.

Could you PM me your IP for your site or the ticket ID?

Francisco

lukehebb · April 2017

@Francisco said:
Hello,

Sorry, we don't have a night time person right now.

LU's fine, it wasn't affected by anything like that.

Could you PM me your IP for your site or the ticket ID?

Francisco

PM'd you

I can't even get onto https://lu-shared02.cpanelplatform.com/, so it seems server-wide.

cPanel/WHM/mail are all down, though pings still work.

jetchirag · April 2017

PM'd, and yes @lukehebb, it's server-wide since the main hostname isn't loading either.

isalem · April 2017

Seems to be only LU-Shared02 in LUX, though... I'm on LU-Shared01 and everything is running fine here.

lukehebb · April 2017

Francisco PM'd me (and responded to my ticket); it should be back soon. It seems to have soft-locked without tripping any of the alerts that would normally wake him up.
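Pings kept working while cPanel, WHM and mail were unreachable, which is exactly the kind of failure an ICMP-only check can miss. A minimal sketch of a check that requests a page instead, written in Python and assuming the monitor can reach the server's hostname over HTTP(S); the function name `http_alive` and the 5-second timeout are illustrative choices, not BuyShared's actual alerting setup:

```python
import urllib.error
import urllib.request


def http_alive(url: str, timeout: float = 5.0) -> bool:
    """Return True only if the web server answers an HTTP request in time.

    A host can keep answering pings while its services are hung, so this
    (hypothetical) check asks the web server itself for a page.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Treat 2xx and 3xx as healthy.
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, DNS failures, HTTP error
        # statuses (HTTPError subclasses URLError) and malformed URLs.
        return False
```

For example, `http_alive("https://lu-shared02.cpanelplatform.com/")` would be expected to return False while the hostname page wasn't loading, even though the box still answered pings.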

apollo15 · April 2017

That SuperX has been EOL since 2010 with no support since 2016, so issues are expected.
http://www.brocade.com/en/support/product-end-of-life.html

jetchirag · April 2017

@lukehebb said:
Francisco PM'd me (and responded to ticket) - should be back soon. Seemed to softlock and not trip any alerts that would normally wake him up

Do they set up physical alerts like this one?

Edit: Sorry, bad joke ^

Francisco · April 2017

@jetchirag said:

@lukehebb said:
Francisco PM'd me (and responded to ticket) - should be back soon. Seemed to softlock and not trip any alerts that would normally wake him up

Do they setup physical alerts like this one?

Edit: Sorry bad joke ^

Ahaha.

I'll be 100% honest/transparent here.

I have different alert sounds for BuyShared etc., but because of the constant LiteSpeed crashes over the past 8 months, I slept right through it. They fixed the crashes just last week with the 5.1.14 release, but the alert still didn't wake me.

It is/was a soft panic; I'm just waiting on the IPMI to reboot it, and then it should be golden.

Francisco

vimalware · April 2017

This is where the good ole' warchest comes in handy.

Post network pron after everything is resolved.

Francisco · April 2017

@budi1413 said:
@Francisco will be using cisco at the end. Suits the name.

I think the 6880 won't be a viable option in the end. I really liked the unit, but it doesn't tick all the boxes I want. The Juniper MX240 is the likely go-to option; I'm just waiting to see what comes back from the vendor quotes I put out. I want to see it in place within 60 days or so.

Anyway, I'm actively telling the IPMI to reboot, but it's being a bit of a jerk about it, so I've got that spamming the reboot command for me. Should be roses shortly.

Francisco
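Re-issuing an IPMI reboot until the BMC accepts it can be sketched as a small retry loop. This is a hypothetical Python helper, not the tooling actually used here; the attempt count and delay are arbitrary, and the command runner and sleep function are injectable so the loop can be exercised without real hardware:

```python
import subprocess
import time
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Run the command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def reboot_until_accepted(
    cmd: Sequence[str],
    attempts: int = 10,
    delay: float = 15.0,
    run: Callable[[Sequence[str]], int] = _run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Re-issue a reboot command until it exits 0 or attempts run out.

    Returns the 1-based attempt number that succeeded, or 0 if none did.
    """
    for attempt in range(1, attempts + 1):
        if run(cmd) == 0:
            return attempt
        if attempt < attempts:
            sleep(delay)  # give the BMC a moment before trying again
    return 0
```

A possible invocation, with placeholder BMC details, uses real `ipmitool` options (`-E` reads the password from the `IPMI_PASSWORD` environment variable): `reboot_until_accepted(["ipmitool", "-I", "lanplus", "-H", "bmc.example.net", "-U", "admin", "-E", "chassis", "power", "reset"])`. A zero exit status only means the BMC accepted the request, so it's still worth confirming the host actually comes back.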

Francisco · April 2017

@vimalware said:
This is where the good ole' warchest comes in handy.

Post network pron after everything is resolved.

For the router? Sure, we'll snag some pictures of that. Our network guy lives in Vegas now, so I may see if he wants to go to the range and take the SuperX along.

We'll sell the MLX or keep it for parts, since we just bought new blades for it the other week when we had the inter-LAN connectivity issue.

Francisco

jetchirag · April 2017

If you could be a bit more honest: which is better, LU02 or LU01?
My one reseller on LU01 looks more stable.

Francisco · April 2017

With that being said, things are golden again. I'll go change the sound for the BuyShared alerts when I wake back up.

Sorry about that.

Francisco

lukehebb · April 2017

It did come back, but now MySQL is dead?

Francisco · April 2017

@jetchirag said:
If you could be bit more honest, what's better? LU 02 or LU 01?
because my one reseller on LU01 looks more stable

We've had kernel panics on both.

LU-Shared01 was a lot more stable web-server-wise, though, since the LiteSpeed graceful restart bug seemed to affect CentOS 7 a lot more than our CentOS 6 nodes. CentOS 6 almost never had it happen, and we were working on rebasing all the CentOS 7 nodes to 6 just to get that monkey off our back.

With 5.1.14 ( http://www.litespeedtech.com/products/litespeed-web-server/release-log ) they addressed this, though, and we haven't had any alerts for any of the nodes' web servers since.

We currently have 4 boxes on CentOS 7, and we'd get 3-4 notifications a day of small outages lasting 1-3 minutes from LiteSpeed restarting. They all stopped doing that at the same time with the mass update to 5.1.14.

This is a good thing, since I was actively looking into alternatives like Engintron, which is an NGINX reverse proxy in front of Apache. It works well, but LiteSpeed's PHP is a lot faster than what Apache has.

Francisco
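To put the CentOS 7 numbers in perspective, here is a rough back-of-the-envelope calculation. It assumes, since the post doesn't say, that the 3-4 daily notifications are the total across all 4 boxes and that each one is a single outage of 1-3 minutes; `outage_minutes` is just an illustrative helper:

```python
from typing import Tuple


def outage_minutes(
    notifications_per_day: Tuple[int, int],
    minutes_each: Tuple[int, int],
    days: int = 1,
) -> Tuple[int, int]:
    """Return (low, high) total outage minutes over `days`.

    Low pairs the fewest notifications with the shortest outages;
    high pairs the most notifications with the longest outages.
    """
    low = notifications_per_day[0] * minutes_each[0] * days
    high = notifications_per_day[1] * minutes_each[1] * days
    return low, high
```

Under those assumptions, `outage_minutes((3, 4), (1, 3))` gives `(3, 12)` minutes a day, and `outage_minutes((3, 4), (1, 3), days=30)` gives `(90, 360)` minutes a month of short web server outages spread across the nodes.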

Francisco · April 2017

@lukehebb said:
It did come back now MySQL is dead?

Should be fine again; I gave it a kick to be safe, and I'll stay up a bit longer to make sure it doesn't flip out. We've seen some strange things like this:

```
root@lu-shared02 [/usr/sbin]# ps -ef | grep mysql
mysql 134961 1 1 14:33 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe
mysql 135093 1 1 14:33 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
mysql 135921 134961 53 14:33 ? 00:00:02 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/lib/mysql/lu-shared02.cpanelplatform.com.err --open-files-limit=50000 --pid-file=/var/lib/mysql/lu-shared02.cpanelplatform.com.pid
mysql 136004 135093 9 14:33 ? 00:00:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/lib/mysql/lu-shared02.cpanelplatform.com.err --open-files-limit=50000 --pid-file=/var/lib/mysql/lu-shared02.cpanelplatform.com.pid
root 136366 17821 0 14:33 pts/0 00:00:00 grep --color=auto mysql
```

That's 2 copies of MySQL running at once, and we end up with file locking issues as they both try to access the same InnoDB files. It doesn't cause any corruption, but it sure makes a mess of things. I've read through the init scripts to see what's making it do that, but haven't spotted anything fishy yet.

Anyway, I always just kill the --basedir=/usr mysqld_safe and its child mysqld process, and it's good for months.

Francisco
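The manual cleanup described in this post (spot two `mysqld_safe` instances, then stop the `--basedir=/usr` one and its child `mysqld`) can be sketched in Python as a parser over `ps -ef` output. It only identifies PIDs and doesn't kill anything. The `--basedir=/usr` marker comes from the post; the function names are hypothetical, and the parser assumes the standard `ps -ef` column layout (UID PID PPID C STIME TTY TIME CMD):

```python
import os
from typing import List, NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    cmd: str


def parse_ps_ef(text: str) -> List[Proc]:
    """Parse `ps -ef` output into (pid, ppid, cmd) records."""
    procs: List[Proc] = []
    for line in text.splitlines():
        # maxsplit=7 keeps the whole command line (spaces included) as field 7.
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit() or not fields[2].isdigit():
            continue  # header row, blank line, or not a process row
        procs.append(Proc(int(fields[1]), int(fields[2]), fields[7]))
    return procs


def duplicate_mysql_pids(procs: List[Proc], marker: str = "--basedir=/usr") -> List[int]:
    """Return PIDs to stop when more than one mysqld_safe is running.

    Selects each mysqld_safe whose arguments contain `marker` as a separate
    token, followed by its direct child mysqld processes.
    """
    safes = [p for p in procs if "mysqld_safe" in p.cmd]
    if len(safes) < 2:
        return []  # no duplicate, nothing to do
    if all(marker in s.cmd.split() for s in safes):
        return []  # every instance matches; refuse rather than stop them all
    pids: List[int] = []
    for safe in safes:
        if marker not in safe.cmd.split():
            continue
        pids.append(safe.pid)
        pids.extend(
            p.pid
            for p in procs
            if p.ppid == safe.pid and os.path.basename(p.cmd.split()[0]) == "mysqld"
        )
    return pids
```

Run against the process rows shown above, `duplicate_mysql_pids(parse_ps_ef(text))` returns `[135093, 136004]`: the second `mysqld_safe` and its child. On a live box, `text` could come from `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`.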
