http://buyvmstatus.com/
From what I see right now, there are 7 nodes down, with downtime ranging from 16 minutes to 2 hours and 38 minutes.
Many others have uptimes of less than a day; looks like some maintenance to me.
Comments
Routing problem, I guess.
Perhaps a hardware issue; storage-lv-01 has already gone down many times in the past 3 days.
I guess some nodes are empty due to SSD upgrades. They are performing upgrades on empty nodes, and I also remember Fran mentioned that the site reports many false positives. For example, "manage" is the control panel and it's up even though the site reports it as down. buyvmstatus.com is a fan-made BuyVM site and it's not official.
the "manage" is false positives indeed, but the storage-lv-01 really down many times in these 3 days, I have asked fran about this yesterday in the irc, I think he should know this issue.
My BuyVM box and their main website seem to be down, so I'm just waiting here.
deadpool?
Is this why the "other" forum is down?
It's definitely a routing issue. It works from a few locations and doesn't from others.
It's something to do with nLayer and CNServers.
As in only for those with filtered IPs?
Yeah, from what I've gathered.
Guys... chill, I'm sure it has to do with them upgrading their servers with SSDs.
I'm giving this a shot, even though the chances are slim. @Francisco, @Aldryic.
Mine is working.
The OVZ nodes, yes, but that doesn't include the storage nodes; they haven't announced any SSD upgrades on the storage nodes.
It's a routing issue with CNServers (from the IRC channel).
The storage-lv-01 issue that @spazzo mentioned at first is not the routing issue; that node has indeed been crashing fairly frequently these past few days. It must have some strange problem. I think Fran will figure out the reason ASAP.
CNServers seems to be having some sort of routing derp so the site can't be reached by everyone. I've already ticketed them to see if they forgot to include our subnets in changes again.
Storage01's been a bit of a mess alas. We originally pulled it down to deal with possibly bad RAM and it turned into a much bigger debugging case. It seems to finally be fine.
The manage alert is wrong since it's still pinging a very old IP that is no longer bound to the box. Servers marked offline are likely pending SSD upgrades. We've been rolling through servers as Fiberhub has time to convert them; we can't do in-place upgrades.
Either way, Aldryic just woke me a few minutes ago so give me a few to debug where things went tits up.
Francisco
TL;DR - Dammit HE!
Hello,
HE.net enabled RPF on our port last night due to a large attack originating from our network using spoofed IP's that I wasn't able to track down - I didn't realize it would impact you. If you can send me the prefixes that you are sending over CNServers, I'll have HE.net add exceptions for them while we sort out the rest of this mess.
--
Rob Tyree
Fiberhub Colocation & Internet Services
Should be all patched up soon.
Francisco
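(For anyone wondering what the RPF change in Rob's email actually does: strict reverse-path filtering drops a packet when the route back to its source address doesn't point out the interface the packet arrived on, which is why legitimate prefixes announced elsewhere, such as the ones going over CNServers, need explicit exceptions. Below is a toy sketch of that check in Python; the routing table and interface names are invented for illustration, and this is not HE.net's actual implementation.)

import ipaddress

# Toy model of strict uRPF, the filtering HE.net enabled. The routing table
# and interface names below are made up for the example.
ROUTES = {
    "198.51.100.0/24": "buyvm-port",   # pretend this is the customer port
    "0.0.0.0/0": "upstream",           # default route out the transit side
}

def return_interface(src_ip: str) -> str:
    """Longest-prefix match: which interface would be used to reach src_ip?"""
    addr = ipaddress.ip_address(src_ip)
    best = None
    for prefix, iface in ROUTES.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else "none"

def strict_rpf_accepts(src_ip: str, arrived_on: str) -> bool:
    """Strict RPF: accept only if the return path uses the arrival interface."""
    return return_interface(src_ip) == arrived_on

# A spoofed or unknown source arriving on the customer port fails the check,
# but so does any legitimate prefix the router doesn't associate with that
# port -- hence the need to send the CNServers prefixes for exceptions.
print(strict_rpf_accepts("198.51.100.7", "buyvm-port"))  # True
print(strict_rpf_accepts("203.0.113.9", "buyvm-port"))   # False -> dropped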
That is strange, since the downtime is usually only a few minutes, but maybe that IP comes alive at times.
That doesn't make sense... sorry.
I mean, if the manage VM/node is shown as down because of the IP the monitoring site pings, then it should be down for days/weeks/months, unless the IP is allocated to something that replies from time to time.
Unless VLD is doing something odd (or it's reporting weird pings back to him), the IP is not bound to anything and hasn't been for a couple of months. It used to belong to our stallion1 deployment, but I didn't need the IP bound to the new one so I didn't bother.
Francisco
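(This is also why a stale monitoring target stays red: a status page like buyvmstatus.com presumably just pings a fixed IP and flags the service as down when nothing answers. A minimal sketch of that kind of check, assuming plain ICMP pings and using a placeholder address in place of the old manage IP; once the address isn't bound to anything, the check reports down forever.)

import subprocess

def host_is_up(ip: str, timeout_s: int = 2) -> bool:
    """Send a single ICMP echo; True only if something answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

if __name__ == "__main__":
    stale_manage_ip = "192.0.2.10"  # documentation address standing in for the old manage IP
    print("manage:", "up" if host_is_up(stale_manage_ip) else "down")
    # With nothing bound to the old IP, this prints "down" on every run:
    # a permanent false positive, which matches what the status page shows.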
Thanks Fran, glad to hear storage-lv-01 is finally fine. Hope this issue doesn't happen again, lol.
And it's down again...
With the crazy amount of downtime lately it seems prudent to migrate off of storage1.
I'm not sure if you had a ping monitor or anything but for the past few hours it had been sitting at 500ms+ latency for no reason. There was no inbound flood, no outbound flood, nothing. It sat at ~3MB/sec outbound with some spikes inbound.
There was an EEPROM patch for the chipset the X9SCLs run. When we installed this board I assumed we had applied it in the past, since this board/CPU/RAM came from the old KVM11, a very stable node that only ever threw up when a drive got thrown out.
With the latency how it was, there was no point posting a maintenance time and waiting it out. I simply waited a few minutes for the copy/paste to actually go through, patched the EEPROMs, and sent it for a reboot.
At this point we'll see if that addresses things. The box itself was fine CPU- and IOWAIT-wise. CFQ was causing CPU spikes, but that's just how it is; a swap of schedulers brought things back in line nicely.
We spun everyone up and are seeing network loads on it back to where they were before.
It's annoying and I truly apologize. You're welcome to ticket billing and we'll throw you a free month for the headaches. This one is on me in the end.
Francisco
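(On the scheduler swap Francisco mentions: on Linux the active I/O scheduler is exposed per block device in sysfs and can be changed on the fly. A rough sketch follows; the device name "sda" and the target scheduler "deadline" are assumptions for the example, and what's actually available depends on the kernel.)

from pathlib import Path

def current_scheduler(device: str = "sda") -> str:
    """Return the sysfs scheduler line, e.g. 'noop deadline [cfq]'; the active one is bracketed."""
    return Path(f"/sys/block/{device}/queue/scheduler").read_text().strip()

def set_scheduler(device: str, scheduler: str) -> None:
    """Activate a different scheduler for the device (requires root)."""
    Path(f"/sys/block/{device}/queue/scheduler").write_text(scheduler)

if __name__ == "__main__":
    print(current_scheduler("sda"))
    # set_scheduler("sda", "deadline")   # uncomment on a box where CFQ is misbehaving
    # print(current_scheduler("sda"))    # 'deadline' should now be the bracketed entry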
It's back now, and thanks for the update. I guess we'll see how long it lasts this time.
.
Fran, is this issue finally, completely resolved?
My VPS is also on storage-lv-01: offline/online/offline/online... a dark week.
BlueVM sucks for me. They don't reply to tickets, servers are down or slow, and who cares about customers? They only reply on forums when someone posts something negative about them, saying "we are sorry, we are working to fix it", bla bla. BlueVM is a scam for me.