http://buyvmstatus.com/
From what I see right now, there are 7 nodes down, with downtime ranging from 16 minutes to 2 hours and 38 minutes.
Many others have uptimes of less than a day; looks like some maintenance to me.
Comments
Routing problem, I guess.
Perhaps a hardware issue; storage-lv-01 has already gone down many times in the past 3 days.
I guess some nodes are empty due to SSD upgrades. They are performing upgrades on empty nodes, and I also remember Fran mentioned that the site reports many false positives. For example, "manage" is the control panel and it's up even though the site reports it as down. buyvmstatus.com is a fan-made BuyVM site and it's not official.
the "manage" is false positives indeed, but the storage-lv-01 really down many times in these 3 days, I have asked fran about this yesterday in the irc, I think he should know this issue.
My BuyVM box and their main website seem to be down, so I'm just waiting here.
deadpool?
Is this why the "other" forum is down?
It's definitely a routing issue. It works from a few locations and doesn't from others.
It's something to do with nLayer and CNServers.
As in only for those with filtered IPs?
Yeah, from what I've gathered.
Guys... chill, I'm sure it has to do with them upgrading their servers with SSDs.
I'm giving this a shot, even though the chances are slim. @Francisco, @Aldryic.
Mine is working.
The OVZ nodes, yes, but that doesn't include the storage nodes; they haven't announced any SSD upgrades on the storage nodes.
It's a routing issue with CNServers (from the IRC channel).
The storage-lv-01 issue that @spazzo mentioned at first is not the routing issue; that node has indeed been crashing fairly frequently these past few days. It must have some strange problem. I think Fran will figure out the reason ASAP.
CNServers seems to be having some sort of routing derp so the site can't be reached by everyone. I've already ticketed them to see if they forgot to include our subnets in changes again.
Storage01's been a bit of a mess alas. We originally pulled it down to deal with possibly bad RAM and it turned into a much bigger debugging case. It seems to finally be fine.
The manage alert is wrong since it's still pinging a very old IP that is no longer bound to the box. Servers marked offline are likely pending SSD upgrades. We've been rolling through servers as Fiberhub has time to convert them; we can't do in-place upgrades.
Either way, Aldryic just woke me a few minutes ago so give me a few to debug where things went tits up.
Francisco
TL;DR - Dammit HE!
Hello,
HE.net enabled RPF on our port last night due to a large attack originating from our network using spoofed IP's that I wasn't able to track down - I didn't realize it would impact you. If you can send me the prefixes that you are sending over CNServers, I'll have HE.net add exceptions for them while we sort out the rest of this mess.
--
Rob Tyree
Fiberhub Colocation & Internet Services
Should be all patched up soon.
Francisco
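(For anyone wondering what the RPF change in Rob's email actually does: strict reverse-path filtering drops a packet when the route back to its source address doesn't point out the interface the packet arrived on, which is why legitimate prefixes announced elsewhere, such as the ones going over CNServers, need explicit exceptions. Below is a toy sketch of that check in Python; the routing table and interface names are invented for illustration, and this is not HE.net's actual implementation.)

import ipaddress

# Toy model of strict uRPF, the filtering HE.net enabled. The routing table
# and interface names below are made up for the example.
ROUTES = {
    "198.51.100.0/24": "buyvm-port",   # pretend this is the customer port
    "0.0.0.0/0": "upstream",           # default route out the transit side
}

def return_interface(src_ip: str) -> str:
    """Longest-prefix match: which interface would be used to reach src_ip?"""
    addr = ipaddress.ip_address(src_ip)
    best = None
    for prefix, iface in ROUTES.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else "none"

def strict_rpf_accepts(src_ip: str, arrived_on: str) -> bool:
    """Strict RPF: accept only if the return path uses the arrival interface."""
    return return_interface(src_ip) == arrived_on

# A spoofed or unknown source arriving on the customer port fails the check,
# but so does any legitimate prefix the router doesn't associate with that
# port -- hence the need to send the CNServers prefixes for exceptions.
print(strict_rpf_accepts("198.51.100.7", "buyvm-port"))  # True
print(strict_rpf_accepts("203.0.113.9", "buyvm-port"))   # False -> dropped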
That is strange, since the downtime is usually only a few minutes, but maybe that IP comes alive at times.
That doesn't make sense... sorry.
I mean, if the manage VM/node is shown as down because of the IP the monitoring site pings, then it should be down for days/weeks/months, unless the IP is allocated to something that replies from time to time.
Unless VLD is doing something odd (or it's reporting weird pings back to him), the IP is not bound to anything and hasn't been for a couple of months. It used to belong to our stallion1 deployment, but I didn't need the IP bound to the new one so I didn't bother.
Francisco
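(This is also why a stale monitoring target stays red: a status page like buyvmstatus.com presumably just pings a fixed IP and flags the service as down when nothing answers. A minimal sketch of that kind of check, assuming plain ICMP pings and using a placeholder address in place of the old manage IP; once the address isn't bound to anything, the check reports down forever.)

import subprocess

def host_is_up(ip: str, timeout_s: int = 2) -> bool:
    """Send a single ICMP echo; True only if something answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

if __name__ == "__main__":
    stale_manage_ip = "192.0.2.10"  # documentation address standing in for the old manage IP
    print("manage:", "up" if host_is_up(stale_manage_ip) else "down")
    # With nothing bound to the old IP, this prints "down" on every run:
    # a permanent false positive, which matches what the status page shows.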
Thanks Fran, glad to hear storage-lv-01 is finally fine. Hope this issue doesn't happen again, lol.
And it's down again...
With the crazy amount of downtime lately it seems prudent to migrate off of storage1.
I'm not sure if you had a ping monitor or anything but for the past few hours it had been sitting at 500ms+ latency for no reason. There was no inbound flood, no outbound flood, nothing. It sat at ~3MB/sec outbound with some spikes inbound.
There was an EEPROM patch for the chipset the X9SCLs run. When we installed this board I assumed we had applied it in the past, since this board/CPU/RAM came from the old KVM11, a very stable node that only ever threw up when a drive got thrown out.
With the latency how it was, there was no point posting a maintenance time and waiting it out. I simply waited a few minutes for the copy/paste to actually go through, patched the EEPROMs, and sent it for a reboot.
At this point we'll see if that addresses things. The box itself was fine CPU- and IOWAIT-wise. CFQ was causing CPU spikes, but that's just how it is; a swap of schedulers brought things back in line nicely.
We spun everyone up and are seeing network loads on it back to where they were before.
It's annoying and I truly apologize. You're welcome to ticket billing and we'll throw you a free month for the headaches. This one is on me in the end.
Francisco
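(On the scheduler swap Francisco mentions: on Linux the active I/O scheduler is exposed per block device in sysfs and can be changed on the fly. A rough sketch follows; the device name "sda" and the target scheduler "deadline" are assumptions for the example, and what's actually available depends on the kernel.)

from pathlib import Path

def current_scheduler(device: str = "sda") -> str:
    """Return the sysfs scheduler line, e.g. 'noop deadline [cfq]'; the active one is bracketed."""
    return Path(f"/sys/block/{device}/queue/scheduler").read_text().strip()

def set_scheduler(device: str, scheduler: str) -> None:
    """Activate a different scheduler for the device (requires root)."""
    Path(f"/sys/block/{device}/queue/scheduler").write_text(scheduler)

if __name__ == "__main__":
    print(current_scheduler("sda"))
    # set_scheduler("sda", "deadline")   # uncomment on a box where CFQ is misbehaving
    # print(current_scheduler("sda"))    # 'deadline' should now be the bracketed entry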
It's back now, and thanks for the update. I guess we'll see how long it lasts this time.
.
Fran, is this issue finally, completely resolved?
My VPS is also on storage-lv-01: offline/online/offline/online... a dark week.
BlueVM sucks for me. They don't reply to tickets, servers are down or slow, and who cares about customers? They only reply on forums when someone posts something negative about them, saying "we are sorry, we are working to fix it", bla bla. BlueVM is a scam for me.