ColoCrossing Random Routing - Normal?

Kris Member
edited February 2013 in General

I just had all of my VPS servers at ColoCrossing Buffalo, NY lose power without any explanation (three at a single web host that uses CC; all of their servers went down).

When I asked my host what CC said, the answer was apparently that the "UPS dropped the critical load when it was switched to bypass mode for routine maintenance." Sounds like bullshit to me. Their network never went down, nor did BuyVM. Sounds like my host may have been put on a rack by Jon without any battery backup?

Always thought it was a little odd that Buffalo was never on CC's data center page. I'm guessing it's because they just ran fiber to their office building, and beyond Telia they were single-homed through GigLinx.

Beyond that, now it appears Cogent is coming into the mix, but in a very odd way.

After the outage of every server my provider had at CC Buffalo, I'm noticing that some machines now route over Cogent and some over Telia; the subnet seems to be a factor.

198.23.x = always routes over Cogent
192.210.x = always routes over Telia

I asked my provider why, from the same IP on Comcast and within seconds of each other, if BGP is set up properly, half of my routes to CC Buffalo go:

Colorado -> San Jose (Telia) -> CC Buffalo (90+ ms), while the other half go over Cogent (Denver -> Cleveland -> Buffalo) at around 60 ms.

Shouldn't it always take Cogent, since that's the better route by about 30 ms? Or at least always take one route, rather than a different route depending on the IP subnet?
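
If anyone else on a CC Buffalo VPS wants to reproduce the comparison, it only takes a couple of traceroutes; the last octets below are placeholders, just like the masked ranges above:

    # one IP from each ColoCrossing Buffalo range (replace x.x with a real host)
    traceroute -n 198.23.x.x     # has consistently shown the Cogent path (Denver -> Cleveland -> Buffalo)
    traceroute -n 192.210.x.x    # has consistently shown the Telia path (Colorado -> San Jose -> Buffalo)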

Thanks


Comments

  • They have very strange routes sometimes... I think someone is messing with routes on BGP.

  • @TheHackBox said: They have very strange routes sometimes... I think someone is messing with routes on BGP.

    Agreed. Jon said "this is normal" - I think it's not normal / they're doing live production testing on certain subnets.

    Regardless, Jon seems to be full of it, both about the power failure and about the networking.

  • @Kris,

    Sure the web host in the middle (if there is one) didn't have something else going wrong?

    As far as the power situation, in essence, who knows. These folks, like many, keep what they have guarded and shrouded in secrecy. Could be a single power feed with no A+B power, or the UPS could have been overloaded (which would be stupid).

    Routine maintenance is often an excuse. Someone have a ticket to establish that window prior to the event?

    As for their location, it's an office building with existing tenants - big building. They rent space / colo. Other providers are available at that location for upstream BW.

    On the split routes, are you seeing Cogent the entire way, or just Cogent in the mix? I haven't seen Cogent to them, but I'll be looking for it.

    Cogent isn't a direct peer; it's being introduced by one of their upstreams. Look over their ASN info (there's a quick CLI sketch at the end of this comment for pulling the same data):
    http://bgp.he.net/AS36352

    Using Cogent always to improve the route? They can prioritize/redirect to it to clean up the path, but that probably throws their commits and connections off balance, since one upstream would suddenly be getting a good amount of the traffic. Certainly, on the remote end or their end, someone is monkeying with routing.

    @Francisco, any input on the Cogent routes and prioritizing them?
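
    For anyone who'd rather pull that ASN/prefix data from a terminal than from bgp.he.net, the IRR databases carry the registered route objects for the ASN. A minimal sketch (this lists what's registered, not necessarily what's live in the table right now):

        # route objects registered with origin AS36352 (ColoCrossing), via the RADB whois mirror
        whois -h whois.radb.net -- '-i origin AS36352'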

  • @pubcrawler said: Sure the web host in the middle (if there is one) didn't have something else going wrong?

    I'm sure; I consult for them. Every box they had (a lot) in Buffalo went down, instant power failure. Had to reset / fix sysctl.conf on a box or two.

    I've seen their huge Buffalo building; it's just an office building. The trace goes fully over Cogent into CC, no real mix. Checked BGP, and it isn't GigLinx, etc.; it terminates at a PSINet IP:

     6  he-3-4-0-0-cr01.denver.co.ibone.comcast.net (68.86.90.149)  21.700 ms  23.446 ms  24.153 ms
     7  te3-5.ccr01.den03.atlas.cogentco.com (154.54.10.33)  14.044 ms  16.964 ms  15.106 ms
     8  te8-3.ccr02.den01.atlas.cogentco.com (154.54.83.33)  115.683 ms
    te8-3.ccr01.den01.atlas.cogentco.com (154.54.83.29)  126.147 ms
    te7-1.ccr02.den01.atlas.cogentco.com (154.54.45.185)  76.676 ms
     9  te0-3-0-7.mpd21.mci01.atlas.cogentco.com (154.54.87.90)  34.270 ms
    te0-3-0-7.ccr21.mci01.atlas.cogentco.com (154.54.87.86)  34.315 ms
    te0-3-0-7.mpd22.mci01.atlas.cogentco.com (154.54.87.98)  32.543 ms
    10  te0-3-0-7.ccr21.ord01.atlas.cogentco.com (154.54.84.74)  41.266 ms
    te0-4-0-3.ccr22.ord01.atlas.cogentco.com (66.28.4.34)  40.367 ms
    te0-5-0-4.ccr21.ord01.atlas.cogentco.com (154.54.45.146)  38.065 ms
    11  te3-8.ccr02.cle04.atlas.cogentco.com (154.54.43.122)  196.281 ms
    te7-2.ccr02.cle04.atlas.cogentco.com (154.54.45.133)  206.942 ms
    te8-2.ccr02.cle04.atlas.cogentco.com (154.54.83.214)  216.405 ms
    12  te8-2.ccr01.buf02.atlas.cogentco.com (154.54.31.238)  206.252 ms
    te4-1.ccr01.buf02.atlas.cogentco.com (154.54.27.85)  205.714 ms
    te7-2.ccr01.buf02.atlas.cogentco.com (154.54.44.81)  211.212 ms
    13  38.122.36.46 (38.122.36.46)  65.057 ms  49.849 ms  49.031 ms
    14  host.colocrossing.com (198.12.x.x)  49.301 ms  49.155 ms  49.434 ms
    15  host.colocrossing.com (198.23.x.x)  60.006 ms  59.540 ms  59.626 ms
    

    @pubcrawler said: Certainly on the remote end or their end someone is monkeying with routing.

    That's what I figured. One subnet shouldn't always take Cogent directly in. Literally Comcast -> Cogent -> CC (the better route, 60 ms vs. 90).

    But the bulk of the IPs still literally go Colorado -> San Jose -> Buffalo (90+ ms) instead of taking the shorter Denver -> Cleveland -> Buffalo path.

    Just wanted to make sure I wasn't in the Twilight Zone, and that someone else also didn't see this as "normal."

  • I'll look and run some routes.

    Odd to see all Cogent. Never have seen this to Buffalo before.

  • @pubcrawler said: Odd to see all Cogent. Never have seen this to Buffalo before.

    Thanks, I'm confused, because I haven't ever either. It was only after the "random power outage during routine maintenance" (as per Jon, verbatim) that Cogent popped up in the mix. But "the network never went down."

    Maybe it switched over to a backup router that had Cogent configured already? I wish I knew more about networking.

    Seems fishy as hell. I don't think the entire rack we're on has power backup, and now different routes on different IPs is "normal."

    "I FEEL LIKE I'M TAKING CRAZY PILLS!"

  • My guess here is a funk by Telia.

    Telia has Cogent in their mix, around 7%. But it's odd not to see Telia at least handle the packets publicly before the handoff.

  • Yup, I would've expected a little hop between Comcast's IBone and Cogent's PoP in Denver.

    6  he-3-4-0-0-cr01.denver.co.ibone.comcast.net (68.86.90.149)  21.700 ms  23.446 ms  24.153 ms
    7  te3-5.ccr01.den03.atlas.cogentco.com (154.54.10.33)  14.044 ms  16.964 ms  15.106 ms
    
  • My Urpad VPS in ColoCrossing Buffalo went down for two hours today and came back up after a reboot. It sure could be a power issue.
    That reminds me of a UPS I had; it failed precisely this way: no fault indication of any kind, but the load dropped when I pressed the front panel test button. At the time, the A+B power on the server saved my day.

  • @pcan said: My Urpad VPS in ColoCrossing Buffalo went down for two hours today and came up with a reboot. It sure could be a power issue.

    So I wonder how many racks were affected by the "standby UPS power issue"

    Jon had the nerve to tell my provider that he should have had A+B power on the servers (servers he leases rent-to-own from them, in their configuration).

    My feeling is that you shouldn't need a second power supply to make up for a missing (or poor quality) UPS. That first power feed shouldn't get cut. Screams lack of UPS all around.

    So the cracks begin to show at CC's "flagship" location that just doesn't seem to be on their page.

  • The dual power issue is a sore spot when you are renting from a facility.

    Is there even an option from the purchase / checkout to add A+B power? I doubt there is.

  • @pubcrawler said: Is there even an option from the purchase / checkout to add A+B power? I doubt there is.

    Nope, but we were grilled for having so many servers with them and not using it. What an idiot. Tried to turn the tables on us over an option he never even mentioned.

  • That's a f!ck head move on his part.

    Love when providers pass the blame. Tell him to man up and clean up.

    Ask him where the A+B power option is :) Hell, better spread the word: Buffalo is starting to show power issues. Wonder how much interest he'll get in redundant power now?

    Maybe this is just another fundraiser? Better watch out, or I'll need DDoS protection and a solicitation from you know who.

  • @Kris said: So the cracks begin to show at CC's "flagship" location that just doesn't seem to be on their page.

    I thought the 'flagship' location was Chicago?

  • PS: real surprised more folks with VPSes in Buffalo didn't notice / aren't on here asking about the outage.

  • @Damian said: I thought the 'flagship' location was Chicago?

    I think it was; now it is Buffalo.

  • What is the actual subnet involved here? 192.210.x isn't enough to look into.

    From http://bgp.he.net/AS36352#_prefixes you can see they are advertising subnets as small as /24 into the routing table. And they peer with Cogent directly, at least for some of those.

    You can easily check how they are announcing your subnet from route-views vs. other collectors and then ask them why (see the sketch below). It could be a specific routing policy, or something automated if they are using an FCP (a flow control platform, i.e. an automated route optimizer). The inbound route may also have nothing to do with how they prefer traffic back outbound.
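
    A minimal way to run that check, assuming the RouteViews telnet looking glass still accepts read-only sessions (historically the login was "rviews"); substitute whichever CC prefix you're on:

        # connect to the RouteViews route collector
        telnet route-views.routeviews.org
        # at the router prompt, look the prefix up, e.g.:
        #   show ip bgp <your CC prefix>
        # the AS path column shows which upstream(s) the announcement is heard through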

  • @Damian - DuPont Fabros ain't even in Chicago. It's a 30-45 minute drive northwest of Chicago in Elk Grove Village.

  • @Spencer said: @Damian said: I thought the 'flagship' location was Chicago?

    I think it was; now it is Buffalo.

    Maybe in the past Chicago was, but I think they ran fiber to their local office, went balls out on setting up a DC at their downtown Buffalo office location, and have since been pushing / making deals left, right, and center to fill it up.

    Never heard about these A+B power options till now. Odd.

  • Kris Member
    edited February 2013

    @unused said: What is the actual subnet involved here? 192.210.x isn't enough to look into.

    192.210.234.x was an IP that takes straight Cogent. I figured that was enough info for people with CC IPs / VPSes (who doesn't have one at this point?) to check things out.
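
    If someone wants to see how that range is being announced without logging into a looking glass, RIPE's RIS whois mirror should answer for it; a rough sketch (the /24 prefix length is just my assumption):

        # origin AS and announcement info for the range, via RIPE RIS whois
        whois -h riswhois.ripe.net 192.210.234.0/24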

  • CVPS_Chris Member, Patron Provider
    edited February 2013

    @Damian said: I thought the 'flagship' location was Chicago?

    Damian is correct, Chicago is still a much larger location for ColoCrossing.

    As far as what happened this morning, the DC was doing upgrades and messed something up regarding the power. It had nothing to do with ColoCrossing directly, and not everyone was affected. Only half of my nodes were, for a period of about an hour.

    @Kris, everything you are saying is false about ColoCrossing and the DC situation.

  • @pubcrawler said: PS: real surprised more folks on VPS in Buffalo didn't notice / not on here asking about outage.

    I'm thinking not all of Buffalo had an outage. We didn't go down, BuyVM didn't go down, CVPS didn't go down... maybe only the provider that @Kris mentioned had an issue?

  • That's funny @KernelSanders. Everyone pimps DuPont Fabros like it's in Downtown. But then again, much safer out there.

    @unused, good info.

    @Kris, A+B power is old school. Just costly so hardly anyone in this market segment does it.

  • @CVPS_Chris said: @Kris, everything you are saying is false about ColoCrossing and the DC situation.

    Waiting on Jon's reply for what really happened. I don't remember getting the upgrade announcements.

    I'd love to see / hear something other than "they were just bein' stupid I guess." I think some racks are "prioritized."

  • @pubcrawler yeah definitely not downtown by a long shot. But downtown Chicago is still safe, just don't go to the south side past US Cellular field unless you're in a tank ;)

  • @Damian said: I'm thinking not all of Buffalo had an outage. We didn't go down, BuyVM didn't go down, CVPS didn't go down... maybe only the provider that @Kris mentioned had an issue?

    Someone with a URPad VPS in Buffalo was down for 2 hours and only came back after a reboot.

    The host I'm with has 6 servers at CC, all went down at the same time.

    @pubcrawler said: @Kris, A+B power is old school. Just costly so hardly anyone in this market segment does it.

    I remember using it with my PowerEdges at NAC Cedar Knolls, but I don't see the reason for an A+B power setup if, erm... A has stable power.

    Maybe I'm just used to more "production" / commercial data centers with things such as planned maintenance.

  • CVPS_Chris Member, Patron Provider
    edited February 2013

    @Kris said: Waiting on Jon's reply for what really happened. I don't remember getting the upgrade announcements.

    I'd love to see / hear otherwise than "they were just bein' stupid I guess" - I think some racks are "prioritized"

    Are you a direct client? Like I just said, only "some" racks were affected, not all, so the suggestion that some racks are prioritized is just buffoonery.

    @Kris said: Maybe I'm just used to more "production" / commercial data centers with things such as planned maintenance.

    lol.

    @KernelSanders said: @pubcrawler yeah definitely not downtown by a long shot.

    DTF is one of the best public DCs in the US. It was worth the move IMO.

  • In all fairness, @Kris, some things will always be prioritized: core routers, high-paying clients, etc. Someone did multiple bad things there this morning. The power issue is nuts, since bad power can absolutely destroy stuff.

    If uptime is necessary, send a sales inquiry for A+B power. Although I wouldn't exactly be renting if I needed that kind of redundancy.

  • @CVPS_Chris said: Are you a direct client? Like I just said, only "some" racks were affected, not all, so the suggestion that some racks are prioritized is just buffoonery.

    I work for a direct client; that's all you have to worry about. Oh wait, I forgot you're essentially CC, and vice versa.

    The fact that some racks didn't go down while others did would make me think some are more prioritized.

    I'd have some input into the power situation if they bothered to put any of the stats on their webpage.

    Or is that like your site that's "coming soon" ?

  • It remains a sore spot that the community's ColoCrossing spokesperson is a "reseller" who claims no affiliation with CC.
