Delimiter Atom Down

klpowell · April 2015

My Atom has been down for ~6 hours. I opened a ticket and was told to check Network Status, which hasn't been updated since 11:18am EST and simply says "investigating." I'm wondering how widespread this issue is. Anyone else have one down?

SSDBlaze · April 2015

6 hours isn't horrible.

I'm sure they will have it up within 12 hours. Usually providers don't let their services stay down longer than that.

MikePT · April 2015

@MarkTurner will chime in!

klpowell · April 2015

It came up literally after I clicked post! The power of LowEndTalk I guess.

SSDBlaze · April 2015

@klpowell said:
It came up literally after I clicked post! The power of LowEndTalk I guess.

haha.

Yep, LET has supernatural powers.

Nomad · April 2015

What about the power of impatience?

6 hours is way too quick, especially when they told you there's a network-related issue.

SSDBlaze · April 2015

@Nomad said:
What about the power of impatience?

6 hours is way too quick, especially when they told you there's a network-related issue.

6 hours can feel really long if you planned your day around doing work on it.

You have a point though, I'd say 12 hours is the time to start posting.

klpowell · April 2015

They didn't tell me anything. They said "investigating." I'm OK with downtime as long as there is communication. Not updating a Network Status page and answering a ticket with "read the Network Status page" is not communication. Six hours is the most downtime I have had with any of the providers I use in quite some time.

klpowell · April 2015

Which, btw, the Network Status page still has not been updated:

Atla03 - Atlanta - Some Atom Servers offline (Reported)

Affecting Server - Yomura Master Provisioning | Priority - Critical

We're investigating loss of connectivity for some Atom servers in our Atlanta datacentre.

Date - 04/24/2015 11:17

Last Updated - 04/24/2015 11:18

Nomad · April 2015

So... It's still under investigation right?

"I've no further questions your honor"

klpowell · April 2015

Been up for 30 min so who knows...

Listen, I have been extremely happy with Delimiter. This little Atom has far exceeded my expectations. But my point here is: why is communication so difficult for companies in the low-end market? In my professional career working in an IT dept for a Fortune 500 corporation, this was always one of my top priorities: take a moment and update your clients on the status of their issue. I've had clients that asked for an update every 15 minutes. Was that a pain? Yes. Did that lead to a slowdown in restoration? Yes. But was the client happy? Yes. In my own hosting career I have always taken the time to update clients when there is an issue. I don't care if they paid me $1 or $1,000 for the service, it isn't too hard to update. A "Still working on it" gives customers much greater assurance that their problem is being taken care of. Over 6 hours, I would hope to have at least 2 updates on a network downtime. This is not anything against Delimiter, just an industry issue. I've had much higher-dollar hosts than this that have failed at communication as well.

MarkTurner · April 2015

It was not Delimiter's fault but our fault. They are normally very good at disseminating information via their network issues page and their three push-to-handset services.

We have only given them limited updates today as we have been waiting for Force10's engineering to help handle a replacement card. Unfortunately the spare card that was used to replace the failed one had the wrong firmware. Force10 is especially pernickety when it comes to individual card firmware upgrades.

It's often easier to leave Force10 to handle these types of jobs, otherwise we'd have another 100+ ports down, which would be another calamity.

Anyway, the card was finally upgraded in an unused chassis and then installed back into the aggregate switch used for the rack with those Atoms in it.

klpowell · April 2015

Thanks @MarkTurner, sounds like a fun day! Seems to be working fine on my end now.

Bruce · April 2015

one of my atoms is still down. might be some residual issues in the DC

update: all good now

klpowell · April 2015

Mine had an odd reboot after it came up, but has been stable since then.

lazyt · April 2015

Guess mine was in that rack as well. It kind of shocked me when the monitoring showed it was down.

linuxthefish · April 2015

Rebooted 7:55 hours ago, but fine now.

Bruce · April 2015

anyone else have problems with their atoms not booting automatically?

this is a list of outages reported on my atom nodes @ Delimiter (past 12 months).

2015-04-24 17:39:36  
2015-04-18 12:36:50  
2014-10-29 12:52:29  
2014-10-26 00:46:44  
2014-10-19 11:10:51 *
2014-09-27 11:23:27 *
2014-09-17 21:00:02  
2014-09-16 13:21:12  
2014-09-03 22:13:27  
2014-08-28 19:58:41 *
2014-07-18 12:58:35 *
2014-07-14 21:38:25  

 * not all nodes affected

some of these were nodequery issues. some were brief transit issues. but sometimes all my atoms went down. when that happened they needed to be power cycled. different to my blades in the same DC, which I've never had to power cycle.

anyone else have to power cycle their atoms after an outage? if we all have similar experience, then maybe it's something Delimiter can tweak (or perhaps have just resolved with a new card)

Note: not a complaint. very happy with these $5/m servers. just interested if others have same "feature"
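
For anyone who wants to compare against their own monitoring history, here is a rough Python sketch that tallies the list above. The timestamps are copied from this post; the `OUTAGES` and `summarize` names are made up for illustration, and the "all nodes" count only means "not starred", not that every one of those was a real outage (some were nodequery or transit blips).

```python
from datetime import datetime

# Timestamps copied from the list in this post.
# The boolean is True for entries marked "*" (not all nodes affected).
OUTAGES = [
    ("2015-04-24 17:39:36", False),
    ("2015-04-18 12:36:50", False),
    ("2014-10-29 12:52:29", False),
    ("2014-10-26 00:46:44", False),
    ("2014-10-19 11:10:51", True),
    ("2014-09-27 11:23:27", True),
    ("2014-09-17 21:00:02", False),
    ("2014-09-16 13:21:12", False),
    ("2014-09-03 22:13:27", False),
    ("2014-08-28 19:58:41", True),
    ("2014-07-18 12:58:35", True),
    ("2014-07-14 21:38:25", False),
]


def summarize(outages):
    """Count reports and find the longest quiet period between them."""
    times = sorted(
        datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S") for stamp, _ in outages
    )
    # Whole days between each pair of consecutive reports.
    gaps = [(later - earlier).days for earlier, later in zip(times, times[1:])]
    partial = sum(1 for _, starred in outages if starred)
    return {
        "total": len(outages),
        "partial": partial,
        "all_nodes": len(outages) - partial,
        "longest_gap_days": max(gaps),
    }


summary = summarize(OUTAGES)
print(summary)
```

The longest gap is the quiet stretch between the late October 2014 report and the two from April 2015.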

ub3rstar · April 2015

I was going to renew my Atom server, but then it was down not once, not twice, not even three times, but four times this week. Besides that, I didn't mind the service, and their customer support wasn't bad either. The other thing is that a dedicated server was too much for what I was running, so I downsized to a VPS instead.

@Bruce said:

anyone else have to power cycle their atoms after an outage? if we all have similar experience, then maybe it's something Delimiter can tweak (or perhaps have just resolved with a new card)

I believe Delimiter uses a SAN for their Atom servers and because of that, whenever a drive in the RAID array fails and is replaced, the filesystem becomes read-only until the OS is rebooted. So I wouldn't call it an "outage". There are also pros and cons to having everyone's servers power cycled automatically.

lazyt · April 2015

I've had one time where I had to power cycle my Atom there. Other than that it has worked surprisingly well.

For the price I would rather use it than a VPS. It's handling a couple of moderately sized forums quite well.

MarkTurner · April 2015

@ub3rstar said:
whenever a drive in the RAID array fails and is replaced, the filesystem becomes readonly until the OS is rebooted

That defies the definition of RAID and is completely untrue.

The NAS used is a NetApp storage system. These units are built for enterprise level reliability and have the price tag to prove it.

The issue that bites these Atom servers is the iSCSI implementation. This last outage, which has been the first for nearly a year, was caused by a switch card failure. These things happen, they get replaced and things trundle on.

Of course, along the way a few Atoms would go offline from time to time without an inherent reason. Often it is just the iSCSI daemon on the server failing: the OOM killer killing iscsid, reloading iptables without ensuring that the iSCSI ports were left open, processor saturation causing iSCSI to fall over, and so on.

The culprit in 99% of cases is iscsid falling over.
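
One of the causes listed above, an iptables reload that closes the iSCSI ports, can be sanity-checked from the Atom itself right after changing firewall rules. The Python sketch below just attempts a TCP connection to the storage portal. It assumes the portal listens on TCP 3260 (the registered iSCSI port, which may not be what this setup uses), and `portal_reachable` and the example address are hypothetical names, not anything provided by Delimiter.

```python
import socket

ISCSI_PORT = 3260  # registered iSCSI port; assumed here, check your own target


def portal_reachable(host, port=ISCSI_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example use (the address is a placeholder for your storage portal):
#   if not portal_reachable("192.0.2.10"):
#       print("iSCSI portal unreachable, check firewall rules")
```

This only confirms that outbound TCP to the portal still works. It says nothing about whether iscsid itself is healthy.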

William · April 2015

@MarkTurner said:
and have the price tag to prove it.

Eh, not really. In my experience Dell and HP worked far better for much less money.

Bruce · April 2015

@MarkTurner said:
The culprit in 99% of cases is iscsid falling over.

any suggestions on the best way to deal with this? my atoms stop responding, so power cycle is the only solution at the moment. if they are alive, but no disk, is it a matter of running a cron to check status and restart the service if not?

MarkTurner · April 2015

@Bruce - if you restart iscsid then you'll turn your filesystem read-only.

From my experience with two of them over the past 18 months, CentOS 6 seems a lot more robust than Ubuntu for iSCSI root. This is not a scientific observation, just a comparison of two Atoms on the same pod, same rack, same switch, etc., but one with CentOS 6 and the other with Ubuntu 12.04.
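
Since restarting iscsid is not a safe fix, a cron job could at least detect the read-only state and raise an alert instead of touching the daemon. A minimal Python sketch, assuming a Linux box where `/proc/mounts` is readable; `root_is_readonly` and `check_live` are made-up names, and what to do on a positive result (alert, schedule a reboot) is left open.

```python
def root_is_readonly(mounts_text):
    """Return True if the last mount entry for "/" carries the "ro" option.

    mounts_text is the content of /proc/mounts. Later entries for the same
    mount point sit on top of earlier ones, so the last match wins.
    """
    readonly = False
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == "/":
            readonly = "ro" in fields[3].split(",")
    return readonly


def check_live(path="/proc/mounts"):
    """Read the live mount table and report on the root filesystem."""
    with open(path) as handle:
        return root_is_readonly(handle.read())


# Example use from a cron script:
#   if check_live():
#       print("root filesystem is read-only")
```

Options are split on commas so that something like `errors=remount-ro` is not mistaken for `ro`.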
