Intel Atom C2000 series dying at 18 months due to fault
So, apparently Intel shat the bed with the C2000 series of Atom chips and are taking quite a financial hit as a result. The issue is that apparently after about 18 months they start failing to boot because they don't produce a clock signal:
AVR54. System May Experience Inability to Boot or May Cease Operation
Problem: The SoC LPC_CLKOUT0 and/or LPC_CLKOUT1 signals (Low Pin Count bus clock outputs) may stop functioning.
Implication: If the LPC clock(s) stop functioning the system will no longer be able to boot.
Workaround: A platform level change has been identified and may be implemented as a workaround for this erratum.
Cisco has announced a replacement program that is totally not a recall for their affected gear.
But there's an awful lot of those processors around elsewhere - including in LowEndDedis from a number of different providers.
Interesting times.
Comments
Shit.
oh my...
Online has more of those than anyone else I can think of.
RIP..
Ah well, guess I will never reboot my online.net dedi now.
My machine in co-lo is a c2750 and it's probably about that age.
Maybe I should move over to Ubuntu and its livepatch system. It's free for private use from what I remember.
CPU Microcode update is applied on reboot only AFAIK and even then Intel said it's a platform level workaround that has to be figured out and applied.
I was going to buy a few boards with c2000 chips in them a while back. Guess it is a good thing I didn't.
I was mostly talking about never ever having to reboot. (hopefully)
"Live fast, die young" - Intel Punk
More like 'live young, die fast!'
PS: Just took a full backup of my online C2750 / 8GB / 160GB SSD box and moved it to 3 servers for safe keeping :P
what is going to happen with my websites hosted on online.net's c2750?
I put all my eggs on that machine
You've got backups and are prepared for a recovery in any event, this just shortens the odds a little.
Guess we, and probably online, will have to wait for announcements from supermicro to get a handle on this. Lots of FUD, very few facts as to the increase in statistical probability of a failure, which is not zero for any processor.
The XC 2015 has been around for 24 months and LET isn't full of complaints over failed boxes (yet).
That's really interesting. I had the C2550 fail in my NAS. Many people thought it was because of a BMC bug that wrote to its flash every second, causing it to wear out. I wonder if it was actually this. Or worse - if it could actually suffer from both.
Online make their own hardware for the 2016 Atom-based products, which probably makes things a bit more complicated for them.
We definitely need more information, there's no need for people to set their hair on fire. It may be made worse or better by particular factors, for example. But Intel was concerned enough to have "established a reserve to deal with that" that affected their datacenter division's financials and as mentioned above Cisco are already effectively recalling hardware.
There's no need to go cancelling dedis and replacing colo'd hardware yet - it's something to keep an eye on and decide what to do as more information comes in (and I personally wouldn't be buying any used hardware using the affected models).
It is a bit shitty to imply that posting this thread is malicious.
Drive failures are way more common than CPU failures. You should have backups of your data. You should also know how much downtime you can live with in case of hardware failure (of any kind) and have a plan for how to restore from any failure in that time frame.
For example, if you can live with a day or two of downtime if your server dies, then your plan might be to just buy another cheap server based on whatever's available when you need it, and restore your backups to it.
But if you can't live with an hour's downtime, you should have low DNS TTLs and a hot spare server ready to take over at a moment's notice, and preferably some sort of automation to flip over to it in case the server dies while you're asleep.
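The "automation to flip over" part can be sketched roughly like this. It is a minimal Python sketch, not anyone's actual setup: `is_alive`, `monitor`, the threshold of 3 failed checks and the 30-second interval are all assumptions made for illustration, and the `fail_over` callback is a placeholder for whatever your DNS provider or load balancer needs in order to point traffic at the spare.

```python
import socket
import time
from typing import Callable


def is_alive(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor(
    check: Callable[[], bool],
    fail_over: Callable[[], None],
    threshold: int = 3,       # assumed: consecutive failures before flipping
    interval: float = 30.0,   # assumed: seconds between checks
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll check() and call fail_over() once after `threshold` failures in a row."""
    streak = 0
    while True:
        # A single success resets the streak, so one dropped packet
        # does not trigger a switch to the spare.
        streak = 0 if check() else streak + 1
        if streak >= threshold:
            fail_over()  # hypothetical hook: update DNS, move a failover IP, etc.
            return
        sleep(interval)
```

Usage would be something like `monitor(lambda: is_alive("primary.example.com"), switch_to_spare)`, where `switch_to_spare` is your own function. It only helps if the DNS TTL is already low and if the monitor runs somewhere other than the box it is watching.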
Sorry, not sure why FUD implies malicious at all here, but I wasn't intentionally referring to OP specifically, more the general range of comments and lack of facts on Cisco and NAS forums too.
But, my apologies to the OP if it was taken that way.
Wiktionary calls it "derogatory", the Jargon File entry says that it typically refers to "any kind of disinformation used as a competitive weapon", having been closely associated with both IBM and Microsoft for doing that in the past; the Wikipedia entry is along similar lines.
Sorry for responding like I did then, I should have asked for clarification rather than getting grumpy.
Have heard about it in the past, but suspect that this isn't really that common. Perhaps above normal for CPUs, but considering that the Atom C2000 series was released Q3 2013 (~3.5 years ago), if it were that big of a problem, you'd have heard a lot more about it sooner.
My Online.net servers use those and I think the Scaleway baremetal too?
Probably but look at the bright side, if you do have one suffer the problem and die it will be online.net who has to sort the replacement.
If the data on it is important then you have a backup stored elsewhere, right?
If you have data on it the disk will most likely remain unaffected anyway.
That depends on the provider though, some will move the disk to another node for you and others may not.
E.g. if it's Scaleway and the local SSD, you are probably screwed.
Oh I'm not worried at all, I keep very good backups.
OK. So, can anyone explain what's going on here?
I mean, this is a CPU, it has a bunch of transistors and all that. But... does it have some kind of ROM and battery to store something? Otherwise, how would it fail after exactly that time? I can't understand that.
I got one of online.net's C2750s last month.
The server can't run normally for more than a day. Whether I use my own configuration or their default installation and configuration, sometimes it crashes when I'm downloading a big file, e.g. a Windows ISO. Sometimes it goes down when I install an LNMP environment.
Intel took a note straight from HP- but can't math (as we saw with the P5):
live together die alone.