Intel Atom C2000 series dying at 18 months due to fault

So, apparently Intel shat the bed with the C2000 series of Atom chips and are taking quite a financial hit as a result. The issue is that apparently after about 18 months they start failing to boot because they don't produce a clock signal:

AVR54. System May Experience Inability to Boot or May Cease Operation

Problem: The SoC LPC_CLKOUT0 and/or LPC_CLKOUT1 signals (Low Pin Count bus clock outputs) may stop functioning.

Implication: If the LPC clock(s) stop functioning the system will no longer be able to boot.

Workaround: A platform level change has been identified and may be implemented as a workaround for this erratum.

Cisco has announced a replacement program that is totally not a recall for their affected gear.

But there's an awful lot of those processors around elsewhere - including in LowEndDedis from a number of different providers.

Interesting times.


Comments

  • oh my...

  • Online has more of those than anyone else I can think of.

  • RIP..

  • AlexBarakov Member, Provider

    lbft said: 18 months they start failing to boot

    Ah well, guess I will never reboot my online.net dedi now.

  • My machine in co-lo is a c2750 and it's probably about that age.

  • Maybe I should move over to Ubuntu and its livepatch system. It's free for private use, from what I remember.

  • Clouvider Member, Provider
    edited February 2017

    @MagicalTrain said:
    Maybe I should move over to Ubuntu and its livepatch system. It's free for private use, from what I remember.

    CPU Microcode update is applied on reboot only AFAIK and even then Intel said it's a platform level workaround that has to be figured out and applied.

  • I was going to buy a few boards with c2000 chips in them a while back. Guess it is a good thing I didn't.

  • @Clouvider said:

    @MagicalTrain said:
    Maybe I should move over to Ubuntu and its livepatch system. It's free for private use, from what I remember.

    CPU Microcode update is applied on reboot only AFAIK and even then Intel said it's a platform level workaround that has to be figured out and applied.

    I was mostly talking about never ever having to reboot. (hopefully)

  • "Live fast, die young" - Intel Punk

  • K4Y5 Member
    edited February 2017

    @Yura said:
    "Live fast, die young" - Intel Punk

    More like 'live young, die fast!'

    PS: Just took a full backup of my online C2750 / 8GB / 160GB SSD box and moved it to 3 servers for safe keeping :P

  • what is going to happen with my websites hosted on online.net's c2750?
    I put all my eggs on that machine :(

  • @didtav said:
    what is going to happen with my websites hosted on online.net's c2750?

    You've got backups and are prepared for a recovery in any event, this just shortens the odds a little.

    Guess we, and probably online, will have to wait for announcements from supermicro to get a handle on this. Lots of FUD, very few facts as to the increase in statistical probability of a failure, which is not zero for any processor.

    The XC 2015 has been around for 24 months and LET isn't full of complaints over failed boxes (yet).

  • That's really interesting. I had the C2550 fail in my NAS. Many people thought it was because the BMC had a bug where its flash was being written to every second, causing it to wear out. I wonder if it was actually this. Or worse, if it could actually suffer from both.

  • lbft Member
    edited February 2017

    cochon said: Guess we, and probably online, will have to wait for announcements from supermicro to get a handle on this

    Online make their own hardware for the 2016 Atom-based products, which probably makes things a bit more complicated for them.

    very few facts as to the increase in statistical probability of a failure, which is not zero for any processor.

    We definitely need more information, there's no need for people to set their hair on fire. It may be made worse or better by particular factors, for example. But Intel was concerned enough to have "established a reserve to deal with that" that affected their datacenter division's financials and as mentioned above Cisco are already effectively recalling hardware.

    There's no need to go cancelling dedis and replacing colo'd hardware yet - it's something to keep an eye on and decide what to do as more information comes in (and I personally wouldn't be buying any used hardware using the affected models).

    FUD

    It is a bit shitty to imply that posting this thread is malicious.

  • didtav said: what is going to happen with my websites hosted on online.net's c2750? I put all my eggs on that machine :(

    Drive failures are way more common than CPU failures. You should have backups of your data. You should also know how much downtime you can live with in case of hardware failure (of any kind) and have a plan for how to restore from any failure in that time frame.

    For example, if you can live with a day or two of downtime if your server dies, then your plan might be to just buy another cheap server based on whatever's available when you need it, and restore your backups to it.

    But if you can't live with an hour's downtime, you should have low DNS TTLs and a hot spare server ready to take over at a moment's notice, and preferably some sort of automation to flip over to it in case the server dies while you're asleep.
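
    The "automation to flip over" part could be sketched roughly like this in Python. It is only a minimal sketch under assumptions: the function names, the 30-second interval and the three-strikes threshold are made up for illustration, and `switch_to_spare` is a hypothetical callback that you would have to write yourself (for example, something that repoints a low-TTL DNS record at the hot spare via your DNS provider's API).

```python
import socket
import time


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_fail_over(history, threshold=3):
    """Return True only after `threshold` consecutive failed checks,
    so that one dropped probe does not trigger a switch."""
    if len(history) < threshold:
        return False
    return not any(history[-threshold:])


def monitor(host, port, switch_to_spare, interval=30.0, threshold=3):
    """Probe the primary server until it looks dead, then call the
    (hypothetical) switch_to_spare callback once and stop."""
    history = []
    while True:
        history.append(is_reachable(host, port))
        if should_fail_over(history, threshold):
            switch_to_spare()
            return
        time.sleep(interval)
```

    Run `monitor` from a machine outside the datacenter you are watching, otherwise the watcher dies along with the server. With a 30-second interval and a threshold of 3, a dead box is noticed after roughly a minute and a half, and visitors follow once the DNS TTL expires.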

  • @lbft said:
    It is a bit shitty to imply that posting this thread is malicious.

    Sorry, not sure why FUD implies malicious at all here, but I wasn't intentionally referring to OP specifically, more the general range of comments and lack of facts on Cisco and NAS forums too.

    But, my apologies to the OP if it was taken that way.

  • cochon said: Sorry, not sure why FUD implies malicious at all here

    Wiktionary calls it "derogatory", the Jargon File entry says that it typically refers to "any kind of disinformation used as a competitive weapon", having been closely associated with both IBM and Microsoft for doing that in the past; the Wikipedia entry is along similar lines.

    But, my apologies to the OP if it was taken that way.

    Sorry for responding like I did then, I should have asked for clarification rather than getting grumpy.

  • Have heard about it in the past, but suspect that this isn't really that common. Perhaps above normal for CPUs, but considering that the Atom C2000 series was released Q3 2013 (~3.5 years ago), if it were that big of a problem, you'd have heard a lot more about it sooner.

  • My Online.net servers use those and I think the Scaleway baremetal too?

  • @sin said:
    My Online.net servers use those and I think the Scaleway baremetal too?

    Probably but look at the bright side, if you do have one suffer the problem and die it will be online.net who has to sort the replacement.

    If the data on it's important then you have a backup stored elsewhere right?

  • @dragon2611 said:

    @sin said:
    My Online.net servers use those and I think the Scaleway baremetal too?

    Probably but look at the bright side, if you do have one suffer the problem and die it will be online.net who has to sort the replacement.

    If the data on it's important then you have a backup stored elsewhere right?

    If you have data on it the disk will most likely remain unaffected anyway.

  • @teamacc said:

    If you have data on it the disk will most likely remain unaffected anyway.

    That depends on the provider though, some will move the disk to another node for you and others may not.

    E.g. if it's Scaleway and the local SSD, you are probably screwed.

  • sin Member
    edited February 2017

    dragon2611 said: Probably but look at the bright side, if you do have one suffer the problem and die it will be online.net who has to sort the replacement. If the data on it's important then you have a backup stored elsewhere right?

    Oh I'm not worried at all, I keep very good backups.

  • OK. So, can anyone explain what's going on here?

    I mean, this is a CPU; it has a bunch of transistors and all that. But does it have some kind of ROM and battery to store something? Otherwise, how would it fail after exactly that time? I can't understand that.

  • I got one of online.net's c2750s last month.

    The server can't run normally for more than a day. Whether I use my own configuration or their default installation and configuration, it sometimes crashes when I'm downloading a big file (e.g. a Windows ISO), and sometimes goes down when I install an LNMP environment.

  • Intel took a note straight from HP- but can't math (as we saw with the P5):

  • live together die alone.
