New on LowEndTalk? Please Register and read our Community Rules.
Hetzner Motherboard Replacements for AX42, AX52 and AX102

Over the last few weeks, there have unfortunately been an increasing number of unexplained incidents with mainboards on a number of servers from these models: AX42, AX52 and AX102. After a detailed analysis of the incidents which we did in cooperation with the manufacturer, it became clear that there is a design error on the mainboard for these models.
See: https://status.hetzner.com/incident/7fae9cca-b38c-4154-8a27-14e6dfea5c1e for the full details.
Tagging @Hetzner_OL as well just in case they can add any more detail or other useful inside information.
Comments
I know @labze (HostBrr) has also had a tough time with hardware issues with some of the AMDs as part of their offerings. Not sure if they're similar/shared boards. Maybe he can add some of his own observations as well.
Tagging @labze for fun as he probably hates those mobos too :-D
asrock rack?
Does anyone happen to know which brand boards are these? Just out of curiosity.
Aren't they using custom motherboards?
Didn't know that.
Most probably ASRock Rack, yes, but a custom design straight from the manufacturer. However, a custom design doesn't mean that design errors only show up in mass production...
Asrock B665D4U-1L
https://browser.geekbench.com/search?q=B665D4U-1L
Having used the AX102 and AX52, I remember that the AX102 seems to use an ASRock motherboard, but I haven't checked the AX52.
That's a brand @labze is intimately familiar with though I'm not sure on the exact model that was in use in HostBrr (vs Hetzner).
I know there has been quite a struggle for him personally with the downtimes and crashes but I guess it required something much larger like a Hetzner (in terms of machines/scale) to be able to get the manufacturer to admit (have they?) the issue.
Apparently this is an ongoing issue since May 2024 (based on first report date)
https://forum.asrock.com/forum_posts.asp?TID=40795&PID=156897&title=post-code-00#156897
They are indeed using AsRock Rack (we had some AX102). OVH is using MSI Boards for their EPYC 4004 / Ryzen 9000 Dedicated Servers.
Wow, no one wants to be a Hetzner technician right now. They will probably be able to replace a motherboard with their eyes closed.
Tagging myself @davide, the first generation AMD Ryzen CPU that I bought last month has a known hardware bug that corrupts the registers once a day: https://lowendtalk.com/discussion/201648/asus-ryzen-ecc-unstable.
Had the same issue a few weeks ago with an AX42 server.
The motherboard was dead. I opened a ticket and it was resolved within 30 minutes.
AFAIK there's a firmware/BIOS patch released to fix this issue.
It’s a miracle to find anything server grade at Hetzner other than the (insanely priced) branded range. Assrock rack doesn’t count
Closest I ever saw was a gen8 xeon on a Fujitsu workstation board built to server standards (without rotated ram and ipmi)
As far as popular wisdom goes across the internet, it's a hardware bug; AMD never spoke a word about the defect. I haven't been able to stop the crashes with the latest microcode & latest bios. A second generation Ryzen should be in the mail tomorrow.
I know the RAM and CPUs are not server grade in most of their cheap servers, but is it the same for their HDDs / SSDs?
I haven't been their customer for a long time.
All their HDDs are enterprise grade (24/7, 5 year warranty).
Both SSDs and NVMe drives can be either consumer or enterprise. It's quite safe to determine the type based on size alone: 480 GB, 960 GB, etc. are enterprise; 256 GB, 512 GB, 1 TB, etc. are consumer grade.
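The size rule of thumb above can be sketched in a few lines of Python. This is only an illustration of the heuristic as described in the post, not an official Hetzner classification: the function name `guess_drive_grade` is made up, the size sets contain only the examples given, and reading the bare "1" as 1 TB (1000 or 1024 GB) is an assumption.

```python
# Rough sketch of the "guess the drive grade from its capacity" heuristic.
# The size sets contain only the examples from the post; treating "1" as
# 1 TB (1000 or 1024 GB) is an assumption, not something the post states.
ENTERPRISE_SIZES_GB = {480, 960}
CONSUMER_SIZES_GB = {256, 512, 1000, 1024}


def guess_drive_grade(capacity_gb: int) -> str:
    """Return 'enterprise', 'consumer', or 'unknown' for a drive capacity in GB."""
    if capacity_gb in ENTERPRISE_SIZES_GB:
        return "enterprise"
    if capacity_gb in CONSUMER_SIZES_GB:
        return "consumer"
    # Anything outside the listed examples is left undecided on purpose.
    return "unknown"


if __name__ == "__main__":
    for size in (480, 512, 960, 2000):
        print(size, guess_drive_grade(size))
```

Sizes the post does not mention come back as "unknown" rather than being guessed, since the rule is only a heuristic and the next reply already notes that it depends on the server line.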
Depends. EX44 consumer, EX101 enterprise for example.
I had 10 or so AX102 servers at Hetzner and they were all running rock solid. I believe the servers were delivered with a mix of ASRock and ASUS motherboards. The ASRock, I believe, is a custom board based on the B650D4U model, and the ASUS is some kind of gaming motherboard.
But I certainly have had my issues with the ASRock B650D4U at other datacenters. I have 11 7950XD systems at IP-Projects, and 8 or 9 of them had the motherboard fail. Similar issues have happened at dataforest.
I also just had a second motherboard failure at Contabo in the same system.
It is likely the same issue affecting Hetzner. It is incredible how one board can have such major issues through several revisions.
I wonder if the ASUS gaming one had RGB on it or not.
Wow, hetzner is absolutely crazy and awesome for replacing them proactively.
It does. How else would it be so fast?
That one is crashing here as well (at a different, well known US provider).
The ASRock B450D4U instead is stable (at least at Hetzner), while I'm also seeing those problems w/ the Asustek Pro WS 565-ACE at Hetzner (used in the AX101).
(If we are talking about hard reboots with no logged errors here.)
I'm also not seeing any of those problems with the ASRock B565D4-V1L, but my statistical range here is probably too small for general assumptions.
I have full RGB in my 2U rackmount. The chassis is fully enclosed so you can't see anything from the outside, but I know the RGB is making my server faster.
It seems that I am lucky. Although I saw on the console that my server was also among those to be replaced, my AX42 and AX102 have never had any problems.
It's a time bomb. We run multiple B650D4U boards and they show issues after 1 / 3 / 6 months...
You cannot keep these in production. We are also replacing them, with Supermicro MicroClouds.
ASRock Rack has confirmed to us that there was a particular production run where they used a BIOS chip/component from a certain vendor that is causing the majority of the stability issues on the B450D4U boards. The impacted boards seem to be from around Q1 to Q3 2024. They've been responsive and willing to RMA any impacted board.
We've seen no stability problems on the newer replacement boards. Supermicro AM5 variants are fine with zero issues reported.