
Comments
I believe this is part of their "Fresh Air" cooling. It is more energy efficient not to cool the data centre floor to such low levels, even at the cost of slightly higher fan speeds in the servers. Our SR635s are configured to allow up to 30C, and the datacentre we're in cools the floor to 20-25C without completely isolated hot/cold aisles.
But with the right configuration, the servers we're using allow even up to 45C, as part of ASHRAE Class A4.
@Kodomu
In my experience, things like “Fresh Air” or whatever other marketing gimmicks Dell comes up with feel like a bit of outdated BS marketing.
If the CPU is running hot, just imagine what temperatures the VRMs are.
Either way, I think we got the situation under control with the last reboot, but we will monitor this closely over the next few days.
PS: I do have to acknowledge that the cooling is top-notch in that DC.
I'd probably largely agree. Companies are big fans of anything like this, especially something that can appeal to bean counters thinking they can save big money on cooling. The reality is the datacentre would need to be designed with this in mind, and it would end up too limiting for anything bigger than a small operation that exclusively controls its own hardware, and definitely for anything that deals in colocation. If you're a small DC that is exclusively populated by yourself, though, I can see this saving a lot of money.
ASHRAE is the "official" body that makes up the rules for it for at least some of the OEMs.
Though if a server is designed for this temperature it's probably fine. The spec sheet for most servers will state quite specifically what's allowed for each class, for example to allow 45C (Class A4) on our servers, we wouldn't be allowed any NVME drives, the CPU could only be a 155W TDP (which would keep the VRMs cooler), you wouldn't be allowed any network connection higher than 10GbE and you can pretty much install nothing else in the server. Way too limiting for me at least.
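As a rough illustration of the restrictions listed above, here is a minimal Python sketch that checks a server build against them. The limits are only the ones quoted in this post for these particular servers (no NVMe, 155W TDP cap, nothing faster than 10GbE); the `ServerConfig` class and `a4_violations` function are hypothetical names made up for the example, not anything from a vendor tool, and a real spec sheet will list more conditions.

```python
from dataclasses import dataclass


@dataclass
class ServerConfig:
    """Hypothetical description of a build; field names are illustrative."""
    cpu_tdp_watts: int
    nvme_drives: int
    max_nic_gbe: int


def a4_violations(cfg: ServerConfig) -> list[str]:
    """Return the reasons this build would not qualify for 45C (Class A4),
    using only the three restrictions mentioned in the post."""
    problems = []
    if cfg.nvme_drives > 0:
        problems.append("NVMe drives are not allowed")
    if cfg.cpu_tdp_watts > 155:
        problems.append("CPU TDP is above 155W")
    if cfg.max_nic_gbe > 10:
        problems.append("network connection is faster than 10GbE")
    return problems


if __name__ == "__main__":
    build = ServerConfig(cpu_tdp_watts=200, nvme_drives=2, max_nic_gbe=25)
    for reason in a4_violations(build):
        print(reason)
```

A build with a 200W CPU, NVMe drives and 25GbE fails all three checks, which is roughly why the class ends up too limiting for a general-purpose node.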
Maybe not relevant, but if your CPUs are still running hot is the paste any good still? PTM7950 performs great and lasts forever. If you're running something a few years old, the super cheap paste that the OEMs use could have dried up (and it was never great to start with). YMMV as the heat is way better spread out on EPYCs than Xeons, but we have 200W CPUs running loaded at ~60C with it, and that's with our higher DC ambient temp.
Hopefully it works out okay for you though
@Kodomu
Thermal paste is fine (we change it too often, I might say, with each prolonged downtime we do).
CPUs are a combination of Scalable Gen 2, but with 150W max, and 125W on the newer ones that are inbound.
There is a limit to which I am willing to push these nodes, and CPUs sitting around 80 °C is outside that comfort zone for me.
This fuckery turned out to be a mix of 3-4 things (2 of them being the new FW on iDRAC and BIOS).
Happy we managed to figure this one out fast, the less fun part is that this means rebooting the remaining Xeon Scalable Gen 2 nodes to apply the fixes.
Yeeee the angry mob right now - I know, not funny, I just can't help myself.

We will schedule this and notify the involved parties in the upcoming days.
None of the other nodes experience these symptoms; this was isolated to one node only, but we wish to stay ahead of the problem.
Very odd even to sit at ~60C, I wouldn't have expected them to run that hot, especially with a 16C ambient. I'm not 100% familiar with new Dell, but do they do the same as Lenovo and have different tiers of coolers? Lenovo has a "Standard" one for 155W max and a "High Performance" one for more than that, and the same options for the installed fans. If you happened to be using the low performance cooler then maybe it's cheap insurance to swap it out, even if it's technically capable.
All coolers are performance, we order them like that.
Chassis is rated for "up to 200W / CPU" as I recall, but that is BS, as the heatsink is that of the 1U R640, and the MB is extremely small and crowded compared to the 12-bay version (ours are 24- to 26-bay models), but that is by "my standards" and experience.
We even set a speed offset on all nodes, just to be on the safe side; that is standard with us.
The downside of these 24-bay models is the... well... 24 drives in front of the coolers, as the air that hits the CPU is already warm from cooling 2x 12 drives, and the newer 18-24TB ones are kinda warm while running.
Yet, we did not have any issues, as you saw on the graphs, until the FW upgrade. Apparently in the new FW we have to be a bit more specific on some settings, and it is related only to one CPU model, the 6226, as the other ones running 6240 and 6248 have no issues. Also, the 6226 is a 125W CPU; the rest are 150W. So I will put this under "weird FW bug".
@host_c
Sounds like it might be. Intel has generally always misrepresented their TDPs too, in my experience. I didn't know Dell was doing what Gigabyte was doing and putting 1U heatsinks in 2U servers either.
Sounds like it's an R740xd2? I think they're just slightly fancy R540 boards which explains the weirdness.
Hopefully it works itself out for you.
@Kodomu
What I have learned in over 2 decades of working in IT, regardless of the brand in question, specs wise:
So there is little that can surprise me when it does not work as advertised.
Cheers!
Too many comments to read and know what is happening.
I have only one question: sue & deadpool or not?!
I got my answer.
SUE
@host_c can i have @JabJab servers once he sues you!? pleeeeeease?
servers!?
serverS?!
You think I have more than 1?!?!?!?!?!
I have one, one I was cheated on.
You can have it (if they allow transfers, but you pay feeeeeeeeeeeeeeeees).
Best i can do is $1 yr.

Reguards
Well, if he so wishes, we can make it happen, I really do not wish to "cheat" him.
I think that is on RAID10 if I recall???? @JabJab
I found SUE:
You were right, she is nice hot.
You mean not everyone collects them!? I want to add them all to my collection.
@AlteredParadox
DM me the Pioneer's IP. I don't like IOWAIT!!! So I'll put one of the minions to check that node.
PS:
haha, sent!
Obligatory updated screenshot, because @host_c doesn't like iowait and fixes all the things
THX @AlteredParadox
I’m a bit sensitive when it comes to the new nodes as the formula is ~90% there, but from the other side of the screen it’s not always easy to see what issues users might be running into.
That’s exactly why I value this kind of input. If I notice something that looks off in postings by our internal standards, I may ask for a DM or a bit more clarification to see whether there’s something we can improve or whether we did something wrong (AKA messed up, as sometimes we do).
Thanks again for taking the time to share your observations.
Also how much does this glorious 1c1gb thing cost!?
dear @host_c friend, what exciting things are happening now that the maintenance is done!? I am bored!
How about this?
@Freek
And there is one sent out to the Nimitz 10TB fellas, also running on E5-V4.
Services are already modified to monthly with their corresponding value / month.
@AlteredParadox
I assume this will get out of control soon, so
Host-c how much do you think it would be to ship those e5 v4s to NZ?
Depends, they are ~26 kg, so I would say a lot. Rails alone are like 2 kg.
Why, need them? If you arrange shipping, I am more than happy to cut a deal on them.
Hmm, sorry I don't understand what you mean by that. My Pioneer 5TB has vanished and rebranded to this. Renewal is still set to Quarterly and due April 16th.
I will get STU to fix that for you.

TL;DR.
S U E