AMD Ryzen 7950X random reboot

Has anyone experienced this problem?
The Ryzen 7950X server reboots randomly without logging any errors.
System: AlmaLinux 8.8
Kernel: 6.4.8-1.el8.elrepo.x86_64
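
One way to rule the kernel out: this box runs an ELRepo mainline kernel, so a test is to boot the 4.18-based kernel that AlmaLinux 8 ships. The sketch below is not from this thread; it filters `/boot` for stock-looking EL8 images and only builds (does not run) a `grubby --set-default` command. The filename pattern is an assumption about how stock and ELRepo images are named.

```python
import os
import re

# Assumption: stock AlmaLinux 8 images look like vmlinuz-4.18.0-477.10.1.el8_8.x86_64,
# while ELRepo mainline images carry "elrepo" in the release string.
STOCK_KERNEL = re.compile(r"^vmlinuz-4\.18\.0-.*\.el8.*\.x86_64$")


def stock_kernels(boot_entries, boot_dir="/boot"):
    """Return full paths of entries that look like stock EL8 kernel images."""
    return sorted(
        os.path.join(boot_dir, name)
        for name in boot_entries
        if STOCK_KERNEL.match(name) and "elrepo" not in name
    )


def set_default_command(kernel_path):
    """Build (but do not run) the grubby command that makes kernel_path the default."""
    return ["grubby", "--set-default=" + kernel_path]


def suggest(boot_dir="/boot"):
    """Print one grubby command per stock kernel found in boot_dir."""
    for path in stock_kernels(os.listdir(boot_dir), boot_dir):
        print(" ".join(set_default_command(path)))
```

Calling `suggest()` only prints candidates; `grubby --default-kernel` shows which kernel is currently the default before and after applying one.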

Comments

  • HostEONS Member, Patron Provider
    edited August 2023

    I've been using a 5950X for some time and never had issues. Does it actually reboot, or does it just freeze? If it freezes, make sure it's not related to the NVMe drive.

    Do check the IPMI logs; it may be a power issue or a loose power cable.

    Also try the default 4.x kernel that ships with AlmaLinux and see whether the issue persists.
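
A minimal sketch for the IPMI suggestion, assuming the usual pipe-separated layout of `ipmitool sel list` (id | date | time | sensor | event | direction). The keyword list is an assumption: BMC vendors word power events differently, so it will likely need adjusting per board.

```python
import subprocess

# Assumption: words that tend to appear in SEL records for power-related events.
POWER_KEYWORDS = ("power", "ac lost", "watchdog")


def power_events(sel_text):
    """Pick power-related records out of `ipmitool sel list` output."""
    events = []
    for line in sel_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 5:
            continue  # blank or unexpected line
        sensor, event = fields[3], fields[4]
        text = (sensor + " " + event).lower()
        if any(keyword in text for keyword in POWER_KEYWORDS):
            events.append({"date": fields[1], "time": fields[2],
                           "sensor": sensor, "event": event})
    return events


def show_power_events():
    """Run ipmitool on the host (needs root and a reachable BMC) and print matches."""
    result = subprocess.run(["ipmitool", "sel", "list"],
                            stdout=subprocess.PIPE, universal_newlines=True,
                            check=True)
    for e in power_events(result.stdout):
        print(e["date"], e["time"], e["sensor"], "-", e["event"])
```

Timestamps of any matches can then be compared with the times the server came back up.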

  • I've seen this topic, but nothing much was solved there. Yes, I added this, but it didn't help: GRUB_CMDLINE_LINUX_DEFAULT="consoleblank=0 nomodeset noapic pci=assign-busses apicmaintimer idle=poll reboot=cold,hard"

  • @HostEONS said:
    I've been using a 5950X for some time and never had issues. Does it actually reboot, or does it just freeze? If it freezes, make sure it's not related to the NVMe drive.

    Do check the IPMI logs; it may be a power issue or a loose power cable.

    Also try the default 4.x kernel that ships with AlmaLinux and see whether the issue persists.

    We use premium water cooling, and that didn't help either. There are no network issues at all.
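
To see whether CPU temperature climbs before a reboot despite the water cooling, one could log the sensors exposed by the kernel's `k10temp` hwmon driver, assuming that driver recognises this CPU on the running kernel. A hedged sketch; the log path passed to `log_temps` is up to you.

```python
import glob
import os
import time


def read_k10temp(hwmon_root="/sys/class/hwmon"):
    """Return {label: degrees C} for every k10temp sensor under hwmon_root."""
    readings = {}
    for hwmon in sorted(glob.glob(os.path.join(hwmon_root, "hwmon*"))):
        try:
            with open(os.path.join(hwmon, "name")) as f:
                if f.read().strip() != "k10temp":
                    continue
        except OSError:
            continue
        for input_path in sorted(glob.glob(os.path.join(hwmon, "temp*_input"))):
            label_path = input_path[:-len("_input")] + "_label"
            label = os.path.basename(input_path)
            if os.path.exists(label_path):
                with open(label_path) as f:
                    label = f.read().strip()  # e.g. "Tctl" or "Tccd1"
            with open(input_path) as f:
                readings[label] = int(f.read().strip()) / 1000.0  # sysfs reports millidegrees C
    return readings


def log_temps(out_path, interval=5.0):
    """Append a timestamped reading every `interval` seconds until interrupted."""
    while True:
        line = "{} {}\n".format(time.strftime("%Y-%m-%d %H:%M:%S"), read_k10temp())
        with open(out_path, "a") as f:
            f.write(line)
            f.flush()
            os.fsync(f.fileno())  # push the sample to disk so it survives a sudden reset
        time.sleep(interval)
```

The `fsync` after each sample is the point of the design: without it, the last readings before a hard reset may never reach the disk.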

  • @Dessgun said:

    I've seen this topic, but nothing much was solved there. Yes, I added this, but it didn't help: GRUB_CMDLINE_LINUX_DEFAULT="consoleblank=0 nomodeset noapic pci=assign-busses apicmaintimer idle=poll reboot=cold,hard"

    Have you updated GRUB afterwards?

  • Dessgun Member
    edited August 2023

    @CalmDown said:

    @Dessgun said:

    I've seen this topic, but nothing much was solved there. Yes, I added this, but it didn't help: GRUB_CMDLINE_LINUX_DEFAULT="consoleblank=0 nomodeset noapic pci=assign-busses apicmaintimer idle=poll reboot=cold,hard"

    Have you updated GRUB afterwards?

    grub2-mkconfig -o /boot/grub2/grub.cfg
    reboot

    GRUB_TIMEOUT=5
    GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
    GRUB_DEFAULT=saved
    GRUB_DISABLE_SUBMENU=true
    GRUB_TERMINAL_OUTPUT="console"
    GRUB_CMDLINE_LINUX=GRUB_CMDLINE_LINUX="crashkernel=auto resume=/dev/mapper/almalinux-swap rd.lvm.lv=almalinux/root rd.lvm.lv=almalinux/swap rhgb quiet consoleblank=0 nomodeset noapi$
    GRUB_DISABLE_RECOVERY="true"
    GRUB_ENABLE_BLSCFG=true
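
A hedged sanity check for `/etc/default/grub`, written for this thread rather than taken from any tool: it flags a value that starts with another `KEY=` (as in the `GRUB_CMDLINE_LINUX=GRUB_CMDLINE_LINUX=...` line above) and unbalanced double quotes. The trailing `$` on that line in the post looks like the editor's truncation marker rather than part of the file. This is not a shell parser, so treat its output as hints.

```python
import re

# Keys in /etc/default/grub are upper-case shell variable names.
ASSIGNMENT = re.compile(r"^([A-Z0-9_]+)=(.*)$")
LEADING_KEY = re.compile(r"^[A-Z0-9_]+=")


def check_grub_defaults(text):
    """Return (line number, message) pairs for suspicious lines."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        match = ASSIGNMENT.match(line)
        if not match:
            problems.append((lineno, "not a KEY=value line"))
            continue
        key, value = match.groups()
        if LEADING_KEY.match(value):
            problems.append((lineno, key + ": value starts with another KEY=, probably a duplicated assignment"))
        if value.count('"') % 2:
            problems.append((lineno, key + ": unbalanced double quotes"))
    return problems


def check_file(path="/etc/default/grub"):
    """Print any problems found in the GRUB defaults file."""
    with open(path) as f:
        for lineno, message in check_grub_defaults(f.read()):
            print("line {}: {}".format(lineno, message))
```

After fixing whatever it reports, the `grub2-mkconfig` step still has to be rerun for the change to reach the boot menu.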

  • Dessgun Member
    edited August 2023

    @CalmDown said:

    @Dessgun said:

    I've seen this topic, but nothing much was solved there. Yes, I added this, but it didn't help: GRUB_CMDLINE_LINUX_DEFAULT="consoleblank=0 nomodeset noapic pci=assign-busses apicmaintimer idle=poll reboot=cold,hard"

    Have you updated GRUB afterwards?

    Oh, it seems ChatGPT didn't help me much, and I've now realised I spelt it wrong. Can anyone advise? Here's what I originally had:
    GRUB_CMDLINE_LINUX="crashkernel=auto resume=/dev/mapper/almalinux-swap rd.lvm.lv=almalinux/root rd.lvm.lv=almalinux/swap rhgb quiet"

    How do I add these parameters correctly?
    GRUB_CMDLINE_LINUX_DEFAULT="consoleblank=0 nomodeset noapic pci=assign-busses apicmaintimer idle=poll reboot=cold,hard"

    Am I right in thinking it should look like this?
    GRUB_CMDLINE_LINUX="consoleblank=0 nomodeset noapic pci=assign-busses apicmaintimer idle=poll reboot=cold,hard=crashkernel=auto resume=/dev/mapper/almalinux-swap rd.lvm.lv=almalinux/root rd.lvm.lv=almalinux/swap rhgb quiet"
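
On the question above: kernel parameters are separated by spaces inside one quoted string, so gluing the two lists together with `=` (as in `reboot=cold,hard=crashkernel=auto`) turns them into a single malformed parameter. A minimal sketch, assuming the goal is simply to append the extra parameters to the existing `GRUB_CMDLINE_LINUX` value:

```python
def merge_cmdline(existing, extra):
    """Append parameters from `extra` to `existing` as one space-separated list.

    Only exact duplicates are skipped, so repeatable parameters such as the
    two rd.lvm.lv= entries are left as they are.
    """
    params = existing.split()
    for param in extra.split():
        if param not in params:
            params.append(param)
    return " ".join(params)


def grub_line(existing, extra):
    """Render the merged list as a single GRUB_CMDLINE_LINUX assignment."""
    return 'GRUB_CMDLINE_LINUX="{}"'.format(merge_cmdline(existing, extra))


def missing_from_running(extra, proc_cmdline):
    """List parameters from `extra` that are absent from the contents of /proc/cmdline."""
    running = set(proc_cmdline.split())
    return [param for param in extra.split() if param not in running]


existing = ("crashkernel=auto resume=/dev/mapper/almalinux-swap "
            "rd.lvm.lv=almalinux/root rd.lvm.lv=almalinux/swap rhgb quiet")
extra = ("consoleblank=0 nomodeset noapic pci=assign-busses "
         "apicmaintimer idle=poll reboot=cold,hard")
print(grub_line(existing, extra))
```

This prints `GRUB_CMDLINE_LINUX="crashkernel=auto resume=/dev/mapper/almalinux-swap rd.lvm.lv=almalinux/root rd.lvm.lv=almalinux/swap rhgb quiet consoleblank=0 nomodeset noapic pci=assign-busses apicmaintimer idle=poll reboot=cold,hard"`. After editing, regenerate the config with the `grub2-mkconfig` command posted earlier (or use `grubby --update-kernel=ALL --args="..."`, which Red Hat documents for EL8), reboot, and confirm with `missing_from_running(extra, open("/proc/cmdline").read())` that nothing was dropped.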

  • vingohost Member, Host Rep

    I would suggest running these tests to diagnose hardware issues:
    https://help.ovhcloud.com/csm/en-dedicated-servers-hardware-diagnostics?id=kb_article_view&sysparm_article=KB0043506
    Maybe start with the memory test. In my experience, bad memory is one of the most common causes of random reboots.

  • @HostEONS said:
    If it freezes, make sure it's not related to the NVMe drive.

    My bet is on the NVMe as well.
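
To test the NVMe theory, one could search the previous boot's kernel messages for the drive timing out or resetting. A sketch, assuming a persistent journal (otherwise `journalctl -b -1` has nothing to show); the regex is a heuristic based on common nvme driver messages, not an exhaustive list.

```python
import re
import subprocess

# Heuristic (assumption): nvme driver messages that commonly precede a drive dropping out.
NVME_TROUBLE = re.compile(
    r"nvme\S*:.*(timeout|controller is down|will reset|Removing after probe failure)",
    re.IGNORECASE,
)


def nvme_warnings(kernel_log):
    """Return the kernel log lines that look like NVMe timeouts or resets."""
    return [line for line in kernel_log.splitlines() if NVME_TROUBLE.search(line)]


def check_previous_boot():
    """Print suspicious NVMe lines from the previous boot's kernel log."""
    log = subprocess.run(["journalctl", "-k", "-b", "-1", "--no-pager"],
                         stdout=subprocess.PIPE, universal_newlines=True).stdout
    for line in nvme_warnings(log):
        print(line)
```

An empty result does not clear the drive: if the system hangs hard, the messages may never be written.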

  • Do your memory modules have heat sinks? If so, they also need airflow. ECC memory would save you a week of diagnostic work here.
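
With ECC memory and an EDAC driver loaded, the kernel exposes per-controller error counters in sysfs, which is where that week of diagnosis gets saved. A minimal sketch that sums them; on non-ECC systems the directory is usually absent and the totals stay at zero.

```python
import glob
import os


def edac_error_counts(edac_root="/sys/devices/system/edac/mc"):
    """Sum corrected (ce_count) and uncorrected (ue_count) errors over all memory controllers."""
    totals = {"ce_count": 0, "ue_count": 0}
    for controller in sorted(glob.glob(os.path.join(edac_root, "mc[0-9]*"))):
        for name in totals:
            path = os.path.join(controller, name)
            if os.path.exists(path):
                with open(path) as f:
                    totals[name] += int(f.read().strip())
    return totals
```

A rising `ce_count` points at memory even while the system still survives; a non-zero `ue_count` is a strong candidate for an unexplained reset.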

  • DataWagon Member, Patron Provider

    Is it a Gigabyte motherboard? If so, it's a known issue. Nothing really helps; it's random.
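
To answer that question without opening the case, the board vendor and BIOS version can be read from the DMI attributes the kernel exports. A small sketch; `is_gigabyte` is a hypothetical convenience helper that does a loose, case-insensitive vendor match.

```python
import os

# Standard DMI attributes under /sys/class/dmi/id; these four are normally world-readable.
DMI_FIELDS = ("board_vendor", "board_name", "bios_version", "bios_date")


def board_info(dmi_root="/sys/class/dmi/id"):
    """Return {field: value or None} for the motherboard and BIOS DMI fields."""
    info = {}
    for field in DMI_FIELDS:
        try:
            with open(os.path.join(dmi_root, field)) as f:
                info[field] = f.read().strip()
        except OSError:
            info[field] = None  # attribute missing or not readable
    return info


def is_gigabyte(info):
    """Loose vendor check; DMI vendor strings vary, so match case-insensitively."""
    return "gigabyte" in (info.get("board_vendor") or "").lower()
```

Including the board name and BIOS version when comparing notes with other hosts makes "known issue" reports much easier to match.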

  • DataWagon Member, Patron Provider

    @davide said:
    Do your memory modules have heat sinks? If so, they also need airflow. ECC memory would save you a week of diagnostic work here.

    It's highly unlikely that the issue is related to memory errors.

  • I've had a similar issue on a different CPU/motherboard. It turned out to be the PSU: the machine would lose power completely for a very brief moment and boot straight back up, so it looked as if it had just rebooted.

  • @DataWagon said:
    It's highly unlikely that the issue is related to memory errors.

    Why?

  • I'd suspect hardware if there are no relevant log entries leading up to the reboot.
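
One way to check whether the previous boot ended in an orderly shutdown or just stopped (a hard reset, or the brief power loss described above): look for systemd reaching a shutdown-type target at the end of that boot's journal. A sketch, assuming a persistent journal; the target wording differs between systemd versions, so the keywords are an assumption.

```python
import subprocess

# Assumption: on an orderly stop systemd logs "Reached target ..." for its
# shutdown, reboot or power-off target, but the exact wording varies by version
# (e.g. "Reboot." vs "System Reboot."), so only keywords are matched.
SHUTDOWN_KEYWORDS = ("shutdown", "reboot", "power-off", "power off")


def ended_cleanly(journal_tail):
    """True if the log tail shows systemd reaching a shutdown-type target."""
    for line in journal_tail.splitlines():
        lower = line.lower()
        if "reached target" in lower and any(k in lower for k in SHUTDOWN_KEYWORDS):
            return True
    return False


def check_previous_boot(lines=200):
    """Inspect the last `lines` journal entries of the previous boot."""
    tail = subprocess.run(["journalctl", "-b", "-1", "-n", str(lines), "--no-pager"],
                          stdout=subprocess.PIPE, universal_newlines=True).stdout
    if ended_cleanly(tail):
        print("previous boot ended with an orderly shutdown or reboot")
    else:
        print("no shutdown target in the log tail: consistent with a hard reset or power loss")
```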

  • DataWagon Member, Patron Provider

    @davide said:

    @DataWagon said:
    It's highly unlikely that the issue is related to memory errors.

    Why?

    Just personal experience. We've built hundreds of Ryzen machines, both AM4 and AM5, on server boards with non-ECC RAM, and we've never had bad memory cause an issue. Bad motherboards in these Gigabyte/ASRock Rack builds are very common and often cause issues exactly like the one OP is describing.

  • Dessgun Member
    edited August 2023

    The problem was related to CPU virtualisation. A hosting provider gave us a command that solved it, but unfortunately they asked us not to make the solution public. Oh, that competition :'(
    Actually, it's all AMD's fault; how could they not expect people to run virtual machines on this processor?

  • MikeA Member, Patron Provider

    @Dessgun said:
    The problem was related to CPU virtualisation. A hosting provider gave us a command that solved it, but unfortunately they asked us not to make the solution public. Oh, that competition :'(
    Actually, it's all AMD's fault; how could they not expect people to run virtual machines on this processor?

    Likely not competition; I'd guess it's an issue on their side if they don't want it published.

    I and many others run lots of virtual machines on AMD Ryzen and have never had a problem.

  • @MikeA said:

    Likely not competition, I'd guess it's an issue on their side if they don't want it published.

    I and many others run lots of virtual machines on AMD Ryzen and have never had a problem.

    This is a problem with the latest Ryzen series and it is very prevalent: if you run KVM virtual machines on a Hetzner dedicated server with a Ryzen 7950X, they will crash.

  • MikeA Member, Patron Provider

    @Dessgun said:

    This is a problem with the latest Ryzen series and it is very prevalent: if you run KVM virtual machines on a Hetzner dedicated server with a Ryzen 7950X, they will crash.

    Weird, I've never had a problem, but I don't run anything on Hetzner.

  • labze Member, Patron Provider

    @Dessgun said:

    This is a problem with the latest Ryzen series and it is very prevalent: if you run KVM virtual machines on a Hetzner dedicated server with a Ryzen 7950X, they will crash.

    I have three 7950X3D servers from Hetzner running KVM VPSes, and they haven't crashed a single time.

  • @Dessgun, is the crash related to nested virtualization?

  • Dessgun Member
    edited August 2023

    @davide said:
    @Dessgun, is the crash related to nested virtualization?

    Yes, partially.

  • I have a 7950X3D and also experience random host reboots, mostly when I play certain games in the VM. It only happens when nested virtualization is enabled and the Windows 11 guest enables virtualization-based security. If I disable nested virtualization, the host is stable.
    Motherboard: ASUS TUF X670E-Plus, with 2x32GB ECC RAM.
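
Since nested virtualization is the common factor in the last few posts, here is a sketch for checking the `kvm_amd` `nested` module parameter and writing a modprobe.d snippet that turns it off. The snippet filename is hypothetical: any `.conf` file under `/etc/modprobe.d/` is read, and the setting only applies the next time `kvm_amd` is loaded (for example after a reboot).

```python
import os

NESTED_PARAM = "/sys/module/kvm_amd/parameters/nested"


def nested_enabled(param_path=NESTED_PARAM):
    """True/False for kvm_amd's nested setting, or None if kvm_amd is not loaded."""
    if not os.path.exists(param_path):
        return None
    with open(param_path) as f:
        value = f.read().strip()
    # kvm_amd reports 1/0; Y/N is accepted too in case a kernel exposes it as a bool.
    return value in ("1", "Y", "y")


def write_disable_conf(conf_path):
    """Write a modprobe.d snippet that loads kvm_amd with nested virtualization off."""
    with open(conf_path, "w") as f:
        f.write("options kvm_amd nested=0\n")
```

For example, `write_disable_conf("/etc/modprobe.d/kvm-amd-nested.conf")` as root (hypothetical filename), then reboot and confirm that `nested_enabled()` returns `False`.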
