Need tutorial on how to simulate Soft RAID failure

ccarita Member
edited July 2023 in Help

I recently got a SYS (OVH) server with two SATA HDDs. I've installed Debian/Proxmox on it, configured as RAID 1.
Since I haven't installed anything important on this server yet, I'm looking for a tutorial on how to simulate a failure of one of the HDDs (including formatting the "failed" one, as if a new HDD had been installed to replace the failed one, and the subsequent RAID rebuild). I want to practice on unimportant data before I use it in production...
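
For context, the kind of sequence I'm hoping to practice looks roughly like this. This is only a sketch based on what I've read about mdadm; /dev/md0, /dev/sda1 and /dev/sdb1 are placeholders and will differ depending on how the installer partitioned the drives:

    # mark one mirror member as failed, then remove it from the array
    mdadm --manage /dev/md0 --fail /dev/sdb1
    mdadm --manage /dev/md0 --remove /dev/sdb1

    # wipe the "failed" disk so it looks like a brand-new replacement
    wipefs -a /dev/sdb

    # copy the partition layout from the healthy disk to the replacement
    sfdisk -d /dev/sda | sfdisk /dev/sdb

    # add the partition back and watch the resync
    mdadm --manage /dev/md0 --add /dev/sdb1
    watch cat /proc/mdstat

Does that look about right, or am I missing steps (boot loader, etc.)?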

Thanked by dodheimsgard and emgh

Comments

  • Impressive. Would love to see what experts here have to say

    Thanked by emgh
  • The easiest approach is to forcibly unmount the disk, format it, or do whatever it takes to mess up its partitions. Then pretend it's a new disk, add it back to the array, and start the rebuild.

    Thanked by emgh
  • JamesF Member, Host Rep

    This is interesting, as this is my worst fear after OVH rebooting the server in rescue mode for an abuse claim.

    Thanked by emgh
  • echo 1 > /sys/block/sdb/device/delete to make a disk vanish
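
    To bring it back afterwards without rebooting, something like this usually works (rough sketch; the scsi_host number and device names vary per machine, so treat them as placeholders):

        # the array should now show as degraded
        cat /proc/mdstat

        # rescan the SCSI host so the kernel rediscovers the "removed" disk
        echo "- - -" > /sys/class/scsi_host/host0/scan

        # then put the member back into the array and let it resync
        # (use --add instead if --re-add refuses)
        mdadm --manage /dev/md0 --re-add /dev/sdb1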

  • sparek Member

    Additionally, you'd probably want to know whether the server is using UEFI or legacy boot, and whether the secondary drive can be booted from.

    That's a common issue with software RAID setups. The second drive isn't set up to boot properly, so if the first drive fails, the server doesn't boot back up.

  • @sparek said:
    Additionally, you'd probably want to know whether the server is using UEFI or legacy boot, and whether the secondary drive can be booted from.

    That's a common issue with software RAID setups. The second drive isn't set up to boot properly, so if the first drive fails, the server doesn't boot back up.

    How does one check this?

  • @darkimmortal said:
    echo 1 > /sys/block/sdb/device/delete to make a disk vanish

    So what happens next? What to do?

  • @JamesF said:
    This is interesting, as this is my worst fear after OVH rebooting the server in rescue mode for an abuse claim.

    True...

  • sparek Member

    @plumberg said:

    @sparek said:
    Additionally, you'd probably want to know whether the server is using UEFI or legacy boot, and whether the secondary drive can be booted from.

    That's a common issue with software RAID setups. The second drive isn't set up to boot properly, so if the first drive fails, the server doesn't boot back up.

    How does one check this?

    Check for a /sys/firmware/efi directory on the system. If it exists, you're using UEFI. If it doesn't, you're probably using legacy BIOS boot.

    If you're using UEFI, you'll need to use efibootmgr to check the boot order AND ensure that the second drive has a proper ESP with up-to-date GRUB settings.

    If you're not using UEFI, you'll want to ensure that GRUB is installed in the MBR of the second drive.
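
    In practice the checks look something like this (rough sketch; /dev/sda and /dev/sdb are just placeholders for your two drives):

        # UEFI or legacy?
        [ -d /sys/firmware/efi ] && echo "UEFI" || echo "legacy BIOS"

        # UEFI: list the boot entries and confirm both drives' ESPs are registered
        efibootmgr -v

        # legacy BIOS on Debian: install GRUB to the MBR of the second drive too
        grub-install /dev/sdb
        # (or run dpkg-reconfigure grub-pc and tick both disks)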

    Thanked by plumberg
  • @sparek said:

    @plumberg said:

    @sparek said:
    Additionally, you'd probably want to know whether the server is using UEFI or legacy boot, and whether the secondary drive can be booted from.

    That's a common issue with software RAID setups. The second drive isn't set up to boot properly, so if the first drive fails, the server doesn't boot back up.

    How does one check this?

    Check for a /sys/firmware/efi directory on the system. If it exists, you're using UEFI. If it doesn't, you're probably using legacy BIOS boot.

    If you're using UEFI, you'll need to use efibootmgr to check the boot order AND ensure that the second drive has a proper ESP with up-to-date GRUB settings.

    If you're not using UEFI, you'll want to ensure that GRUB is installed in the MBR of the second drive.

    That's insightful, thanks. I'll check.

    But BTW, how do I make a disk go bad and then ensure the server comes back up? Like what the OP asked.

  • Levi Member

    You need Chaos Monkey: https://github.com/Netflix/chaosmonkey. This will get you stimulated and on the edge every time.

    Thanked by plumberg
  • @LTniger said:
    You need Chaos Monkey: https://github.com/Netflix/chaosmonkey. This will get you stimulated and on the edge every time.

    Had heard and kinda forgotten about this. Will check if it supports what we wanna do... thanks

  • @LTniger I need to know how to simulate a RAID disk failure and subsequent recovery/rebuild. Chaos Monkey is just a tool to randomly kill app services inside instances.

  • I have the same questions, and I hope you find a solution.

  • emgh Member

    Is there any online course or something on RAID management?

    How to best detect failures (and, if possible, likely upcoming failures/bad drives)

    How to understand what failed

    How to make sure all drives that should be bootable are bootable

    How to rebuild failed drives and make sure everything works
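
    So far the closest I've got is a handful of commands rather than a course (sketch, assuming mdadm plus smartmontools; /dev/md0, /dev/sda, /dev/sdb are placeholders):

        # current array state and any rebuild in progress
        cat /proc/mdstat
        mdadm --detail /dev/md0

        # SMART health, to catch drives that look like they're about to fail
        smartctl -H /dev/sda
        smartctl -H /dev/sdb

        # have mdadm alert you when an array degrades (set MAILADDR in /etc/mdadm/mdadm.conf)
        mdadm --monitor --scan --daemonise

    Would still love something that covers the boot side of it properly.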
