New on LowEndTalk? Please Register and read our Community Rules.
Need tutorial on how to simulate Soft RAID failure
I recently got a SYS (OVH) server that has 2 SATA HDDs. I've installed Debian/Proxmox on it, configured as RAID 1.
Since I haven't installed anything important on this server yet, I need a tutorial on how to simulate a failure on one of the HDDs (including formatting the "failed" one, as if a new HDD had been installed to replace it, followed by the RAID rebuild). I want to practice on unimportant data before I use it in production...
Comments
Impressive. Would love to see what experts here have to say
The easiest approach is to forcibly unmount the disk, format it, or do whatever it takes to mess up its partitions. Then pretend it's a new disk, re-add it, and start the array rebuild.
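On Linux software RAID (md), you don't even need to damage anything: mdadm can mark a member as faulty on demand. A minimal sketch of the fail/replace/rebuild cycle, assuming the array is /dev/md0 and the member to "fail" is /dev/sdb1 (check /proc/mdstat for your actual names; a Proxmox install typically has several md arrays, so repeat per array):

```shell
cat /proc/mdstat                     # confirm array and member names first

mdadm /dev/md0 --fail /dev/sdb1      # mark the member as faulty
mdadm /dev/md0 --remove /dev/sdb1    # pull it out of the array

# Pretend a fresh disk was installed: wipe the old RAID metadata.
mdadm --zero-superblock /dev/sdb1    # or: wipefs -a /dev/sdb1

mdadm /dev/md0 --add /dev/sdb1       # re-add it; the rebuild starts right away
watch cat /proc/mdstat               # follow the resync progress
```

Run as root, and double-check the device names before the wipe step.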
This is interesting, as this is my worst fear after OVH rebooting the server in rescue mode for an abuse claim.
echo 1 > /sys/block/sdb/device/delete
to make a disk vanish.
Additionally, you'd probably want to know whether the server is using UEFI or legacy boot, and whether the secondary drive can be booted from.
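As a sketch of that trick: deleting the device makes the kernel drop it entirely, so md sees the member disappear, much like a real dead disk. To bring it back without rebooting you can rescan its SCSI host (host0 below is an assumption; check ls /sys/class/scsi_host/ for the right one):

```shell
# Make /dev/sdb vanish from the kernel's point of view (simulated dead disk):
echo 1 > /sys/block/sdb/device/delete

# md will mark the vanished member as failed; verify with:
cat /proc/mdstat

# Bring the disk back without a reboot by rescanning its SCSI host
# (host0 is an assumption -- see ls /sys/class/scsi_host/):
echo "- - -" > /sys/class/scsi_host/host0/scan
```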
That's a common issue with software RAID setups. The second drive isn't set up to boot properly, so if the first drive fails, the server doesn't come back up.
How does one check this?
So what happens next? What to do?
True...
Check for a /sys/firmware/efi directory on the system. If it exists, then you're using UEFI. If it doesn't, then you're probably using legacy BIOS boot.
If you're using UEFI, then you will need to use efibootmgr to check the boot order AND ensure that the second drive has a proper ESP with up-to-date grub settings.
If you're not using UEFI, then you would want to ensure that grub is installed in the MBR of the second drive.
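The checks above can be sketched roughly like this (the mount point and device names are assumptions; adjust them to your layout, and be aware that for UEFI the second disk's ESP usually has to be mounted somewhere first):

```shell
# Detect the firmware type:
if [ -d /sys/firmware/efi ]; then echo UEFI; else echo "legacy BIOS"; fi

# UEFI: list boot entries and their order (efibootmgr package):
efibootmgr -v

# UEFI: install grub onto the second disk's ESP, e.g. mounted at /mnt/esp2
# (hypothetical mount point -- mount sdb's EFI partition there first):
grub-install --target=x86_64-efi --efi-directory=/mnt/esp2 /dev/sdb

# Legacy BIOS: put grub in the MBR of the second disk as well:
grub-install /dev/sdb
```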
That's insightful. Thanks for the same. I'll check.
But BTW, how do I make a disk go bad to ensure the server comes back? Like what the OP asked.
You need Chaos Monkey. https://github.com/Netflix/chaosmonkey This will keep you stimulated and on the edge every time.
Had heard and kinda forgotten about this. Will check if it supports what we wanna do... thanks
@LTniger I need to know how to simulate a RAID disk failure and subsequent recovery/rebuild. Chaos Monkey is just a tool to randomly kill app services inside instances.
I also have such doubts, and I hope you can find a solution.
Any online course or something on RAID management?
How best to detect failures (and, if possible, likely upcoming failures/bad drives)
How to understand what failed
How to make sure all drives that should be bootable are bootable
How to rebuild failed drives and make sure everything works
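For the detection side of that list, a minimal sketch with the usual md and SMART tools (md0 and sda are assumptions; list your arrays with mdadm --detail --scan, and smartctl needs the smartmontools package):

```shell
# Current array state: [UU] means healthy, [U_] means degraded:
cat /proc/mdstat
mdadm --detail /dev/md0

# SMART data often hints at an upcoming failure before md notices:
smartctl -H /dev/sda                 # overall health verdict
smartctl -a /dev/sda | grep -iE 'reallocated|pending|uncorrect'

# For ongoing alerts, run the mdmonitor service and set MAILADDR
# in /etc/mdadm/mdadm.conf so md emails you when a member fails.
```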