Soft raid and finding broken drive

Say you have a dedicated server with its drives configured in software RAID, like Hetzner's. If a drive fails, how can support know which one to replace? Would they need access to the server itself?

Comments

  • Generally, from my experience, they will ask you for the serial number of the failing drive.

  • Good question, I thought it was an automated approach for @Hetzner_OL.

  • NetDynamics24 (Member, Host Rep)

    They will ask you for the serial number of the affected disk.

  • It's always a good idea to keep current smartctl -i /dev/sdX output for every drive in the server.

    Because if the drive fails, you probably won't get any SMART information from it.

    But you can also do it through the process of elimination.

    If you have a 4-drive RAID 10 and one of the drives drops out, you can still get the SMART information for the other 3. So the person swapping out the drive would know it's the one that doesn't match any of the 3 serial numbers provided.
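
The serial inventory and the process of elimination described above could be sketched roughly as follows. This is only an illustration: the function names `extract_serial` and `missing_serials` and the file name `serials.txt` are made up for the example, and it assumes `smartctl -i` prints the usual "Serial Number:" line.

```shell
#!/bin/sh
# Hypothetical helpers, not part of smartmontools.

# Read `smartctl -i` output on stdin and print the serial number.
# Assumes a "Serial Number: XXXX" line (matched case-insensitively).
extract_serial() {
    awk -F': *' 'tolower($1) ~ /^serial number$/ { print $2; exit }'
}

# Print every serial in the saved inventory ($1) that is missing from the
# list of serials that can still be read now ($2). One serial per line.
missing_serials() {
    grep -vxFf "$2" "$1"
}

# Building the inventory needs root and real drives, so it is left
# commented out here:
# for dev in /dev/sd?; do smartctl -i "$dev" | extract_serial; done > serials.txt
```

After a failure, running the same loop into a second file and passing both files to `missing_serials` should leave only the serial of the drive that no longer answers.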

  • PhantomPain (Member)
    edited April 2022

    @sparek said:
    It's always a good idea to keep current smartctl -i /dev/sdX output for every drive in the server.

    Because if the drive fails, you probably won't get any SMART information from it.

    But you can also do it through the process of elimination.

    If you have a 4-drive RAID 10 and one of the drives drops out, you can still get the SMART information for the other 3. So the person swapping out the drive would know it's the one that doesn't match any of the 3 serial numbers provided.

    But how do you get notified when one drive in a RAID 10 fails, since the filesystem continues to work as normal?

  • Catixs (Member, Host Rep)

    There's actually a YouTube video shot in a Hetzner datacenter that mentions they have some kind of special tooling to test drives 24x7.

  • @PhantomPain said:
    But how do you get notified when one drive in a RAID 10 fails, since the filesystem continues to work as normal?

    Write a script to periodically check

    cat /proc/mdstat

    If a drive drops, one of the U's will be replaced with a _

    Or run a script periodically that checks for changes:

    grep 'blocks' /proc/mdstat

    Store a copy of that output in a file. At the next periodic check, compare the new output to what's in the file. If it has changed, you may have had a drive drop out. Probably not an EXACT science, but it can give you enough notice to log into the server and check things out first hand.
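
Both checks described above could look something like this as shell functions. It is a rough sketch under assumptions: the names `degraded_arrays` and `mdstat_changed` are invented for the example, and it relies on the status of each array appearing on its "blocks" line in the form `[UU]` or `[U_UU]`.

```shell
#!/bin/sh
# Hypothetical helpers for a cron job; both read /proc/mdstat by default.

# Print the name of every md array whose status field contains a "_",
# i.e. an array with at least one missing member.
degraded_arrays() {
    awk '
        /^md[^ ]* :/ { name = $1 }                    # remember the array name
        /blocks/ && /\[[U_]*_[U_]*\]/ { print name }  # status such as [U_UU]
    ' "${1:-/proc/mdstat}"
}

# Return 0 (true) when the "blocks" lines differ from the snapshot file $1,
# then refresh the snapshot. A missing snapshot counts as a change, so the
# very first run reports one.
mdstat_changed() {
    snapshot=$1
    mdstat_file=${2:-/proc/mdstat}
    current=$(grep 'blocks' "$mdstat_file")
    previous=$(cat "$snapshot" 2>/dev/null)
    printf '%s\n' "$current" > "$snapshot"
    [ "$current" != "$previous" ]
}
```

A cron entry could call `degraded_arrays` and send its output somewhere whenever it is non-empty; the snapshot variant also catches other changes on those lines, which is why it is only a hint to go and look.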

  • LittleCreek (Member, Patron Provider)

    Add a MAILADDR line with your email address to /etc/mdadm.conf and the mdadm monitor will email you when there is a failure.
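
For illustration, the relevant line in the config file would look roughly like this. The address is a placeholder, and the file location varies by distribution (Debian and Ubuntu keep it at /etc/mdadm/mdadm.conf).

```
# /etc/mdadm.conf
MAILADDR admin@example.com
```

This only works if the mdadm monitor is running and the server can actually send mail. Running `mdadm --monitor --scan --oneshot --test` should send a test alert for each array, which is a quick way to confirm that delivery works.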

  • Advin (Member, Patron Provider)

    You can either:
    1. Give them the serial number of the dead drive
    2. Give them the serial numbers of the drives that are still working, and they'll remove the one that is not on the list
