How does the provider know which drive to replace in a failing software RAID?

If I have software RAID in a dedicated server, and a drive is failing, how can I or the provider know which one needs to be replaced? Thanks.

Comments

  • S.M.A.R.T. utilities. If you see a 1720 error (the POST message HP ProLiant servers print when SMART predicts a failure), the hard drive is about to fail.

  • yoursunny Member, IPv6 Advocate

    Each hard drive slot has a green light and a yellow light.
    The green light means the drive is inserted.
    The yellow light means the drive has failed.
    You should replace the drive in the slot with the yellow light.

  • Is there a way of getting email alerts if a drive in the array is failing so that I know when to contact support? Because otherwise in a RAID 1 how would I notice if a drive is failing?

  • letlover Member

    @vitobotta said:
    Is there a way of getting email alerts if a drive in the array is failing so that I know when to contact support? Because otherwise in a RAID 1 how would I notice if a drive is failing?

    smartmontools, smartctl, as someone just answered my question in another post.

  • jackb Member, Host Rep
    edited July 23

    @vitobotta said:
    If I have software RAID in a dedicated server, and a drive is failing, how can I or the provider know which one needs to be replaced? Thanks.

    You should be able to check the health of the array, e.g. with mdadm or in /proc/mdstat, which will show failed drives. You need to remove the failed drive from the array before having it replaced.

    You need to send the serial number of the drive to be replaced to your provider. If you can't get it (e.g. drive too dead to identify itself), you should send the serials of the healthy drives.

    There are often health indicator lights, though these may or may not be in use depending on configuration, so assume they aren't and send serials.

  • edited July 23

    @vitobotta said:
    If I have software RAID in a dedicated server, and a drive is failing, how can I or the provider know which one needs to be replaced? Thanks.

    Depends. Some servers with built-in hardware RAID will have lights for each drive, or maybe even on each drive if they are hot-swap, so they can see directly which one is failing as the controller will tell the drive to flash its “I'm dying” sequence.

    With software RAID you will probably learn of the problem from SMART throwing a warning, from a drive dropping into a failed state in /proc/mdstat (and possibly a mail alert telling you this has happened), or from errors logged elsewhere. In the case of SMART, the warning messages should include the serial number of the drive that is reporting issues. If you instead know the drive's device name from elsewhere, you can use smartctl or other tools to read that drive's serial number and pass it to the provider. If they have sufficient access to your server, for instance on a managed server deal, they can check this themselves.

    For hardware RAID without visible indicators on the physical machine, local hands-on support can tell by restarting the machine into the RAID BIOS or equivalent and getting the details that way. You can probably tell using whatever monitoring software comes with your hardware RAID controller too, though how you read that will depend on the controller and its specific tools.

    Once they know the serial number they can pull that drive, as the serial is also printed on the drive's label(s).
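
A minimal Python sketch of the "read the serial number with smartctl" step, for anyone who wants to script it. It assumes smartmontools is installed and that `smartctl -i <device>` prints a `Serial Number:` line (SAS drives may print `Serial number:`, hence the case-insensitive match). The function names and the sample output are made up for illustration, and running smartctl usually needs root.

```python
import re
import subprocess

# Made-up example of the identity section printed by `smartctl -i`.
SAMPLE_SMARTCTL_OUTPUT = """\
=== START OF INFORMATION SECTION ===
Device Model:     EXAMPLE DISK 4TB
Serial Number:    WD-WCC7K0000000
Firmware Version: 82.00A82
"""

def extract_serial(smartctl_output):
    """Return the serial number found in `smartctl -i` text, or None."""
    match = re.search(
        r"^Serial number:[ \t]*(\S.*?)[ \t]*$",
        smartctl_output,
        re.IGNORECASE | re.MULTILINE,
    )
    return match.group(1) if match else None

def read_serial(device):
    """Run `smartctl -i` on a device such as /dev/sda and return its serial.

    Returns None if the drive does not report a serial, for example
    because it is too dead to identify itself.
    """
    result = subprocess.run(
        ["smartctl", "-i", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes for warnings too
    )
    return extract_serial(result.stdout)

print(extract_serial(SAMPLE_SMARTCTL_OUTPUT))
```

If `read_serial` returns None for the failing drive, the fallback mentioned earlier in the thread still applies: send the provider the serials of the healthy drives instead.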

  • Thanks, this is basically what I was looking for. I configured mdadm to send me email alerts when the status of the RAID changes. This is good enough for now :)

  • imgmoney Member

    You can also use HetrixTools to monitor the health status of the drive and replace it before it fails.

  • dustinc Member, Patron Provider, Top Host

    Beyond checking with SMART, in software RAID environments the command cat /proc/mdstat shows which drives are still active in each array; from that you can work out which drive is dead.

    For example, with 2x drives in a software RAID-1, cat /proc/mdstat shows "[2/2] [UU]" when both drives are online and functioning. If it shows "[2/1] [U_]" instead, only one of the two drives is active, and the missing/dead drive needs to be looked at further.
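
A rough Python sketch of reading that status programmatically. It parses /proc/mdstat-style text and reports arrays whose active device count is below the configured count, plus any member flagged (F). The function names and the sample text are made up for illustration, and real output varies with kernel version and array state, so treat the regular expressions as assumptions rather than a complete parser.

```python
import re

# Made-up sample of /proc/mdstat for a degraded two-disk RAID-1.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      976630464 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

def parse_mdstat(text):
    """Return one dict per md array found in /proc/mdstat-style text."""
    arrays = []
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\S*)\s*:\s*(.*)$", line)
        if header:
            name, rest = header.groups()
            # Members look like "sda1[0]" or "sdb1[1](F)"; (F) marks a faulty one.
            members = re.findall(r"(\S+?)\[\d+\]((?:\([A-Z]\))*)", rest)
            current = {
                "name": name,
                "members": [dev for dev, _ in members],
                "failed": [dev for dev, flags in members if "(F)" in flags],
                "expected": None,
                "active": None,
                "status": None,
            }
            arrays.append(current)
            continue
        # The status line carries "[configured/active] [UU]" style counters.
        counts = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if counts and current is not None:
            current["expected"] = int(counts.group(1))
            current["active"] = int(counts.group(2))
            current["status"] = counts.group(3)
    return arrays

def degraded_arrays(arrays):
    """Keep only arrays with fewer active devices than configured."""
    return [
        a for a in arrays
        if a["active"] is not None and a["active"] < a["expected"]
    ]

for array in degraded_arrays(parse_mdstat(SAMPLE_MDSTAT)):
    print(array["name"], array["status"], "failed members:", array["failed"])
```

On a real server the same functions could be fed the contents of /proc/mdstat. A drive that has vanished entirely may not be listed with (F) at all, so the count check is the more reliable signal of the two.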
