Soft raid and finding broken drive

vitobotta · April 2022

Say you have a dedicated server with drives configured in software RAID, like Hetzner's. If a drive fails, how does support know which one to replace? Would they need access to the server itself?

jugganuts · April 2022

Generally, from my experience, they will ask you for the serial number of the failing drive.

MrLime · April 2022

Good question, I thought it was an automated approach for @Hetzner_OL.

NetDynamics24 · April 2022

They will ask you for the serial number of the affected disk.

sparek · April 2022

It's always a good idea to keep current smartctl -i /dev/sdX output for every drive in the server.

Because if the drive fails, you probably won't get any SMART information from it.

But you can also do it through the process of elimination.

If you have a 4-drive RAID10 and one of the drives drops out, you can still get the SMART information for the other 3. So the person swapping out the drive would know it's the one that doesn't match any of the 3 serial numbers provided.
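A minimal sketch of keeping that record, assuming smartmontools is installed and the drives print a "Serial Number:" line in smartctl -i output (other device types may label the field differently). The function names and the output path in the usage note are made up for illustration.

```shell
# Pull the serial number out of `smartctl -i` output read from stdin.
# Assumes a line of the form "Serial Number:    <value>".
extract_serial() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}

# Print "<device> <serial>" for every device passed as an argument.
# Needs root and smartctl; nothing here runs until you call it.
record_serials() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(smartctl -i "$dev" | extract_serial)"
    done
}
```

Something like record_serials /dev/sd? > /root/drive-serials.txt would then save the list while all drives are still healthy.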

PhantomPain · April 2022

@sparek said:
It's always a good idea to keep current smartctl -i /dev/sdX output for every drive in the server.

Because if the drive fails, you probably won't get any SMART information from it.

But you can also do it through the process of elimination.

If you have a 4-drive RAID10 and one of the drives drops out, you can still get the SMART information for the other 3. So the person swapping out the drive would know it's the one that doesn't match any of the 3 serial numbers provided.

But how do you get notified when one drive in a RAID 10 fails, since the filesystem continues to work as normal?

Catixs · April 2022

There’s actually a video on YouTube shot in a Hetzner datacenter; it mentions they have some kind of special tools that test the drives 24x7.

sparek · April 2022

@PhantomPain said:
But how do you get notified when one drive in a RAID 10 fails, since the filesystem continues to work as normal?

Write a script to periodically check

cat /proc/mdstat

If a drive drops, one of the U's will be replaced with a _

Or run a script periodically that checks for changes:

grep 'blocks' /proc/mdstat

Store a copy of that output in a file. On the next periodic check, compare the new output to what's in the file. If it has changed, you may have had a drive drop out. It's not an exact science, but it gives you enough notice to log into the server and check things out first hand.
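Both checks could be sketched roughly like this. The function names and the state-file argument are made up for illustration, and the pattern assumes the usual /proc/mdstat layout where each array's status appears as a bracketed map such as [UU] or [U_].

```shell
# Succeeds (exit 0) if any array's status map contains a "_",
# i.e. a member is missing. Defaults to /proc/mdstat.
mdstat_degraded() {
    grep -Eq '\[[U_]*_[U_]*\]' "${1:-/proc/mdstat}"
}

# Compare the current 'blocks' lines of mdstat file $1 with the snapshot
# stored in state file $2, then refresh the snapshot.
# Exit 0 if unchanged (or first run), exit 1 if the lines differ.
mdstat_unchanged() {
    current=$(grep 'blocks' "$1" || true)
    status=0
    if [ -f "$2" ] && [ "$current" != "$(cat "$2")" ]; then
        status=1
    fi
    printf '%s\n' "$current" > "$2"
    return "$status"
}
```

From cron, a wrapper could call mdstat_degraded and send a mail or other alert whenever it succeeds; how the alert is delivered is up to you.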

LittleCreek · April 2022

Add your email address to /etc/mdadm.conf and it will email you when there is a failure.
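For reference, the setting is the MAILADDR keyword. On Debian and Ubuntu the file normally lives at /etc/mdadm/mdadm.conf instead. Mail only goes out if mdadm's monitor is running and the server can actually send mail. A small sketch follows; the address is a placeholder and the helper name is made up.

```shell
# Line to add to the mdadm config file (placeholder address):
#   MAILADDR admin@example.com

# Print the address configured via MAILADDR in the given config file, if any.
# Assumes the keyword is written in upper case, as it usually is.
mailaddr_of() {
    awk '$1 == "MAILADDR" { print $2; exit }' "$1"
}

# To check delivery, mdadm can send a test alert for each array (run as root):
#   mdadm --monitor --scan --oneshot --test
```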

Advin · April 2022

You can either:
1. Give them the serial number of the dead drive
2. Give them the serial numbers of the drives that are still working, and they'll remove the one that isn't on the list
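If you saved a list of serials while all drives were healthy, the second option can also tell you the dead drive's serial. A rough sketch, assuming two plain text files with one serial per line; the function and file names are hypothetical.

```shell
# Print every serial from the recorded list ($1) that does not appear
# in the list of drives still responding ($2). One serial per line in both.
missing_serials() {
    grep -vxFf "$2" "$1"
}
```

For example, missing_serials recorded.txt current.txt prints whatever was recorded earlier but no longer shows up, which should be the drive to pull.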
