New on LowEndTalk? Please Register and read our Community Rules.
All new Registrations are manually reviewed and approved, so a short delay after registration may occur before your account becomes active.
How can I identify the broken HDD in the physical server bays when the server uses software RAID (mdadm)?
Hello LET, we are facing a rather stressful problem. For those of you with experience with software RAID / mdadm: when an HDD breaks, how do you identify it physically so you can remove it from the server bay?
The failed HDD no longer shows up in lsblk, dmesg or other tools. What would your solution be?
Until now we have labeled each bay, but sometimes that still failed to identify the drive, so we are looking for something more reliable.
Any better ideas?
Regards,
Calin
Comments
Maybe @PulsedMedia has an idea?
Good luck, Calin. I hope the issues are fixed soon.
Thanks, we fixed that a long time ago; we are just looking for an easier way to identify broken HDDs.
Regards
Every HDD/SSD already has its serial number (SN) printed on the front label.
If an HDD disappears, simply check which physical drive's SN is missing from the output of:
lsblk --nodeps -o name,serial
Label them with serial numbers and compare the serial numbers of the disks that are showing with the physical ones.
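A minimal sketch of that comparison in Python. It assumes you keep a hypothetical `bay_labels` dictionary mapping each bay name to the serial written on its label, and that `lsblk --nodeps --noheadings -o NAME,SERIAL` prints one device name and serial per line:

```python
import subprocess


def read_lsblk_serials() -> str:
    """Run `lsblk --nodeps --noheadings -o NAME,SERIAL` and return its raw output."""
    return subprocess.run(
        ["lsblk", "--nodeps", "--noheadings", "-o", "NAME,SERIAL"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_lsblk_serials(output: str) -> dict[str, str]:
    """Map serial number -> kernel device name (e.g. 'WD-AAAA1111' -> 'sda')."""
    serials = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:  # devices that report no serial are skipped
            name, serial = parts
            serials[serial.strip()] = name
    return serials


def missing_bays(bay_labels: dict[str, str], lsblk_output: str) -> list[str]:
    """Return the bays whose labelled serial is no longer reported by the kernel."""
    visible = parse_lsblk_serials(lsblk_output)
    return [bay for bay, serial in bay_labels.items() if serial not in visible]
```

For example, `missing_bays({"bay 01": "WD-AAAA1111", "bay 02": "WD-BBBB2222"}, read_lsblk_serials())` would return the bays whose disks the kernel no longer sees. The serials here are made up.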
That's easy. Just shake them a little. Voila, if it falls apart it's broken!
Blink them one by one in turn.
Whichever one is not blinking could be the broken one.
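The one-by-one approach can be sketched by simply reading from each disk in turn, which keeps its activity LED flickering. This assumes the backplane has per-bay activity LEDs and that `devices` (a hypothetical list such as `["/dev/sda", "/dev/sdb"]`) holds the disks that are still visible; it needs read access to the block devices, normally root:

```python
import time


def read_for(device: str, seconds: float, chunk: int = 1 << 20) -> int:
    """Read `device` sequentially for roughly `seconds` and return the bytes read.

    Reading keeps the drive busy, so its activity LED flickers.
    """
    deadline = time.monotonic() + seconds
    total = 0
    with open(device, "rb", buffering=0) as f:
        while time.monotonic() < deadline:
            data = f.read(chunk)
            if not data:  # end of device reached: wrap to the start
                f.seek(0)
                continue
            total += len(data)
    return total


def blink_one_by_one(devices: list[str], seconds: float = 5.0) -> dict[str, int]:
    """Exercise each device in turn; the bay that never flickers is the suspect."""
    results = {}
    for dev in devices:
        print(f"reading {dev} for {seconds:.0f}s, watch the bay LEDs")
        results[dev] = read_for(dev, seconds)
    return results
```

On a second pass over the same drive, the data read at the start may come from the page cache instead of the disk, so the LED may stay dark on repeat runs.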
https://linux.die.net/man/8/ledctl - turn on the UID lights of all the drives you can see in lsblk/fdisk; the one that isn't lit up is the one to remove.
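A rough sketch of that idea, assuming `ledctl` (from the ledmon package) is installed, you run it as root, and your controller/backplane is one that ledmon supports. It lights the locate LED of every whole disk `lsblk` still reports; the helper names are hypothetical:

```python
import json
import subprocess


def visible_disks(lsblk_json: str) -> list[str]:
    """Return /dev paths of whole disks in `lsblk --nodeps --json -o NAME,TYPE` output."""
    data = json.loads(lsblk_json)
    return ["/dev/" + d["name"] for d in data["blockdevices"] if d.get("type") == "disk"]


def ledctl_command(devices: list[str], pattern: str = "locate") -> list[str]:
    """Build e.g. ['ledctl', 'locate=/dev/sda,/dev/sdb']; pattern='locate_off' clears it."""
    return ["ledctl", f"{pattern}=" + ",".join(devices)]


def light_all_visible(pattern: str = "locate") -> None:
    """Set `pattern` on every disk the kernel can still see (run as root)."""
    out = subprocess.run(
        ["lsblk", "--nodeps", "--json", "-o", "NAME,TYPE"],
        capture_output=True, text=True, check=True,
    ).stdout
    subprocess.run(ledctl_command(visible_disks(out), pattern), check=True)
```

Call `light_all_visible()` to light them and `light_all_visible("locate_off")` to switch them off again once the dark bay has been found.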
@Calin
Bro, I can only say... you know what I'm going to say...
This time, I cannot help, as mdadm is above my pay-grade.
I suspect the disk popping out of the RAID cannot keep in sync with the others.
That does not mean it is dead: software RAID is sensitive to latency, since the CPU handles the storage I/O plus the parity calculations and everything else.
Under high I/O (20+ disks, as you have), these things can happen.
EDIT:
ZFS will be more stable, but you would have to test that.
Hey @Calin! Sometimes software RAID configurations default to sending root an email when a problem is detected. If outbound email has not also been configured, sometimes the email is dumped as a text file into the /root directory. So you could take a peek in your /root directory. You might find a helpful email. It happened to me once, a long time ago. Best wishes! Tom
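Along the same lines, here is a sketch that scans `/root` (or another directory, such as the local mail spool `/var/mail`) for mdadm alert messages. It assumes the alert subject looks like `Fail event on /dev/md0:hostname`; adjust the regex if your mails differ. mdadm only sends these when a mail address is configured (e.g. `MAILADDR root` in `mdadm.conf`) and its monitor is running:

```python
import re
from pathlib import Path

# Assumption: mdadm alert mails use a subject like "Fail event on /dev/md0:myhost".
SUBJECT_RE = re.compile(r"^Subject: (\w+) event on (/dev/\S+?):", re.MULTILINE)


def find_mdadm_alerts(directory: str = "/root", max_bytes: int = 10 << 20) -> list[tuple[str, str, str]]:
    """Return (file, event, array) for every mdadm alert subject found in `directory`.

    Only regular files up to `max_bytes` are read, so large images/backups are skipped.
    """
    hits = []
    for path in sorted(Path(directory).iterdir()):
        try:
            if not path.is_file() or path.stat().st_size > max_bytes:
                continue
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable entry: skip it
        hits.extend((str(path), event, array) for event, array in SUBJECT_RE.findall(text))
    return hits
```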
Hi there!
Just checking: is the server management software (iDRAC/iLO) reporting a faulted drive? I know you can make the drive bay blink from there if your system supports that feature.
On my regular tower case I put a sticky label on the rear side of each disk, because the label on top is not readable with the disks mounted.
I think he won't have that, as iLO does not read SMART data from drives on the HBA in HP Apollo systems (as I recall).
Look for the difference in activity: read from all the other drives to /dev/null and the LEDs will differ.
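The /dev/null trick can be scripted too. This sketch reads from every still-visible disk in parallel (roughly what running `dd if=/dev/sdX of=/dev/null bs=1M` on each would do), so all bays flicker except the one whose disk is gone. The device list is an assumption you supply, e.g. from `lsblk`, and reading block devices normally needs root:

```python
import threading
import time


def _read_for(device: str, seconds: float, chunk: int, results: dict) -> None:
    """Read `device` for about `seconds`, discarding the data, and record bytes read."""
    deadline = time.monotonic() + seconds
    total = 0
    with open(device, "rb", buffering=0) as f:
        while time.monotonic() < deadline:
            data = f.read(chunk)
            if not data:  # end reached (small devices/test files): wrap around
                f.seek(0)
                continue
            total += len(data)
    results[device] = total


def read_all_to_null(devices: list[str], seconds: float = 30.0, chunk: int = 1 << 20) -> dict[str, int]:
    """Keep every listed drive busy at the same time; a quiet bay holds the lost disk."""
    results: dict[str, int] = {}
    threads = [
        threading.Thread(target=_read_for, args=(dev, seconds, chunk, results))
        for dev in devices
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```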
Some Dell servers, especially with hardware RAID, can also light the LEDs.
Some DAS chassis have an esoteric way to control the LEDs too.
There are also ways to check from the CLI which drive bay a disk is in.
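One CLI route, as a sketch: when the backplane is an SES enclosure handled by the kernel's `ses` driver, each slot appears under `/sys/class/enclosure/<enclosure>/<slot>/`, with `device/block/<sdX>` pointing at the disk in it and a writable `locate` attribute. Many HBAs/backplanes expose nothing there, so treat this as an assumption to verify on your hardware:

```python
from pathlib import Path
from typing import Optional


def enclosure_slots(sysfs_root: str = "/sys/class/enclosure") -> dict[str, Optional[str]]:
    """Map '<enclosure>/<slot>' to the block device in that slot, or None if none is seen."""
    slots: dict[str, Optional[str]] = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return slots  # no SES enclosure exposed by the kernel
    for enclosure in sorted(root.iterdir()):
        if not enclosure.is_dir():
            continue
        for slot in sorted(enclosure.iterdir()):
            # Slot (component) directories carry a 'locate' attribute; skip 'power' etc.
            if not slot.is_dir() or not (slot / "locate").exists():
                continue
            block = slot / "device" / "block"
            names = sorted(p.name for p in block.iterdir()) if block.is_dir() else []
            slots[f"{enclosure.name}/{slot.name}"] = names[0] if names else None
    return slots


def set_locate(slot_dir: str, on: bool = True) -> None:
    """Turn a slot's locate LED on or off via its sysfs 'locate' attribute (root only)."""
    (Path(slot_dir) / "locate").write_text("1" if on else "0")
```

A slot that maps to `None` while the others map to disks is a good candidate for the missing drive; `set_locate()` on that slot's directory should then light its bay LED, if the enclosure supports it.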
Difficulties start when you hit 40, 80 ... 120 drives on a single system.
Start working hard and disable the drives one by one to find the broken one.