How to check if a hard drive is broken?

WHT Member

I have had a SYS server for three months, and yesterday my sites started to get slow; sometimes they load fine, sometimes they take 20 seconds.

I suspect one of the disks in the RAID1 array is broken. Are there any tools or commands I can check it with? Or do the SYS techs replace it themselves?

Comments

  • Awmusic12635 Member, Host Rep

    SMART data?

  • Is it software RAID or do you have a raid card? There are utilities for both to check the status of the raid array, as well as each individual drive.

  • doghouch Member
    edited February 2016

    Is your automatic assumption that, because your website took 20 seconds to load, it's the RAID card? Check the disk array - if you need help, just shoot me a PM. Also, what model is the RAID card?

    EDIT: If this isn't your server, you'll need the provider to help.

  • kt Member, Host Rep
    edited February 2016

    Assuming it's SW RAID, what's the output of:
    cat /proc/mdstat

    (If it shows [UU] then the array is fine)
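
    For comparison, something along these lines (device names and block count are just illustrative) is roughly what a healthy two-disk array looks like:

        Personalities : [raid1]
        md0 : active raid1 sdb1[1] sda1[0]
              975193088 blocks [2/2] [UU]

        unused devices: <none>

    A dropped disk shows up as [U_] or [_U], with the failed member flagged (F).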

    Also check SMART:

    smartctl -a /dev/sda

    smartctl -a /dev/sdb

    smartctl -a /dev/sdc (if you have a 3rd drive)

    Thanked by: raindog308, WHT
  • WHT Member
    edited February 2016

    I have software RAID. Will check those commands tomorrow. Thanks

  • Post the output on pastebin and I will check what's wrong.

  • vimalware Member
    edited February 2016

    So, I started an extended SMART test over an hour or so ago after seeing several email alerts from smartmontools

    I got an email alert about 10% of the way through the extended self-test:

    The following warning/error was logged by the smartd daemon:
    Device: /dev/sdb [SAT], Self-Test Log error count increased from 0 to 1
    

    So, I poked right away:

        smartctl -a /dev/sdb |grep fail
                                                the read element of the test failed.
          1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
          3 Spin_Up_Time            0x0027   253   246   021    Pre-fail  Always       -       6591
          5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
        # 1  Extended offline    Completed: read failure       90%     35073         648257133
    

    Is this enough to ask for a replacement?
    (edit: this is a ZFS RAID1. I've never received any email alerts for /dev/sda)
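
    For context, these alerts come from the smartd daemon; a minimal /etc/smartd.conf along these lines (device paths, schedule and address are just placeholders) is what generates them:

        /dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m admin@example.com
        /dev/sdb -a -o on -S on -s (S/../.././02|L/../../6/03) -m admin@example.com

    That schedule runs a short self-test daily at 02:00 and a long one on Saturdays at 03:00.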

  • qps Member, Host Rep

    vimalware said: Is this enough to ask for a replacement?

    I would try to run the test again and see if you have the same result.

    Thanked by: Kris
  • This disk is way too old - replace it!

  • Run a few long/short tests to show it's recurring and fails at the same place; it'll have more clout.

    smartctl --test=short /dev/sdb;
    smartctl --test=long /dev/sdb;
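
    Once those finish (they run on the drive in the background), the results should show up in the self-test log, which you can read back with:

    smartctl -l selftest /dev/sdb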
    

    You could also throw stress-ng at it, and see if the disks last through that, or develop sector errors after a run.

    My opinion is that if a system lasts through stress-ng --random 32 -t 24h, the machine is production worthy.

    Adjust the random workers depending on your machine, and disable / watch out for Watchdog being enabled in the BIOS.

    Otherwise, Watchdog will restart the machine if you tax it too hard, thinking it's not responding due to the load.
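
    If you want something aimed squarely at the disks rather than random stressors, a run along these lines should do it (worker count, size and duration are just examples):

        stress-ng --hdd 4 --hdd-bytes 2G --timeout 1h --metrics-brief

    Then re-check smartctl -a afterwards for new reallocated or pending sectors.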

    Thanked by: vimalware
  • raindog308 Administrator, Veteran

    Kris said: My opinion is that if a system lasts through stress-ng --random 32 -t 24h, the machine is production worthy.

    Only the strong shall survive! I like it.

    Thanked by: Kris
  • BlazingServers Member, Host Rep

    Call OVH through Skype. They are helpful enough.

  • pbgben Member, Host Rep

    First, BACKUP! Or you're bound to lose all your shit...

  • Use 'watch' on smartctl while rsyncing off the server to a temporary backup site, and look for changes in Reallocated Sectors, Seek Errors, etc.

    I'd suggest a good dose of stress-ng, as you can tune it to hit the disk if you want.

    You can always write /dev/zero out to a file the size of whatever space you have left on the disk, and watch smartctl for increasing errors, especially reallocated sectors. They ain't unlimited.
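
    Roughly, that combo could look like this (device, interval and file path are just examples):

        # re-check the key SMART attributes every minute, highlighting any changes
        watch -d -n 60 'smartctl -A /dev/sdb | egrep "Reallocated|Pending|Seek_Error"'

        # fill the remaining free space with zeros so every spare sector gets written
        dd if=/dev/zero of=/root/fill.zero bs=1M
        rm /root/fill.zero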

  • raindog308 said: Only the strong shall survive! I like it.

    A few hundred days of uptime on the beasts I built and stressed; it would have been more if it weren't for those pesky XSAs. Still chugging along.

    I would sit twenty minutes per CPU cleaning old residue off with TSP, ArctiClean, isopropyl alcohol and a fresh micro-fiber cloth.

    Followed the same ritual on the heatsink, but also used melamine foam aka magic erasers to lay a slight key, like you would before repainting a vehicle. This ensured there was no chance of a layer of old thermal compound or grease, as they are essentially micro-polishers. Just tiny swirls making little circles. And since the magic eraser is white, you know exactly when it's clean of residual grease, and you can visually see the tiny scratches you make from well.. polishing the heatsink.

    Once spotless, I would use Noctua NT-H1 or Ceramique 2. I preferred Ceramique as it never spread off the chip, although Noctua is great if you remember to not use too much.

    Overkill? Sure. Only a 1-2°C difference from others, but every degree matters.

    I would let them go for the weekend stressing in the office. Figured if the machine was responsive and still alive with no kernel panic, it was good for production.

    I still prefer good ol' Westmere workhorses, specifically any 5600 in a 1U chassis, over a lot of the new blade crap I'm seeing.

  • @vimalware said:
    So, I started an extended SMART test over an hour or so ago after seeing several email alerts from smartmontools

        # 1  Extended offline    Completed: read failure       90%     35073         648257133
    

    Is this enough to ask for a replacement?
    (edit: this is a ZFS RAID1. I've never received any email alerts for /dev/sda)

    Yes it is. Open a ticket from the SyS panel and copy/paste the SMART data into the ticket.
    Last week they replaced both disks in my SYS-IP-1 server, and I can say the whole process was handled very professionally. :)
    But remember: backups!
    Also, they only change the faulty disk, so you must check that the bootloader is installed OK, and you must re-partition the new disk and resync the RAID, etc.
    They don't do that for you.
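
    For an mdadm software RAID1 the rebuild after the swap is roughly along these lines (assuming /dev/sda survived, /dev/sdb is the fresh disk, MBR partitioning, and md0/md1 as the arrays - adjust to your layout; a ZFS mirror would use zpool replace instead):

        # copy the partition table from the surviving disk to the new one
        sfdisk -d /dev/sda | sfdisk /dev/sdb

        # add the new partitions back into the arrays and let them resync
        mdadm --manage /dev/md0 --add /dev/sdb1
        mdadm --manage /dev/md1 --add /dev/sdb2
        cat /proc/mdstat

        # reinstall the bootloader on the new disk
        grub-install /dev/sdb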
