HDD goes failing?

Amfy Member
edited February 2012 in Help

Hello,

Today, after updating my small dedicated server, I noticed that it has an extremely high load and a lot of iowait.
Anything I read from or write to the hard disks is really slow. The server has two 250 GB HDDs in software RAID 1 and runs CentOS 6. How can I check whether one HDD is about to fail or has already failed?

dd if=/dev/zero of=test bs=64k count=4k conv=fdatasync
4096+0 records in
4096+0 records out
268435456 bytes (268 MB) copied, 41.3819 s, 6.5 MB/s

That does not look good, and I am pretty much the only user on the server...

EDIT:
I have started a long SMART self-test on both disks:
smartctl -t long /dev/sda
smartctl -t long /dev/sdb

Now I am waiting an hour to see what SMART says... :/
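
Once the long self-tests have finished, their results can be read back with `smartctl -l selftest` and the overall verdict with `smartctl -H`. As a rough sketch, the helper below filters the attribute table printed by `smartctl -A` down to three counters that commonly point at a dying disk. The function name `smart_key_attrs` is made up for illustration, and the sketch assumes the usual ATA attribute table layout, with the attribute name in column 2 and the raw value in column 10.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -A` output on stdin and prints
# "<attribute> <raw value>" for counters that commonly indicate a failing disk.
# Assumes the standard ATA attribute table (name in column 2, raw value in column 10).
smart_key_attrs() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2, $10 }'
}

# Usage on the real disks (needs root and smartmontools):
#   smartctl -A /dev/sda | smart_key_attrs
#   smartctl -A /dev/sdb | smart_key_attrs
#   smartctl -l selftest /dev/sda    # result of the long self-test
#   smartctl -H /dev/sda             # overall health verdict
```

Non-zero raw values for these counters are not proof of failure, but a number that keeps growing on one disk and not on the other is a strong hint about which drive to look at.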

Comments

  • mdadm should be able to tell you if the drive is failing. mdadm --detail /dev/mdXX, where mdXX is your software raid device.

    https://raid.wiki.kernel.org/articles/d/e/t/Detecting,_querying_and_testing.html will help, too.
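
Besides `mdadm --detail`, the kernel exposes the array state in `/proc/mdstat`, where a healthy two-disk mirror shows `[UU]` and a degraded one shows an underscore, for example `[U_]`. The following is a minimal sketch that lists degraded arrays; the function name `md_degraded` is hypothetical, and it assumes the common mdstat layout in which the status brackets end the line that follows the `mdN :` line.

```shell
#!/bin/sh
# Hypothetical helper: reads /proc/mdstat-style text on stdin and prints the
# name of every md array whose member status (e.g. [U_]) contains an underscore,
# which marks a missing or failed member.
md_degraded() {
    awk '/^md[0-9]+ :/ { name = $1 }
         /\[[U_]*_[U_]*\] *$/ { print name }'
}

# Usage:
#   md_degraded < /proc/mdstat
```

An empty result means no array is reported as degraded, which matches the "Failed Devices : 0" style of answer from `mdadm --detail`.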

  • Amfy Member
    edited February 2012

    mdadm --detail /dev/md0

    /dev/md0:
    Version : 1.0
    Creation Time : Fri Feb 17 21:52:56 2012
    Raid Level : raid1
    Array Size : 240639864 (229.49 GiB 246.42 GB)
    Used Dev Size : 240639864 (229.49 GiB 246.42 GB)
    Raid Devices : 2
    Total Devices : 2
    Persistence : Superblock is persistent
    Intent Bitmap : Internal
    Update Time : Tue Feb 28 17:29:06 2012
    State : active
    Active Devices : 2
    Working Devices : 2
    Failed Devices : 0
    Spare Devices : 0
    Name : atom:0 (local to host atom)
    UUID : f474eb1c:1c95cdeb:0d4a1e6e:39a2dd4c
    Events : 7800
    Number Major Minor RaidDevice State
    0 8 1 0 active sync /dev/sda1
    1 8 17 1 active sync /dev/sdb1

    So if I can trust that, everything seems to be fine? But why do I only get ~6 MB/s? A few days or weeks ago I got more than 40 MB/s!

  • Perhaps there's some runaway process that's eating available i/o? You can install iotop to see: http://guichaz.free.fr/iotop/

  • @Damian: Thanks for the tip, but I have already checked everything with iotop. When I ran the test, dd was the only process writing to the disk, and it did not get above ~6 MB/s.

  • rds100 Member
    edited February 2012

    nevermind, deleting

  • @Amfy Take your server offline for an FSCK - maybe there are some corruption issues.

  • @sturdyvps A few weeks ago I made the mistake of running fsck on a testing server while the HDD was mounted and in use, and afterwards I had to reinstall the server...

    To force an fsck, should I run touch /forcefsck, then reboot the server and wait an hour or so until it comes back? And I can't lose my data by doing that?

  • @Amfy It's not recommended to run fsck while the server is online.

    You can just issue the following command: shutdown -rF now (this command will reboot the server and do a fsck)

    I would recommend you also have access to a KVM or get your hosting company to do this for you.

  • @sturdyvps: Hehe, yes, after this experience I really know that :P

    shutdown -rF now (this command will reboot the server and do a fsck)

    Ah, thanks!

    I would recommend you also have access to a KVM or get your hosting company to do this for you.

    I'm not sure... I already had access to a free KVM a week ago because of a failure during the CentOS 6 installation (the installer did not install GRUB, even though I had told it to). I don't want to get on their nerves by asking again, and I might be charged ~25€ :/

  • If your server is managed, they should be able to do the fsck for you. Even if the server is unmanaged, they should be able to do it for you again.

  • They're too friendly for that, so I will try it on my own.

    Thanks for all your help, @sturdyvps and @Damian :)

    @sturdyvps, you're offering VPS in NL? I will take a look at them :P

    Hmm, okay, I have decided to reboot my server tonight and let it run an fsck. Roughly how long could that take for a 250 GB HDD?

  • @Amfy said: Roughly how long could that take for a 250 GB HDD?

    Well, if your drives are slow for some reason, then it could take quite some time. If rebooting returns your drives to their normal speed, then it will not be as long.

  • Hmm, it took no longer than a normal reboot...

    I'm not sure; it does not seem any faster. What else could it be? Or is ext4 too heavy for an Intel D525?
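
A reboot that takes no longer than usual raises the question of whether the forced check ran at all. One way to find out, sketched below, is to read the "Last checked" timestamp that `tune2fs -l` prints for ext2/3/4 filesystems. The function name `fs_last_checked` is made up, and the usage line assumes the filesystem sits directly on `/dev/md0`, which the thread does not confirm (it could be on LVM or another layer).

```shell
#!/bin/sh
# Hypothetical helper: reads `tune2fs -l` output on stdin and prints the
# timestamp of the last completed filesystem check.
fs_last_checked() {
    sed -n 's/^Last checked:[[:space:]]*//p'
}

# Usage (needs root; /dev/md0 is an assumption about where the filesystem lives):
#   tune2fs -l /dev/md0 | fs_last_checked
```

If the timestamp is older than the reboot, the check was skipped and the slowness has not yet been ruled in or out by fsck.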

  • dd if=/dev/zero of=test bs=16k count=16k conv=fdatasync
    16384+0 records in
    16384+0 records out
    268435456 bytes (268 MB) copied, 94.2323 s, 2.8 MB/s

    looks awful
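
For comparing runs, the rate dd reports is simply the bytes written divided by the elapsed seconds, with 1 MB counted as 1,000,000 bytes. The small sketch below redoes that arithmetic for the two runs in this thread; `mb_per_s` is a hypothetical helper name, and the seconds must be given with a decimal point rather than the comma of the German locale output.

```shell
#!/bin/sh
# Hypothetical helper: throughput in MB/s (1 MB = 1,000,000 bytes, as dd counts it).
# $1 = bytes written, $2 = elapsed seconds (decimal point, not comma).
mb_per_s() {
    LC_ALL=C awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

mb_per_s 268435456 41.3819   # first run (bs=64k),  prints 6.5
mb_per_s 268435456 94.2323   # second run (bs=16k), prints 2.8
```

Both figures are far below the 40 MB/s or more measured on the same machine a few weeks earlier, so the block size alone does not explain the drop.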

  • Maounique Host Rep, Veteran

    It is awful. Are you sure you want software RAID 1 on that? I have had countless problems with software RAID, including mysterious failures like yours.
    Remove one disk and rebuild; this may solve the problem. In the long run, though, you might consider other ways of securing your data, such as keeping the second drive for automated incremental backups, with a minimal system installed on it in case the other drive fails.
    M

  • What load average does the server have? And how high is the iowait?

    And also what are you using the server for?

  • Amfy Member
    edited February 2012

    @Maounique thank you very very much for your answer.

    And you really had similar problems with similarly bad speed? A few weeks ago the same machine ran Debian with ext3 and performance was relatively good. Then it looked as if one hard disk had failed, so I decided to reinstall. Because I wanted to test CentOS 6, I chose it and thought ext4 would be more future-proof.

    Currently I run a nightly vzdump backup to a Hostigation VPS.

    Hmm, maybe I will contact the provider and ask whether other customers are having similar problems.

    I don't have very important data, but I can't imagine a server without RAID?!

    Remove one disk and rebuild, this may solve the problem

    You mean something like mdadm --manage /dev/md0 --remove /dev/sdb
    mdadm --manage /dev/md0 --add /dev/sdb?

    Would mdadm --grow --bitmap=internal /dev/md0 help a bit?

  • In some cases software RAID will perform just as well as hardware RAID, if not better. With modern CPUs, modern software RAID has no real performance impact when using RAID 1, 0 or 10. I'd probably only opt for hardware RAID with a BBU (which we do run).

    However, I've worked with hardware RAID and software RAID and there's not much of a difference for what was stated above.

  • @sturdyvps:

    And also what are you using the server for?

    It is split into OpenVZ containers: mail server, shell server, web server, and some development and testing stuff.

    What load average does the server have?

    Currently ~0.5 - 1.0 (I have deactivated everything except the web server). But at boot time, while I was installing some updates, the load was about 10 - 20! And if I let dd write for a minute or so, the load goes to ~5.

    Whenever a process wakes up, the iowait % in top climbs to nearly 100%.
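
The iowait figure that top shows can also be measured over a fixed interval from the aggregate `cpu` line in `/proc/stat`, where the sixth whitespace-separated field is the cumulative iowait time. The sketch below computes the iowait share between two samples; `iowait_pct` is a hypothetical name, and the five-second interval in the usage comment is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: percentage of CPU time spent in iowait between two samples.
# $1, $2 = the aggregate "cpu ..." line from /proc/stat at two points in time.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | LC_ALL=C awk '
        {
            total = 0
            # Fields 2..9: user, nice, system, idle, iowait, irq, softirq, steal.
            # Guest time (fields 10+) is already included in user/nice.
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            t[NR] = total; w[NR] = $6
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w[2] - w[1]) / dt
        }'
}

# Usage: sample twice, five seconds apart
#   a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
#   iowait_pct "$a" "$b"
```

Running this while dd is writing, and again while the box is idle, gives two numbers that are easier to compare and post than a glance at top.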

  • Maounique Host Rep, Veteran
    edited February 2012

    something like:
    mdadm /dev/md0 -f /dev/sda1   # marks as faulty
    mdadm /dev/md0 -r /dev/sda1   # removes
    mdadm /dev/md0 -a /dev/sda1   # adds back as hot spare
    Or sdb if you prefer.
    If you do not depend on that data, don't use RAID; it will lower power usage and reduce headaches. In theory it should work great, but in practice (mine), if the data is not that important, backup is better than RAID (even if you have hardware RAID); if it is critical, use the best-performing setup with minimal redundancy plus heavy backups.
    M
    P.S. It may look like one drive is failing, but that is not always the case; software RAID can be mistaken in some situations. Better to check the drive before discarding it.
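
After a member has been failed, removed and added back as described above, the kernel reports the rebuild in `/proc/mdstat` on a line containing `recovery = NN.N%` (or `resync = NN.N%`). As a minimal sketch, the hypothetical helper `md_rebuild_progress` below extracts just that percentage. It relies on `grep -o`, which GNU and BSD grep both provide.

```shell
#!/bin/sh
# Hypothetical helper: reads /proc/mdstat-style text on stdin and prints the
# rebuild progress (e.g. "12.6%") of any array that is recovering or resyncing.
# Prints nothing and returns non-zero when no rebuild is in progress.
md_rebuild_progress() {
    grep -oE '(recovery|resync) *= *[0-9.]+%' | grep -oE '[0-9.]+%'
}

# Usage:
#   md_rebuild_progress < /proc/mdstat
```

It is worth waiting until the rebuild has finished before repeating the dd test, since the resync itself competes for disk I/O.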
