Should I ask for a new drive in my Hetzner slot?

Redondit0 · May 2019

Hi.
Lately I have been having several problems with my Hetzner server (they may be due to my inexperience and unrelated to this), so I started testing the system and found this SMART result:

https://pastebin.com/r73sJzaQ

https://pastebin.com/FVfAYJpi

I can't tell how bad the test results are, so I'm asking here: since I plan to do a clean installation of Debian anyway because of dependency errors I can't resolve, should I take the opportunity to request a drive replacement, or should I leave it the way it is?
Thanks!

deank · May 2019

Your drive looks fine to me.

But, sure, ask for a replacement. What have you got to lose? A "No" is the only bad result.
Be careful what you wish for though cuz there is a chance that you will get an older drive.

Redondit0 · May 2019

@deank said:
Your drive looks fine to me.

But, sure, ask for a replacement. What have you got to lose? A "No" is the only bad result.
Be careful what you wish for though cuz there is a chance that you will get an older drive.

Thanks for answering.
What do those 2151 errors that SMART detected mean?

Yura · May 2019

@deank said:
But, sure, ask for a replacement. What have you got to lose? A "No" is the only bad result.

also

@deank said: there is a chance that you will get an older drive.

deank · May 2019

It could simply be a count of the errors the drive has encountered, which can happen due to various factors.

@Redondit0 said:
Thanks for answering.
What do those 2151 errors that SMART detected mean?

willie · May 2019

They typically won't replace drives unless they are really failing or failed.

Neoon · May 2019

@willie said:
They typically won't replace drives unless they are really failing or failed.

You can replace all hardware anytime at Hetzner for a fee.
But if the Hardware is clearly fucked, it gets replaced for free.

Well, Reallocated_Sector_Ct is at 0, so Hetzner most likely won't replace it.
If you get bad sectors on a disk, they will replace it surely.

TheLinuxBug · May 2019

@Neoon said:

@willie said:
They typically won't replace drives unless they are really failing or failed.

You can replace all hardware anytime at Hetzner for a fee.
But if the Hardware is clearly fucked, it gets replaced for free.

Well, Reallocated_Sector_Ct is at 0, so Hetzner most likely won't replace it.
If you get bad sectors on a disk, they will replace it surely.

Um, if it can't pass a short offline test, I think it can be replaced for free... it has 60k hours and is clearly showing read errors... I would be making a backup and asking for a replacement.

The end is nigh!

deank · May 2019

I will take a 7-year-old HDD over a brand new HDD.

Errors can and will happen due to various factors. I wouldn't care for the error count.

willie · May 2019

Oh ok if it's failing self test then that's not good. Yeah open a ticket.

Neoon · May 2019

@TheLinuxBug said:
Um, if it can't pass a short offline test, I think it can be replaced for free... it has 60k hours and is clearly showing read errors... I would be making a backup and asking for a replacement.

Of course, if the machine dies, 60k hours is a decent age, nothing wrong with that.
But yea, some values look a bit suspicious if you look at them closer..

rm_ · May 2019

deank said: Your drive looks fine to me.

No

deank said: It could simply be a count of the errors the drive has encountered, which can happen due to various factors.

No

Reported_Uncorrect (187) is a very bad symptom, e.g. Backblaze replaces the disk immediately in their DC if this goes above zero.

It's not some abstract "errors" due to "various factors", it's when you asked the disk for data, and it couldn't read data it stored from the platters anymore. Irrecoverable data loss. You're really fine with a disk doing that?

deank · May 2019

Probably because the disks on my storage rig have higher numbers.

Redondit0 · May 2019

Thanks for the answers.
Tonight I'll run a long test and leave it running until tomorrow to see if anything else comes out.
I'll most likely ask for a disk change. After all, as deank said, the worst I can get is a NO, and if they give me a disk with more hours but without those errors, I think I still win.

deank · May 2019

Indeed, nothing to lose by asking, but all disks will have errors. That has been my experience. Considering the hours on yours (60k) and the number of read errors (2k), I'd say it's actually pretty good.
I've seen far worse cases.

TheLinuxBug · May 2019

@deank said:
Indeed, nothing to lose by asking, but all disks will have errors. That has been my experience. Considering the hours on yours (60k) and the number of read errors (2k), I'd say it's actually pretty good.
I've seen far worse cases.

You have some weird standards for what you think is safe for disks, even in a RAID.
If I see:
A. Uncorrectable CRC error (Reported Uncorrected) > 1
B. Failed short offline read test
C. Failed long offline read test (Especially)
D. UDMA_CRC_Error_Count > 1
E. Current_Pending_Sector > 1

Then I am changing the disk in my own array ASAP.
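
Editor's note: the counter-based items in the checklist above (A, D and E) can be sketched as a small Python check. This is only an illustration under stated assumptions: the sample text mimics the layout of a `smartctl -A` attribute table, its values are invented and are not taken from this thread, and `parse_attributes`, `flagged` and `WATCHED` are hypothetical names. The self-test items (B and C) are not covered here.

```python
import re

# Hypothetical excerpt in the layout of a `smartctl -A` attribute table.
# The values are invented for illustration; they are not from this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   088   088   000    Old_age   Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

# Attributes named in the checklist (items A, D and E).
WATCHED = ("Reported_Uncorrect", "UDMA_CRC_Error_Count", "Current_Pending_Sector")


def parse_attributes(text):
    """Return {attribute_name: raw_value} for the rows of an attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = re.match(r"\d+", fields[9])  # leading integer of RAW_VALUE
            if raw:
                attrs[fields[1]] = int(raw.group())
    return attrs


def flagged(attrs, limit=1):
    """Return the watched attributes whose raw value exceeds `limit`."""
    return sorted(name for name in WATCHED if attrs.get(name, 0) > limit)


if __name__ == "__main__":
    print(flagged(parse_attributes(SAMPLE)))
```

The `limit=1` default follows the checklist as written ("> 1"); pass `limit=0` if you prefer to react to the first error.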

Raw_Read_Error_Rate on Seagate drives, however, can generally be ignored because they often use this value for diagnostics and it will randomly change to different numbers, even between times running 'smartctl'.

If you see that on a Hitachi drive though, then it's starting to die.

I also prefer to run Hitachi Enterprise drives over anything Seagate because of the weird crap their SMART will report (such as Raw_Read_Error_Rate) and generally longer lifetime (generally live to be about 70-80k hours or sometimes more with regular wear).

My 2 cents.

Cheers!

levnode · May 2019

Neoon said: You can replace all hardware anytime at Hetzner for a fee. But if the Hardware is clearly fucked, it gets replaced for free.

Well, Reallocated_Sector_Ct is at 0, so Hetzner most likely won't replace it. If you get bad sectors on a disk, they will replace it surely.

Reallocated_Sector_Ct = 0 does not always mean the disk is OK, and Reallocated_Sector_Ct > 0 does not necessarily mean the disk is bad. This is an HDD; there may be an issue with the head.

From my point of view, this disk is seriously damaged and I am pretty sure that Hetzner will replace it for you for free.

rm_ · May 2019

TheLinuxBug said: UDMA_CRC_Error_Count > 1

These can be fine and just show the disk had a bad cable connection some time in the past.

levnode said: Reallocated_Sector_Ct > 0 does not mean the disk is bad.

Yes, if there are just a few and they do not increase. But that's for a drive you own. In a DC setting there's arguably little reason to tolerate even that, if the DC is known to accept it as grounds for replacement.
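
Editor's note: the "do they increase?" question can be sketched by comparing two snapshots of the raw counters taken some time apart. This is a minimal illustration, not anyone's actual monitoring setup: `growing_counters` is a hypothetical helper, and the readings below are invented.

```python
def growing_counters(earlier, later, names):
    """Return {name: increase} for counters whose raw value went up."""
    changes = {}
    for name in names:
        delta = later.get(name, 0) - earlier.get(name, 0)
        if delta > 0:
            changes[name] = delta
    return changes


# Invented readings for illustration, e.g. noted down a week apart.
last_week = {"Reallocated_Sector_Ct": 3, "UDMA_CRC_Error_Count": 14}
today = {"Reallocated_Sector_Ct": 3, "UDMA_CRC_Error_Count": 19}

if __name__ == "__main__":
    watched = ["Reallocated_Sector_Ct", "UDMA_CRC_Error_Count"]
    print(growing_counters(last_week, today, watched))
```

A counter that is above zero but stable is left out of the result; only the ones still climbing are reported.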

Hetzner_OL · May 2019

willie said: They typically won't replace drives unless they are really failing or failed.
Neoon said: You can replace all hardware anytime at Hetzner for a fee.

Hi everyone, it's true. We exchange hardware components if the hardware is broken or doesn't perform well. Customers can request this in a support ticket. They should have log files available from test results, because our technicians may insist on log files as proof, especially when the hardware in question is a drive. Before we replace the hardware, our technicians ask the customer whether they would like a new drive with at most 1000 hours of running time. However, this option is only available for a fee.

Redondit0 said: Tonight I'll run a long test and leave it running until tomorrow to see if anything else comes out.

Redondit0 said: I'll most likely ask for a disk change. After all, as deank said, the worst I can get is a NO, and if they give me a disk with more hours but without those errors, I think I still win.

Make sure you send the results of your long test together with your hardware change request. Our technicians will then decide whether to replace your drive. I hope this clarifies our disk change process. If you have further questions, let me know.
--Julia, Marketing

deank · May 2019

What happened to Katie?

Redondit0 · May 2019

The long test has finished, but now I have a question.
Where do I see the results?
At the start of the test, only the following was printed:

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 448 minutes for test to complete.
Test will complete after Fri May 3 04:17:59 2019

Use smartctl -X to abort test.

I left it running, thinking it would give me a path to the complete log at the end, but it did not.

deank · May 2019

Well, that's not a result. That's for sure.

Just send the SMART output to support. That's generally enough.

rm_ · May 2019

Redondit0 said: Where do I see the results?

In the output of smartctl -a /dev/sdX, look for "Self-test execution status" while the test is running. Once it completes, the result is in the section below the line saying "SMART Self-test log structure".
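
Editor's note: pulling the self-test entries out of saved `smartctl -a` output could look roughly like the Python sketch below. It assumes the log rows start with `#` and that columns are separated by two or more spaces; the sample text is invented in that layout and is not the result of the test discussed here, and `self_test_entries` is a hypothetical helper name.

```python
import re

# Invented sample in the layout of the self-test log section.
SAMPLE = """\
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     60012         123456789
# 2  Short offline       Completed without error       00%     59980         -
"""


def self_test_entries(text):
    """Return a list of {"test", "status"} dicts from the self-test log."""
    entries = []
    in_log = False
    for line in text.splitlines():
        if line.startswith("SMART Self-test log structure"):
            in_log = True  # entries follow this header line
            continue
        if in_log and line.startswith("#"):
            # Columns are separated by runs of two or more spaces.
            cols = re.split(r"\s{2,}", line.strip())
            if len(cols) >= 3:
                entries.append({"test": cols[1], "status": cols[2]})
    return entries


if __name__ == "__main__":
    for entry in self_test_entries(SAMPLE):
        print(entry["test"], "->", entry["status"])
```

Any status other than "Completed without error" on a finished test is the kind of line worth attaching to a support ticket.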

Redondit0 · May 2019

@rm_ said:

Redondit0 said: Where do I see the results?

In the output of smartctl -a /dev/sdX, look for "Self-test execution status" while the test is running. Once it completes, the result is in the section below the line saying "SMART Self-test log structure".

Thanks!

Thank you all for the pointers, opinions and experience you shared with me.
Since I was already planning to reinstall the OS, and rclone takes forever to upload the 5 TB to Google, I decided to request a new server and transfer the data directly from server to server, taking the opportunity to get a slightly better processor. What do you recommend? An i7-3770, an i7-2600 or a Xeon E3-1245?
It is for Plex (3 simultaneous streams, maximum), torrents and rar/unrar.

TheLinuxBug · May 2019

rm_ said: These can be fine and just show the disk had a bad cable connection some time in the past.

This is true, but hopefully, if it is in your own setup, you will know whether something like that was behind the result: a power outage, say, or maintenance after which the SATA cable wasn't connected well on the drive's first boot. Yes, there are reasons why that counter could go up that are unrelated to the drive going bad. However, in a DC environment, if you're seeing it go up, then at minimum you should ask them to replace the SATA cable first, and if it continues I would for sure be asking for a replacement.

Good point though, as I didn't define that very well.

Cheers!

akhfa · May 2019

I had a bad disk with them once on my auction server, and they replaced it with a new SSD about six hours after my inquiry.

Bayu · May 2019

@Redondit0 said:
What do you recommend? An i7-3770, an i7-2600 or a Xeon E3-1245?
It is for Plex (3 simultaneous streams, maximum), torrents and rar/unrar.

In terms of video encoding speed (x264), the i7-3770 is better than the E3-1245. It also offers GPU-accelerated encoding (Intel Quick Sync) for Plex.

Milon · May 2019

Maybe somebody can help me with some advice. On a home Ubuntu server I use an HDD from an old notebook. From time to time it makes a "ding" sound (maybe head parking or something similar). hdparm shows that Advanced Power Management is off.

Smart:
https://pastebin.com/5JrJaU6p
P.S. It looks like Seek_Error_Rate and Multi_Zone_Error_Rate are increasing quickly.
Is the patient dying?

Edding · May 2019

I would ask for a replacement... that HDD is dying.

Falzo · May 2019

It does not look close to dying to me. The sectors are all good and it just passed an extended offline test without problems.

Rapidly changing numbers might depend on the firmware. Some vendors use these fields for diagnostic stuff as has been mentioned above.

deank · May 2019

When a HDD is dying, you or your client(s) will know. Trust WSS.
