What is the standard behavior of providers on failing hard drives?
I have a dedicated server with OneProvider, which I believe is in the Online.net datacenter.
It has two 1 TB HDDs that I use in RAID-Z1 with FreeBSD.
Performance is low for a variety of reasons, but I do not care about that; it is more than enough for the bargain price I'm paying.
I was wondering, what's the standard behavior of providers with Old_age and Pre-fail drives? Should I wait for a disk to fail before getting it replaced, accepting a small chance that both disks fail at the same time, or is it possible to ask for at least one of the two to be replaced before it fails?
Here are both smartctl outputs for reference:
smartctl 7.0 2018-12-30 r4883 [FreeBSD 12.0-RELEASE-p13 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital RE4
Device Model: WDC WD1003FBYX-18Y7B0
Serial Number: WD-WCAW30467720
LU WWN Device Id: 5 0014ee 2055ef1dd
Add. Product Id: DELL(tm)
Firmware Version: 01.01V02
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Wed Jun 10 00:43:30 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Warning! SMART Attribute Data Structure error: invalid SMART checksum.
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x00) Offline data collection not supported.
SMART capabilities: (0x0000) Automatic saving of SMART data is not implemented.
Error logging capability: (0x00) Error logging supported.
General Purpose Logging supported.
SCT capabilities: (0x303f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 22
3 Spin_Up_Time 0x0027 171 168 021 Pre-fail Always - 4450
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 38
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 046 046 000 Old_age Always - 39722
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 36
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 35
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 2
194 Temperature_Celsius 0x0022 111 105 000 Old_age Always - 36 (Min/Max 32/39)
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 4 -
# 2 Short offline Completed without error 00% 1 -
Selective Self-tests/Logging not supported
smartctl --all /dev/da1
smartctl 7.0 2018-12-30 r4883 [FreeBSD 12.0-RELEASE-p13 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital RE4
Device Model: WDC WD1003FBYX-18Y7B0
Serial Number: WD-WCAW30579252
LU WWN Device Id: 5 0014ee 25aba18f9
Add. Product Id: DELL(tm)
Firmware Version: 01.01V02
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Wed Jun 10 00:43:52 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Warning! SMART Attribute Data Structure error: invalid SMART checksum.
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x00) Offline data collection not supported.
SMART capabilities: (0x0000) Automatic saving of SMART data is not implemented.
Error logging capability: (0x00) Error logging supported.
General Purpose Logging supported.
SCT capabilities: (0x303f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 3
3 Spin_Up_Time 0x0027 172 171 021 Pre-fail Always - 4391
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 38
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 028 028 000 Old_age Always - 52782
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 36
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 35
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 2
194 Temperature_Celsius 0x0022 116 109 000 Old_age Always - 31 (Min/Max 27/34)
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 3 -
# 2 Short offline Completed without error 00% 1 -
Selective Self-tests/Logging not supported
Comments
Your drives look fine. The TYPE field only indicates what the attribute means: if the VALUE of a Pre-fail attribute is near its THRESH, that is an early indicator of failure, whereas the other (Old_age) attributes are likely to just indicate old age.
What you want to look out for is:
1. Failing tests
2. Errors logged
3. Pending sectors
4. Offline uncorrectable sectors
5. A high reallocated sector count
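The checklist above can be turned into a rough script. This is only a sketch that works on the text printed by `smartctl --all`. The function names, the `realloc_limit` parameter (0 by default, which is stricter than "high") and the idea that failed self-tests contain "failure" or "Fatal" in their status are my assumptions, not anything from smartmontools itself.

```python
import re

# One attribute row: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, then the raw value (only its leading integer is captured).
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)
# One self-test log row, e.g. "# 1 Extended offline Completed without error ..."
SELF_TEST_ROW = re.compile(r"^\s*#\s*\d+\s+(.*)$")


def parse_attributes(report):
    """Return {attribute_name: raw_value} for every attribute row found."""
    raw = {}
    for line in report.splitlines():
        m = ATTR_ROW.match(line)
        if m:
            raw[m.group(2)] = int(m.group(9))
    return raw


def warning_signs(report, realloc_limit=0):
    """Return a list of human-readable warnings; an empty list means none found."""
    signs = []

    # 1. Failing tests (assumed wording: "failure" or "Fatal" in the status).
    for line in report.splitlines():
        m = SELF_TEST_ROW.match(line)
        if m and ("failure" in m.group(1) or "Fatal" in m.group(1)):
            signs.append("failed self-test: " + m.group(1).strip())

    # 2. Errors logged: a non-empty error log reports an "ATA Error Count".
    if re.search(r"ATA Error Count:\s*\d+", report):
        signs.append("errors in the SMART error log")

    # 3-5. Pending, offline-uncorrectable and reallocated sectors (raw values).
    attrs = parse_attributes(report)
    limits = (
        ("Current_Pending_Sector", 0),
        ("Offline_Uncorrectable", 0),
        ("Reallocated_Sector_Ct", realloc_limit),
    )
    for name, limit in limits:
        if attrs.get(name, 0) > limit:
            signs.append(f"{name} raw value is {attrs[name]}")
    return signs
```

Fed the two outputs posted above, this should return an empty list for both drives, since all three sector counters are 0, the error log is empty and both self-tests completed without error.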
If you are concerned, you should run a long test. There hasn't been one on your drives in a long time, so it's probably worth doing.
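To put a number on "a long time": the self-test log records the drive's power-on hours at the moment each test ran, so subtracting that from the current Power_On_Hours gives the hours since the last extended test. A minimal sketch, assuming the newest test is listed first (as in the logs above) and that the log's hour counter has not wrapped around; the function name is made up for illustration.

```python
import re


def hours_since_last_long_test(report):
    """Power_On_Hours minus the LifeTime(hours) of the newest extended test.

    Returns None when either value cannot be found in the report text.
    """
    # Raw value of attribute 9, e.g. "... Old_age Always - 39722"
    poh = re.search(r"Power_On_Hours\s.*\s-\s+(\d+)", report)
    # First "Extended ..." row of the self-test log: "<remaining>% <lifetime>"
    test = re.search(
        r"^\s*#\s*\d+\s+Extended\s+\S+\s+.*?(\d+)%\s+(\d+)",
        report,
        re.MULTILINE,
    )
    if poh is None or test is None:
        return None
    return int(poh.group(1)) - int(test.group(2))
```

With the values posted above this gives 39722 - 4 = 39718 hours for the first drive and 52782 - 3 = 52779 hours for the second, i.e. roughly 4.5 and 6 years of power-on time since the last extended test.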
Right now your disks are fine. Just run a long test and see if it brings up any issues.
I personally would expect a drive to be replaced when it's actively failing or indicating that it will fail soon (like a rising "reallocated sector count").
Thank you very much for your suggestions, the long test is now running on both disks. Given the age of the drives, I guess the solution is to set up a cron job that runs tests periodically and sends alerts, in order to catch a failing drive as early as possible?
Just set up smartd (from smartmontools) for automatic monitoring.
For a set-up-and-forget solution I'd suggest setting up smartd with notifications, as suggested by @SGraf, plus a monthly cron or similar job to long test the drives. One drive at a time is best.
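One way to get "monthly, one drive at a time" from a single daily cron job is to give each drive its own day of the month. The sketch below is only an illustration: the device list is an assumption (only `/dev/da1` appears in this thread), and the 7-day spacing and function names are made up. It relies on `smartctl -t long`, which starts the extended self-test and returns immediately while the drive runs the test in the background.

```python
import datetime
import subprocess

DRIVES = ["/dev/da0", "/dev/da1"]  # assumed device names, adjust to your system
SPACING_DAYS = 7                   # assumed gap between drives, in days


def drive_due(day_of_month, drives=DRIVES, spacing=SPACING_DAYS):
    """Return the drive whose monthly long test falls on this day, or None."""
    for index, drive in enumerate(drives):
        # Drive 0 is tested on day 1, drive 1 on day 8, and so on.
        if day_of_month == 1 + index * spacing:
            return drive
    return None


def run_if_due(today=None, runner=subprocess.run):
    """Start a long self-test if a drive is due today; return that drive or None."""
    today = today or datetime.date.today()
    drive = drive_due(today.day)
    if drive is None:
        return None
    # check=True makes a failing smartctl call raise, so cron reports it.
    runner(["smartctl", "-t", "long", drive], check=True)
    return drive
```

To use it, add a final `run_if_due()` call to the file and run it once a day from root's crontab. smartd can also schedule self-tests on its own through the `-s` directive in smartd.conf, which may be simpler if you are running smartd anyway.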
Is it possible that no type of self-testing is supported by the disks?
Edit: I believe that short and long are supported while offline is not.
The real killers for spinny drives are reallocated sectors (Reallocated_Sector_Ct). If I saw one, I would replace the drive.