Is my server SSD disk healthy?
Hi,
Some time ago I bought a Hetzner Finland server with SSDs for around 30 euro because I couldn't resist.
Anyway, I have 2 Crucial_CT256MX100SSD1 drives. Here are the SMART values of the first one
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0033   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       25235
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   059   059   000    Old_age   Always       -       1256
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       6
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2159
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   067   044   000    Old_age   Always       -       33 (Min/Max 20/56)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Used   0x0031   059   059   000    Pre-fail  Offline      -       41
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       93537292190
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       3021321009
248 Bckgnd_Program_Page_Cnt 0x0032   100   100   000    Old_age   Always       -       810663606
and the second one
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0033   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       18609
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   087   087   000    Old_age   Always       -       390
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       5
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2159
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   067   046   000    Old_age   Always       -       33 (Min/Max 20/54)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Used   0x0031   087   087   000    Pre-fail  Offline      -       13
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       22511553499
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       796718609
248 Bckgnd_Program_Page_Cnt 0x0032   100   100   000    Old_age   Always       -       1517302968
I use this server with RAID 1.
Going by the power-on hours, I think they are in pretty good shape. What worries me is 180 Unused_Reserve_NAND_Blk and 202 Percent_Lifetime_Used.
From some googling, some people recommend replacing the disk if the value of Reallocate_NAND_Blk_Cnt or Unused_Reserve_NAND_Blk changes. I agree the disk probably needs to be replaced if Reallocate_NAND_Blk_Cnt changes, but what about Unused_Reserve_NAND_Blk?
And what does Percent_Lifetime_Used mean? Does it really mean that the closer the value gets to 100, the less lifetime is left?
Last question: how does Hetzner handle disk replacement? For example, if I have these 2 disks in a software RAID 1, can they replace a disk without downtime? Maybe something like a hot-swap?
Comments
If you are able to convince them of a disk failure, they change the disk pretty quickly, with downtime of around 15-20 mins (from what I've heard). I once asked for 2 x 2TB SSD and the server was down for no more than 10-15 minutes of scheduled time.
Which SMART values did you show them at that time?
And btw, what kind of server has 2 TB SSDs? I don't think I've seen that in their server packages.
those SSDs are in good, used shape.
that type of SSD is specified with 72 TBW: https://www.crucial.com/wcsstore/CrucialSAS/pdf/product-flyer/ssd/crucial-mx100-ssd-product-flyer-en.pdf
your 93'537'292'190 sectors written (512 byte sector size) come to about 47.9 TB (43.6 TiB) written. the 2nd one is only around 11.5 TB (10.5 TiB).
UNUSED reserve blocks are good when the number is that high, as it means those blocks are still available/unused.
lifetime percentage is only bad once its RAW value closes in on 100%. the first one is at 41% lifetime used, and the second only at 13%, which roughly tracks the TBW numbers and gives a fair picture of the technical age.
as said above, I'd say both SSDs are comfortably below the manufacturer's specs/limit, there is no reason to have them changed yet.
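For anyone who wants to redo the arithmetic, here is a minimal Python sketch of the conversion from Total_Host_Sector_Write to data written. It assumes the 512-byte sector size and the 72 TBW rating quoted above; the function and constant names are made up for illustration. Note that the drive's own Percent_Lifetime_Used is computed by the firmware and need not equal the plain host-writes/TBW ratio printed here.

```python
SECTOR_BYTES = 512   # assumption: attribute 246 counts 512-byte sectors
RATED_TBW_TB = 72    # rated endurance quoted in the thread, in decimal TB


def written_summary(sectors, sector_bytes=SECTOR_BYTES, rated_tb=RATED_TBW_TB):
    """Convert a raw host-sector count into TB, TiB and percent of rated TBW."""
    written_bytes = sectors * sector_bytes
    return {
        "tb": written_bytes / 10**12,    # decimal terabytes
        "tib": written_bytes / 2**40,    # binary tebibytes
        "pct_of_rated": 100 * written_bytes / (rated_tb * 10**12),
    }


if __name__ == "__main__":
    # Raw values of attribute 246 from the two drives in this thread.
    for label, sectors in (("first", 93_537_292_190), ("second", 22_511_553_499)):
        s = written_summary(sectors)
        print(f"{label}: {s['tb']:.1f} TB ({s['tib']:.1f} TiB), "
              f"{s['pct_of_rated']:.1f}% of {RATED_TBW_TB} TBW")
```

The gap between TB and TiB is why the same counter can be read as roughly 48 or roughly 44 depending on the unit.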
So this value will decrease over time, won't it?
No, it's dying. Ask them to replace it ASAP.
yes, it might decrease once there are worn-out blocks, which then get replaced by those reserved ones.
from the looks of it both of your SSDs show 2159 as RAW and 000 as VALUE, which makes me guess that this is the default/initial setting and so far not a single reserved block was needed.
I think you have nothing to worry about with those disks. also hetzner most likely wouldn't change them, as there is nothing wrong with them ;-)
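If you want to keep an eye on these attributes over time, a rough Python sketch like the one below can pull the raw values out of a pasted attribute table. The baseline of 2159 reserve blocks is taken from this thread's drives, and the 90% lifetime cut-off is an arbitrary illustrative threshold, not a vendor recommendation; both function names are hypothetical.

```python
SAMPLE = """\
  5 Reallocate_NAND_Blk_Cnt 0x0033   100   100   000    Pre-fail  Always       -       0
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2159
194 Temperature_Celsius     0x0022   067   044   000    Old_age   Always       -       33 (Min/Max 20/56)
202 Percent_Lifetime_Used   0x0031   059   059   000    Pre-fail  Offline      -       41
"""


def parse_smart_table(text):
    """Map attribute name -> id, normalized value and leading integer of the raw value."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(None, 9)   # 10 columns; the raw column may contain spaces
        if len(fields) < 10 or not fields[0].isdigit():
            continue                   # skip the header and anything that is not a data row
        attrs[fields[1]] = {
            "id": int(fields[0]),
            "value": int(fields[3]),
            "raw": int(fields[9].split()[0]),
        }
    return attrs


def wear_notes(attrs, initial_reserve=2159, lifetime_limit=90):
    """Return human-readable notes; an empty list means nothing stood out."""
    notes = []
    if attrs["Reallocate_NAND_Blk_Cnt"]["raw"] > 0:
        notes.append("NAND blocks have been reallocated")
    if attrs["Unused_Reserve_NAND_Blk"]["raw"] < initial_reserve:
        notes.append("reserve blocks have started to be consumed")
    if attrs["Percent_Lifetime_Used"]["raw"] >= lifetime_limit:
        notes.append("lifetime used is approaching 100%")
    return notes


if __name__ == "__main__":
    print(wear_notes(parse_smart_table(SAMPLE)) or "nothing stood out")
```

For the first drive's values this reports nothing, which is in line with the reading above.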
These were additional disks, not replacements. I gave that as a reference for how long it takes to replace/add disks.
Yes, I know every disk is dying, and so is every person
@Falzo thank you, big help as always. Clear explanation. SMART is sometimes confusing for me because many vendors have their own unique attributes. Nice to have you around
I see. Thank you for the information. I asked about the SSD because it seems interesting to have an SSD that big