Is my server SSD disk healthy?

akhfa Member

Hi,

Some time ago I bought a Hetzner Finland server with SSDs for around 30 euro because I couldn't resist :(

Anyway, it has 2 Crucial_CT256MX100SSD1 drives. These are the SMART values of the first one:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0033   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       25235
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   059   059   000    Old_age   Always       -       1256
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       6
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2159
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   067   044   000    Old_age   Always       -       33 (Min/Max 20/56)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Used   0x0031   059   059   000    Pre-fail  Offline      -       41
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       93537292190
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       3021321009
248 Bckgnd_Program_Page_Cnt 0x0032   100   100   000    Old_age   Always       -       810663606

And the second one:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0033   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       18609
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   087   087   000    Old_age   Always       -       390
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       5
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2159
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   067   046   000    Old_age   Always       -       33 (Min/Max 20/54)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Used   0x0031   087   087   000    Pre-fail  Offline      -       13
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       22511553499
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       796718609
248 Bckgnd_Program_Page_Cnt 0x0032   100   100   000    Old_age   Always       -       1517302968

I use this server with RAID 1.

In terms of power-on hours, I think they are in pretty good shape. What worries me is 180 Unused_Reserve_NAND_Blk and 202 Percent_Lifetime_Used.
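
To compare the two drives attribute by attribute, the tables above can be read programmatically. The following is a minimal sketch, assuming the text has the same column layout as the `smartctl -A` output pasted in this post; the names `ROW`, `parse_smart_table`, and `SAMPLE` are hypothetical helpers introduced only for this illustration.

```python
import re

# One attribute row of the table, in the column order shown in the post:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.+)$"
)


def parse_smart_table(text):
    """Return {attribute_id: row dict} for every attribute row in text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header line or anything that is not an attribute row
        row = m.groupdict()
        attrs[int(row["id"])] = {
            "name": row["name"],
            "value": int(row["value"]),
            "worst": int(row["worst"]),
            "thresh": int(row["thresh"]),
            # The raw column may carry extra text, e.g. "33 (Min/Max 20/56)";
            # keep only the leading integer.
            "raw": int(row["raw"].split()[0]),
        }
    return attrs


# A few rows copied from the first drive's table above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocate_NAND_Blk_Cnt 0x0033   100   100   000    Pre-fail  Always       -       0
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2159
194 Temperature_Celsius     0x0022   067   044   000    Old_age   Always       -       33 (Min/Max 20/56)
202 Percent_Lifetime_Used   0x0031   059   059   000    Pre-fail  Offline      -       41
"""

attrs = parse_smart_table(SAMPLE)
for attr_id in (5, 180, 202):
    a = attrs[attr_id]
    print(attr_id, a["name"], "value =", a["value"], "raw =", a["raw"])
```

Parsing both tables this way makes it easy to watch the three attributes in question (5, 180, and 202) over time instead of comparing pasted output by eye.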

From some googling, there are people who recommend replacing the disk if the value of Reallocate_NAND_Blk_Cnt or Unused_Reserve_NAND_Blk changes. I agree the disk may need to be replaced if Reallocate_NAND_Blk_Cnt changes, but what about Unused_Reserve_NAND_Blk?

And what does Percent_Lifetime_Used mean? Does it really mean that the closer the value gets to 100, the less lifetime is left?

Last question: how does Hetzner handle disk replacement? For example, with these 2 disks in a software RAID 1, can they replace a disk without downtime? Maybe something like hot-swapping the disk?

Comments

  • akhfa said: Last question: how does Hetzner handle disk replacement? For example, with these 2 disks in a software RAID 1, can they replace a disk without downtime?

    If you are able to convince them of a disk failure, they change the disk pretty quickly, with a downtime of around 15-20 mins (from what I've heard). I once asked for 2 x 2TB SSDs and the server was down for no more than 10-15 minutes at a scheduled time.

  • akhfa Member
    edited March 2018

    @jetchirag said:
    If you are able to convince them of a disk failure, they change the disk pretty quickly, with a downtime of around 15-20 mins (from what I've heard). I once asked for 2 x 2TB SSDs and the server was down for no more than 10-15 minutes at a scheduled time.

    Which SMART values did you report to them at that time?
    And by the way, what kind of server has a 2 TB SSD? I don't think I found one in their server packages.

  • Falzo Member

    those SSDs are used, but in good shape.

    that type of SSD is rated for 72 TBW: https://www.crucial.com/wcsstore/CrucialSAS/pdf/product-flyer/ssd/crucial-mx100-ssd-product-flyer-en.pdf

    your 93'537'292'190 sectors written (at 512 bytes per sector) come to about 47.9 TB, or 43.6 TiB, written. the 2nd one is only at around 11.5 TB (10.5 TiB).

    UNUSED reserve blocks are a good sign when the number is that high, as those blocks are still available/unused.

    the lifetime percentage is only bad if its RAW value closes in on 100%. the first one is at 41% lifetime used and the second only at 13%, which together with the TBW numbers gives a reasonable picture of their technical age.

    as said above, I'd say both SSDs are comfortably below the manufacturer's specs/limit, so there is no reason to have them changed yet.

  • akhfa Member

    Falzo said: UNUSED reserve blocks are a good sign when the number is that high, as those blocks are still available/unused.

    So this value will decrease over time, won't it?

  • deank Member, Troll

    No, it's dying. Ask them to replace it ASAP.

  • Falzo Member

    @akhfa said:

    Falzo said: UNUSED reserve blocks are a good sign when the number is that high, as those blocks are still available/unused.

    So this value will decrease over time, won't it?

    yes, it may decrease once blocks wear out and get replaced by the reserved ones.

    from the looks of it, both of your SSDs show 2159 as RAW and 000 as VALUE, which makes me guess that this is the default/initial state and that not a single reserved block has been needed so far.

    I think you have nothing to worry about with those disks. also, hetzner most likely wouldn't change them, as there is nothing wrong with them ;-)

  • akhfa said: I don't think I found one in their server packages.

    These were additional disks, not replacements. I mentioned it as a reference for how long it takes to replace or add disks.

  • akhfa Member
    edited March 2018

    @deank said:
    No, it's dying. Ask them to replace it ASAP.

    Yes, I know all disks are dying, even all people are dying :)

    @Falzo thank you, a big help as always. Clear explanation. SMART sometimes confuses me because many vendors have their own unique attributes. Nice to have you around :)

    @jetchirag said:

    akhfa said: I don't think I found one in their server packages.

    These were additional disks, not replacements. I mentioned it as a reference for how long it takes to replace or add disks.

    I see. Thank you for the information. I asked about the SSD because it seems interesting to have an SSD that big :)
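
The written-data arithmetic from Falzo's reply can be checked with a short sketch. It assumes 512-byte logical sectors (as stated in that reply) and reads the 72 TBW rating as decimal terabytes, which is an assumption here; `SECTOR_BYTES`, `RATED_TBW_BYTES`, `written_bytes`, and `drives` are hypothetical names used only for illustration. The sector counts and the Percent_Lifetime_Used raw values are copied from the two tables in the first post.

```python
SECTOR_BYTES = 512        # assumed logical sector size, as in Falzo's reply
RATED_TBW_BYTES = 72e12   # 72 TBW, read as decimal terabytes (assumption)


def written_bytes(host_sectors):
    """Convert attribute 246 (Total_Host_Sector_Write) to bytes."""
    return host_sectors * SECTOR_BYTES


# (Total_Host_Sector_Write raw, Percent_Lifetime_Used raw) per drive
drives = {
    "first": (93537292190, 41),
    "second": (22511553499, 13),
}

for name, (sectors, pct_used) in drives.items():
    b = written_bytes(sectors)
    print(
        f"{name}: {b / 1e12:.1f} TB ({b / 2**40:.1f} TiB) written, "
        f"{b / RATED_TBW_BYTES:.0%} of the rated TBW, "
        f"SMART reports {pct_used}% lifetime used"
    )
```

The TB and TiB figures differ by roughly 10%, so it matters which unit a rating is quoted in. The drive's own Percent_Lifetime_Used need not track host writes exactly, so the two percentages are best read side by side rather than expected to agree.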
