HOST-C, Chat, Updates, Stuff

Comments

  • fly056 Member

    @maverick said:

    @host_c said:

    @maverick said: send Stuart with a hammer... to do a hard reset :D

    He shall not do such pornographic things......

    He does like to fix things with the hammer tho......

    who doesn't? :D

    dunno what's the problem with that sea gate, but fwiw i've always trusted WD more especially that series based on HGST tech they bought many years ago... those are animals, i just hope i haven't jinxed it now (some of mine are just about out of warranty as i write this)...

    good luck with repair!

    I have some HC520 drives on my home NAS. They are great so far. I got them as server pulls with a 5 year warranty for $75 each.

    Thanked by 1maverick
  • host_c Patron Provider, Top Host, Megathread Squad

    Ok, official info was sent to remaining affected customers, please check your e-mails.

    Here is a timeline of the fuckup:

    July 27, 2025 – Afternoon (GMT+3):

    Multiple Seagate ST18000NM019J drives (firmware KM02) across two nodes suddenly powered down due to a firmware-related failure. Drives began reporting critical SMART alerts (Data channel impending failure), causing the RAID-6/60 array to become unavailable.

    Result:
    Addon storage volumes became inaccessible, and VPS services depending on those volumes were disrupted. Some NVMe-based systems also experienced write issues due to OS-level I/O buffering.

    July 28, 2025 – Morning:
    Our team accessed the datacenter, identified the fault, and began recovery efforts. All NVMe-only VPS services were successfully migrated to healthy nodes.

    July 28–29, 2025:
    RAID array access was restored in degraded mode, enabling partial access to addon volumes at limited transfer speeds.

    🧪 Root Cause

    Firmware fault affecting multiple ST18000NM019J (KM02) drives simultaneously

    RAID controller entered fault mode due to concurrent SMART failures

    No physical disk damage, no reallocated sectors or ECC errors — this was purely firmware-triggered

    🛡️ Mitigation Going Forward

    We are conducting a full infrastructure audit to identify any remaining ST18000NM019J drives with KM02 firmware

    Affected drives will be proactively replaced or updated, where supported

    RAID monitoring thresholds and firmware validation processes are being tightened to catch these failures earlier

    This was an unprecedented firmware-level failure that bypassed typical RAID fault tolerance. We appreciate your understanding as we finalize recovery efforts for impacted systems.
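    The audit described above boils down to filtering a drive inventory by model and firmware revision. A minimal sketch, assuming the inventory is already available as a list of records; the `Drive` fields, node names and the non-affected entries are hypothetical, only the model `ST18000NM019J` and firmware `KM02` come from the report:

```python
from dataclasses import dataclass

# Model and firmware revision named in the incident report.
AFFECTED_MODEL = "ST18000NM019J"
AFFECTED_FIRMWARE = "KM02"


@dataclass(frozen=True)
class Drive:
    """One inventory record; the field names are hypothetical."""
    node: str
    slot: int
    model: str
    firmware: str


def find_affected(drives):
    """Return the drives that match the affected model and firmware."""
    return [
        d for d in drives
        if d.model.strip().upper() == AFFECTED_MODEL
        and d.firmware.strip().upper() == AFFECTED_FIRMWARE
    ]


if __name__ == "__main__":
    # Hypothetical inventory; only the first record matches.
    inventory = [
        Drive("node-a", 0, "ST18000NM019J", "KM02"),
        Drive("node-a", 1, "ST18000NM019J", "XX99"),   # made-up revision
        Drive("node-b", 4, "OTHER-MODEL-14TB", "FW00"),  # made-up model
    ]
    for d in find_affected(inventory):
        print(f"{d.node} slot {d.slot}: {d.model} ({d.firmware})")
```

    Matching is done on the exact model and revision strings, so a different firmware revision on the same model is deliberately not flagged.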

    Here is the smartctl output from one of the drives; maybe it can help others check theirs if they use the same model. All 6 reported exactly the same error, have the same power-on hours (~265 days) and were brand new.

    === START OF INFORMATION SECTION ===
    Vendor:               SEAGATE
    Product:              ST18000NM019J
    Revision:             KM02
    Compliance:           SPC-5
    User Capacity:        18,000,207,937,536 bytes [18.0 TB]
    Logical block size:   4096 bytes
    LU is fully provisioned
    Rotation Rate:        7200 rpm
    Form Factor:          3.5 inches
    Logical Unit id:      0x5000c500d8a51a07
    Serial number:        ZR57B8800000G20806CV
    Device type:          disk
    Transport protocol:   SAS (SPL-4)
    Local Time is:        Mon Jul 28 17:36:48 2025 UTC
    SMART support is:     Available - device has SMART capability.
    SMART support is:     Enabled
    Temperature Warning:  Enabled
    
    === START OF READ SMART DATA SECTION ===
    

    SMART Health Status: Data channel impending failure general hard drive failure [asc=5d, ascq=30]

    Grown defects during certification <not available>
    Total blocks reassigned during format <not available>
    Total new blocks reassigned <not available>
    Power on minutes since format <not available>
    Current Drive Temperature:     31 C
    Drive Trip Temperature:        60 C
    
    Accumulated power on time, hours:minutes 6367:42
    Manufactured in week 01 of year 2022
    Specified cycle count over device lifetime:  50000
    Accumulated start-stop cycles:  34
    Specified load-unload count over device lifetime:  600000
    Accumulated load-unload cycles:  291
    Elements in grown defect list: 1
    
    Vendor (Seagate Cache) information
      Blocks sent to initiator = 3828
      Blocks received from initiator = 1650689
      Blocks read from cache and sent to initiator = 9094
      Number of read and write commands whose size <= segment size = 29
      Number of read and write commands whose size > segment size = 0
    
    Vendor (Seagate/Hitachi) factory information
      number of hours powered up = 6367.70
      number of minutes until next internal SMART test = 53
    
    Seagate FARM log supported [try: -l farm]
    
    Error counter log:
               Errors Corrected by           Total   Correction     Gigabytes    Total
                   ECC          rereads/    errors   algorithm      processed    uncorrected
               fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
    read:          0        0         0         0          0          0.016           0
    write:         0        0         0         0          0          6.889           0
    
    Non-medium error count:        0
    
    Pending defect count:0 Pending Defects
    

    The SMART Health Status error above (asc=5d, ascq=30) triggered the detach of the drives from the RAID array.

    Here is a screenshot from the log of one of the Dell servers (R740) showing 2 drives leaving the "chat" at precisely the same time. DST was not set on the server, which is why the time shows only 12:00.
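    For anyone who wants to check their own drives against the output above, here is a rough sketch that pulls the relevant fields out of saved `smartctl -a` text for a SAS drive. The function names are hypothetical and the parsing is based only on the line formats visible in this post, so treat it as a starting point rather than a complete parser:

```python
import re

# Excerpt of the smartctl output posted above.
SAMPLE = """\
Vendor:               SEAGATE
Product:              ST18000NM019J
Revision:             KM02
SMART Health Status: Data channel impending failure general hard drive failure [asc=5d, ascq=30]
Accumulated power on time, hours:minutes 6367:42
"""


def parse_smartctl(text):
    """Extract model, firmware, health status and power-on hours from the text."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Product:"):
            info["product"] = line.split(":", 1)[1].strip()
        elif line.startswith("Revision:"):
            info["revision"] = line.split(":", 1)[1].strip()
        elif line.startswith("SMART Health Status:"):
            status = line.split(":", 1)[1].strip()
            # A failing drive appends the sense codes, e.g. "[asc=5d, ascq=30]".
            m = re.search(r"\[asc=([0-9a-f]+),\s*ascq=([0-9a-f]+)\]", status, re.I)
            if m:
                info["asc"] = m.group(1).lower()
                info["ascq"] = m.group(2).lower()
                status = status[:m.start()].strip()
            info["health"] = status
        elif line.startswith("Accumulated power on time"):
            m = re.search(r"(\d+):(\d+)", line)
            if m:
                info["power_on_hours"] = int(m.group(1)) + int(m.group(2)) / 60
    return info


def is_healthy(info):
    """True only if the health line was found and reads 'OK'."""
    return info.get("health", "").upper() == "OK"


if __name__ == "__main__":
    info = parse_smartctl(SAMPLE)
    print(info["product"], info["revision"], "healthy:", is_healthy(info))
    print(f"power-on time: {info['power_on_hours'] / 24:.1f} days")
```

    A missing health line is treated as not healthy on purpose, so a drive that has dropped off the bus does not pass the check silently.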

  • allthemtings Member, Megathread Squad

    @host_c you dropped something





    👑

    Thanked by 3host_c plumberg oloke
  • JabJab Member

    Seagate more like Samsung am I right?!

    Thanked by 2host_c allthemtings
  • host_c Patron Provider, Top Host, Megathread Squad

    @allthemtings said:
    @host_c you dropped something





    👑

    I always said that it is a question of when something like this will happen, rather than if it will happen.

    These things are totally out of one's control. To be fair, in my career this is a first, especially with SAS drives, not to mention new drives.

    For fuck's sake, we have older 8-10-14 TB SAS drives that have 5 years of power-on time and have no issues even today. ( WD/HGST/Seagate )

    Ah yes, Seagate was as helpful as the popcorn replies we will get here :D

    It is what it is.

    We will most probably switch to HGST for our next drive orders.

  • FAT32 Administrator, Deal Compiler Extraordinaire

    After data loss caused by Seagate many years ago, I wouldn't touch them anymore, even with a 10-foot pole

    Thanked by 2maverick Xrmaddness
  • host_c Patron Provider, Top Host, Megathread Squad
    edited July 2025

    @FAT32 said:
    After data loss caused by Seagate many years ago, I wouldn't touch them anymore, even with a 10-foot pole

    I will agree with you on this. At least when we have drive failures with HGST, it is usually 1 drive / server, not 4-5-6 at the same time.

    Don't get me wrong, we are used to drive failures; we do storage, so I see nothing abnormal in changing drives on a monthly basis in the data center. But this was something new.

    Luckily, we do not have many 18 TB Seagates left (especially this model).

    Also, this issue is specific to the EXOS X18 line, mostly the 16-18 TB models; at least that is what we found googling over the past 48h.

    Customers have their options in the mail, and we have a new thing to add to the checklist.

    It is an unfortunate event, but fuck, this is life, some things break sometimes.

    Too bad this messed up the week of upgrades we wished to do in the DC, so that got derailed for another week or so.......

  • MMMMMM Member

    TL;DR

    Thanked by 3FAT32 host_c lukast__
  • layer7 Member, Host Rep, LIR
    edited July 2025

    Hi,

    Something similar happened to a customer of ours... with 8 TB Intel NVMe drives... they failed faster than they could be replaced.

    Sometimes it is what it is...

    Wish you all luck to get all data out there before things explode!

    And it's just another case that should show clearly to everyone: always keep backups somewhere... there is no 100% security, no matter how good the hoster or the hardware might be.

  • host_c Patron Provider, Top Host, Megathread Squad

    @layer7 said:
    Hi,

    Something similar happened to a customer of ours... with 8 TB Intel NVMe drives... they failed faster than they could be replaced.

    Sometimes it is what it is...

    Wish you all luck to get all data out there before things explode!

    And it's just another case that should show clearly to everyone: always keep backups somewhere... there is no 100% security, no matter how good the hoster or the hardware might be.

    THX :+1:

  • default Veteran

    @host_c - I received your email. Thank you very much for being open and informing your customers about the causes of downtime. Such openness is highly appreciated with regards to respect for your business. If I may, I have some questions:

    1. Which drives will you use for customers choosing a fresh install?
    2. Which drives will you use for customers opting for recovery?
    3. Will IPv4 and IPv6 addresses change in the case of a fresh re-install?
    4. Will CPU change upon a fresh re-install?

    Thanks again for your understanding and clear communication.

    Thanked by 1maverick
  • truemagic Member
    edited July 2025

    Does that mean those without email updates....are not impacted?

    Edit: Oh well, just checked: I did receive an email update from @host_c. However, none of my VPSes seem to be affected (I can still access them normally). Is that so?

  • host_c Patron Provider, Top Host, Megathread Squad
    edited July 2025

    @default said:
    @host_c - I received your email. Thank you very much for being open and informing your customers about the causes of downtime. Such openness is highly appreciated with regards to respect for your business. If I may, I have some questions:

    1. Which drives will you use for customers choosing a fresh install?
    2. Which drives will you use for customers opting for recovery?
    3. Will IPv4 and IPv6 addresses change in the case of a fresh re-install?
    4. Will CPU change upon a fresh re-install?

    Thanks again for your understanding and clear communication.

    1. Which drives will you use for customers choosing a fresh install?

    HGST 14 and 16 TB; some have a mix of Toshiba and HGST. Honestly, I cannot recall exactly off the top of my head.

    2. Which drives will you use for customers opting for recovery?

    Same.

    Note, though, that customers are provisioned over a RAID array, not individual drives!

    3. Will IPv4 and IPv6 addresses change in the case of a fresh re-install?

    No, these will be manually issued, so we will preserve the IP settings. However, due to this manual provisioning it will be slow, as we first have to delete the old config manually from the cluster, recreate it and so on.

    4. Will CPU change upon a fresh re-install?

    No. The CPU type and generation will not change; the model might, as we have Scalable Gen 2 CPUs ranging from 2.4 to 2.7 GHz.

    @truemagic said: Does that mean those without email updates....are not impacted?

    Precisely. Mail was sent to customers with a VPS on those specific nodes. If you did not get any mail and your service is up, nothing happened to you, carry on :D

    Thanked by 2maverick default
  • @host_c said: Ah yes, Seagate was as helpful as the popcorn replies we will get here :D

    It is what it is.

    We will most probably switch to HGST for our next drive orders.

    thanks for sharing the juicy details with us

    Seagate, you suck!

    yeah, if they don't help you with this, don't buy any Seagate drive ever again!

  • host_c Patron Provider, Top Host, Megathread Squad
    edited July 2025

    @maverick said:

    @host_c said: Ah yes, Seagate was as helpful as the popcorn replies we will get here :D

    It is what it is.

    We will most probably switch to HGST for our next drive orders.

    thanks for sharing the juicy details with us

    Seagate, you suck!

    yeah, if they don't help you with this, don't buy any Seagate drive ever again!

    Well, they blew me off, as the drives were not bought through a certified Seagate reseller. As if any of us can manufacture a drive at home. Fuck me. :D

    There are 3 Drive Manufacturers in the WORLD:

    Seagate
    WD/HGST
    Toshiba

    So any enterprise drive with a 5-year warranty should be replaceable regardless of whether you bought it on eBay or in a shop (as long as the drive does not have hammer marks and did not operate at 50 degrees Celsius).

    But here is the reply from them, and I will underline the fact that we asked for a FW fix, not a replacement, as a FW fix might have helped more. I could not care less that we have 6 or 10 failed drives, that is my problem. I asked for a FW fix because that is what might have resolved the issue for us and our customers. Again, there is no mechanical issue with the drives, they just decided to go on holiday :D and left the array in the middle of the day.

    Now, this is not a Seagate-only policy, WD and Toshiba do the same.

    EDIT:

    One of the reasons I moved away from HP years ago was their restrictive firmware and BIOS update policy. Starting with Gen8 servers, critical updates — including fixes for issues that only emerge under specific conditions — were locked behind a support subscription.

    This approach is frustrating because firmware and BIOS bugs are not user-created issues; they are vendor-side flaws that should be resolved as a matter of responsibility. Requiring customers to pay for access to those fixes feels like a penalty for simply using the hardware.

    Now that HP is involved with Juniper, I can only hope they don't bring this same restrictive, short-sighted policy mindset into that ecosystem, tho I am positive they will.

  • NJa64F Barred
    edited July 2025

    @host_c said: July 28–29, 2025:
    RAID array access was restored in degraded mode, enabling partial access to addon volumes at limited transfer speeds.

    Stuff happens. Then you fix it. That's life. You're doing a good job communicating what happened.

    My question is: did any customers lose data? For the customers that requested recovery, how does that happen? Is this a forensic recovery where the drives are sent out?

    Thanked by 1maverick
  • host_c Patron Provider, Top Host, Megathread Squad

    @jperkins said:

    @host_c said: July 28–29, 2025:
    RAID array access was restored in degraded mode, enabling partial access to addon volumes at limited transfer speeds.

    Stuff happens. Then you fix it. That's life. You're doing a good job communicating what happened.

    My question is: did any customers lose data? For the customers that requested recovery, how does that happen? Is this a forensic recovery where the drives are sent out?

    This is not a forensic procedure; it is an in-house solution. We do not send anything that holds customer data out to anyone, regardless of the situation.

    We did manage to import the array on an older controller that does not take the drives' SMART error into account (for the moment), but copying off them is extremely slow, a few MB/s.

    This is why we sent out the mail saying that those who do not have crucial data can opt for provisioning of a new VPS, as it is faster. Those who need the data will have to wait until we move the add-on drive to a new VPS, which is slow, very slow.

    Unfortunately we cannot guarantee the integrity of the data we recover, that will be up to the user to check. This is the best we can do under the current circumstances.
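    For users who do have a backup to compare against, one way to do that check is to hash both copies and list the files that differ. This is only a sketch under that assumption (the function names and directory layout are hypothetical); without a reference copy, checksums cannot tell you whether a recovered file is intact:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(reference_dir, recovered_dir):
    """Compare every file under reference_dir with its recovered counterpart.

    Returns two sorted lists of relative paths: files missing from the
    recovered tree, and files whose SHA-256 differs from the reference.
    """
    reference_dir = Path(reference_dir)
    recovered_dir = Path(recovered_dir)
    missing, mismatched = [], []
    for ref in sorted(p for p in reference_dir.rglob("*") if p.is_file()):
        rel = ref.relative_to(reference_dir)
        rec = recovered_dir / rel
        if not rec.is_file():
            missing.append(str(rel))
        elif sha256_of(ref) != sha256_of(rec):
            mismatched.append(str(rel))
    return missing, mismatched


if __name__ == "__main__":
    import sys

    # Usage: python verify.py /path/to/backup /path/to/recovered
    missing, mismatched = compare_trees(sys.argv[1], sys.argv[2])
    print("missing:", missing)
    print("mismatched:", mismatched)
```

    Files that exist only in the recovered tree are not reported, since they have nothing to be compared against.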

  • NJa64F Barred

    @host_c said: We did manage to import the array on an older controller

    thanks for the explanation. Good luck to all involved and even though most of my stuff is backed up ya always wonder, 'what am I not backing up'

    Thanked by 2maverick host_c
  • default Veteran
    edited July 2025

    @jperkins said:

    @host_c said: We did manage to import the array on an older controller

    thanks for the explanation. Good luck to all involved and even though most of my stuff is backed up ya always wonder, 'what am I not backing up'

    You. There is no backup of you. Once you die, that's it. There is no backup of your firmware, especially considering it is patented and personalised for you. This is why the end is always nigh.

  • @default said:

    @jperkins said:

    @host_c said: We did manage to import the array on an older controller

    thanks for the explanation. Good luck to all involved and even though most of my stuff is backed up ya always wonder, 'what am I not backing up'

    You. There is no backup of you. Once you die, that's it. There is no backup of your firmware, especially considering it is patented and personalised for you. This is why the end is always nigh.

    thats deep, mate.

    Thanked by 1host_c
  • Any flash sale plans? Could use a stronger (CPU, RAM) box for Immich to aid my existing storage VPS.

  • @BigBlue said:
    Any flash sale plans? Could use a stronger (CPU, RAM) box for Immich to aid my existing storage VPS.

    What are your current storage VPS specs? Just out of curiosity.

  • plumberg Veteran, Megathread Squad

    @BigBlue said:
    Any flash sale plans? Could use a stronger (CPU, RAM) box for Immich to aid my existing storage VPS.

    What are your current specs? I have a home instance crunching on rpi4

  • 1 vCPU, 2GB RAM, hosting 2 static sites, an OpenCloud instance, Open WebUI, Shlink and Actual Budget already

  • plumberg Veteran, Megathread Squad

    @BigBlue said:
    1 vCPU, 2GB RAM, hosting 2 static sites, an OpenCloud instance, Open WebUI, Shlink and Actual Budget already

    Depending on your photo library, you should really get a new box and not just add resources to this one.

  • That's exactly the plan. Could also benefit from redundancy in the EU region, still being burnt by previous host's 'emergency migration' with now close to a week of unexpected downtime.

  • onewater Member
    edited July 2025

    @host_c said:

    @maverick said:

    @host_c said: Ah yes, Seagate was as helpful as the popcorn replies we will get here :D

    It is what it is.

    We will most probably switch to HGST for our next drive orders.

    thanks for sharing the juicy details with us

    Seagate, you suck!

    yeah, if they don't help you with this, don't buy any Seagate drive ever again!

    Well, they blew me off, as the drives were not bought through a certified Seagate reseller. As if any of us can manufacture a drive at home. Fuck me. :D

    There are 3 Drive Manufacturers in the WORLD:

    Seagate
    WD/HGST
    Toshiba

    So any enterprise drive with a 5-year warranty should be replaceable regardless of whether you bought it on eBay or in a shop (as long as the drive does not have hammer marks and did not operate at 50 degrees Celsius).

    But here is the reply from them, and I will underline the fact that we asked for a FW fix, not a replacement, as a FW fix might have helped more. I could not care less that we have 6 or 10 failed drives, that is my problem. I asked for a FW fix because that is what might have resolved the issue for us and our customers. Again, there is no mechanical issue with the drives, they just decided to go on holiday :D and left the array in the middle of the day.

    Now, this is not a Seagate-only policy, WD and Toshiba do the same.

    EDIT:

    One of the reasons I moved away from HP years ago was their restrictive firmware and BIOS update policy. Starting with Gen8 servers, critical updates — including fixes for issues that only emerge under specific conditions — were locked behind a support subscription.

    This approach is frustrating because firmware and BIOS bugs are not user-created issues; they are vendor-side flaws that should be resolved as a matter of responsibility. Requiring customers to pay for access to those fixes feels like a penalty for simply using the hardware.

    Now that HP is involved with Juniper, I can only hope they don't bring this same restrictive, short-sighted policy mindset into that ecosystem, tho I am positive they will.

    Correct a small mistake, currently Toshiba hard drives belong to WD. :D
    So there are only 2 Drive Manufacturers in the WORLD:
    Seagate and WD

    Thanked by 1host_c
  • TandM Member
    edited July 2025

    @onewater said:

    Correct a small mistake, currently Toshiba hard drives belong to WD. :D
    So there are only 2 Drive Manufacturers in the WORLD:
    Seagate and WD

    Fairly certain Toshiba still is an independent hard drive manufacturer. They bought out certain 3.5" drive manufacturing facilities and IP from WD in 2012, with WD buying certain 2.5" facilities and IP in turn, and nothing has changed in the meantime AFAIK.

    Thanked by 1host_c
  • @host_c said:

    default said:
    @host_c - I received your email. Thank you very much for being open and informing your customers about the causes of downtime. Such openness is highly appreciated with regards to respect for your business. If I may, I have some questions:

    1. Which drives will you use for customers choosing a fresh install?
    2. Which drives will you use for customers opting for recovery?
    3. Will IPv4 and IPv6 addresses change in the case of a fresh re-install?
    4. Will CPU change upon a fresh re-install?

    Thanks again for your understanding and clear communication.

    1. Which drives will you use for customers choosing a fresh install?

    HGST 14 and 16 TB; some have a mix of Toshiba and HGST. Honestly, I cannot recall exactly off the top of my head.

    2. Which drives will you use for customers opting for recovery?

    Same.

    Note, though, that customers are provisioned over a RAID array, not individual drives!

    3. Will IPv4 and IPv6 addresses change in the case of a fresh re-install?

    No, these will be manually issued, so we will preserve the IP settings. However, due to this manual provisioning it will be slow, as we first have to delete the old config manually from the cluster, recreate it and so on.

    4. Will CPU change upon a fresh re-install?

    No. The CPU type and generation will not change; the model might, as we have Scalable Gen 2 CPUs ranging from 2.4 to 2.7 GHz.

    truemagic said: Does that mean those without email updates....are not impacted?

    Precisely. Mail was sent to customers with a VPS on those specific nodes. If you did not get any mail and your service is up, nothing happened to you, carry on :D

    Although I was not affected I highly appreciate the transparency. Thanks for taking your time to let others know about the issue so they can be aware. I feel I am in very good hands <3

    Thanked by 1maverick
  • host_c Patron Provider, Top Host, Megathread Squad

    @onewater said: Seagate and WD

    Nice, THX for the update on this.

    I feel good having only 2 options, makes things far simpler.

    :D :D

    Thanked by 2cainyxues onewater