corrupt raid 0 config (no superblock)

afn Member
edited September 2025 in Help

Hi,

I have a server with 4 drives. 3 of them are in raid 0; sdd is not part of the raid 0 array. We put a 100GB root (/) partition on a single disk and set up raid 0 across the 3 drives manually.

sda       8:0    0  7.3T  0 disk
├─sda1    8:1    0  300M  0 part  /boot/efi
├─sda2    8:2    0  500M  0 part  /boot
├─sda3    8:3    0    4G  0 part  [SWAP]
├─sda4    8:4    0 97.7G  0 part  /
└─sda5    8:5    0  7.2T  0 part
  └─md0   9:0    0 21.7T  0 raid0 /home
sdb       8:16   0  7.3T  0 disk
└─sdb1    8:17   0  7.3T  0 part
  └─md0   9:0    0 21.7T  0 raid0 /home
sdc       8:32   0  7.3T  0 disk
└─sdc1    8:33   0  7.3T  0 part
  └─md0   9:0    0 21.7T  0 raid0 /home
sdd       8:48   0  7.3T  0 disk
└─sdd1    8:49   0  7.3T  0 part  /backup

The server worked fine for years with this configuration, until we restarted it recently and, to my surprise, it was unable to mount the raid array. The sda disk seems to have some bad sectors but overall works, and the server boots, so I am not sure if that's the cause. I believe it's mostly a config error that was never detected, since we never restarted after the initial setup :(

When I try mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sda5
I get
mdadm: /dev/sdb1 is busy - skipping
mdadm: /dev/sdc1 is busy - skipping
mdadm: no recogniseable superblock on /dev/sda5
mdadm: /dev/sda5 has no superblock - assembly aborted

Is there any chance to recover at least the file structure and paths, even partially, just some of them?

I have a backup from a couple of weeks before the incident, but just a day before the incident we did a lot of clean up and changed many things, so I was hoping to recover some of the file paths/filenames if that's possible. I don't need any of the files themselves, just filenames and paths if possible.

Thanks

Comments

  • itsTomHarper Member, Megathread Squad

    Always do a reboot test. It's one of the important lessons I've learned in years of doing devops

    As for recovery, try testdisk to recover the data, or at least the layout, if you can somehow re-assemble the array

    Thanked by 2MikeA Patriarch
  • layer7 Member, Host Rep, LIR

    Hi,

    your favorite search engine is your friend....

    Not enough information was provided to help you.

    Is the sda drive OK? Smart looks fine and all?

    Content of /etc/mdadm/mdadm.conf? ( or wherever it's located on your OS )

    Output of mdadm --assemble --scan?

    Interesting kernel messages during this all?

    And so on and so on....

    Running raid0 is always what it is. But if it needs to run like this, maybe a filesystem with more resilience against logical errors ( like a checksum feature ) should be used.

    Restoring 20 TB of data is usually something people would like to avoid, even if you had a backup from 1 second ago.

  • afn Member
    edited September 2025

    Always do a reboot test. It's one of the important lessons I've learned in years of doing devops

    Learned that the hard way :/

    Is the sda drive OK? Smart looks fine and all?

    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
      2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
      3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       15918
      4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       19
      5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
      8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
      9 Power_On_Hours          0x0032   001   001   000    Old_age   Always       -       44697
     10 Spin_Retry_Count        0x0033   100   100   030    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       19
    191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       15
    193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       82
    194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       37 (Min/Max 14/41)
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       16
    198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       2
    199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
    220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       0
    222 Loaded_Hours            0x0032   001   001   000    Old_age   Always       -       44683
    223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
    224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
    226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       630
    240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0
    
    SMART Error Log Version: 1
    ATA Error Count: 64 (device log contains only the most recent five errors)
            CR = Command Register [HEX]
            FR = Features Register [HEX]
            SC = Sector Count Register [HEX]
            SN = Sector Number Register [HEX]
            CL = Cylinder Low Register [HEX]
            CH = Cylinder High Register [HEX]
            DH = Device/Head Register [HEX]
            DC = Device Command Register [HEX]
            ER = Error register [HEX]
            ST = Status register [HEX]
    Powered_Up_Time is measured from power on, and printed as
    DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
    SS=sec, and sss=millisec. It "wraps" after 49.710 days.
    
    Error 64 occurred at disk power-on lifetime: 44402 hours (1850 days + 2 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 41 28 48 2a d2 40  Error: UNC at LBA = 0x00d22a48 = 13773384
    
      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 01 30 49 2a d2 40 00  21d+18:32:49.704  READ FPDMA QUEUED
      60 01 28 48 2a d2 40 00  21d+18:32:46.969  READ FPDMA QUEUED
      60 01 20 47 2a d2 40 00  21d+18:32:46.969  READ FPDMA QUEUED
      60 01 18 46 2a d2 40 00  21d+18:32:46.969  READ FPDMA QUEUED
      60 01 10 45 2a d2 40 00  21d+18:32:46.969  READ FPDMA QUEUED
    
    Error 63 occurred at disk power-on lifetime: 44402 hours (1850 days + 2 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 41 e0 48 2a d2 40  Error: UNC at LBA = 0x00d22a48 = 13773384
    
      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 00 e8 e0 2a d2 40 00  21d+18:32:46.253  READ FPDMA QUEUED
      60 00 e0 e0 29 d2 40 00  21d+18:32:43.509  READ FPDMA QUEUED
      60 00 d8 e0 28 d2 40 00  21d+18:32:43.405  READ FPDMA QUEUED
      60 00 d0 e0 27 d2 40 00  21d+18:32:43.404  READ FPDMA QUEUED
      60 00 c8 e0 26 d2 40 00  21d+18:32:43.403  READ FPDMA QUEUED
    
    Error 62 occurred at disk power-on lifetime: 43882 hours (1828 days + 10 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 41 28 48 2a d2 40  Error: UNC at LBA = 0x00d22a48 = 13773384
    
      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 08 38 c0 45 ce 40 00      03:08:49.442  READ FPDMA QUEUED
      60 48 30 c0 2c d2 40 00      03:08:46.488  READ FPDMA QUEUED
      60 b8 28 08 26 d2 40 00      03:08:46.488  READ FPDMA QUEUED
      60 00 20 08 1c d2 40 00      03:08:46.483  READ FPDMA QUEUED
      60 00 18 08 12 d2 40 00      03:08:46.461  READ FPDMA QUEUED
    
    Error 61 occurred at disk power-on lifetime: 43882 hours (1828 days + 10 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 41 a8 48 2a d2 40  Error: UNC at LBA = 0x00d22a48 = 13773384
    
      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 08 b8 c0 45 ce 40 00      03:03:08.869  READ FPDMA QUEUED
      60 28 b0 e0 2f d2 40 00      03:03:06.055  READ FPDMA QUEUED
      60 d8 a8 08 26 d2 40 00      03:03:06.055  READ FPDMA QUEUED
      60 00 a0 08 1c d2 40 00      03:03:06.050  READ FPDMA QUEUED
      60 00 98 08 12 d2 40 00      03:03:06.046  READ FPDMA QUEUED
    
    Error 60 occurred at disk power-on lifetime: 43882 hours (1828 days + 10 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 41 28 48 2a d2 40  Error: UNC at LBA = 0x00d22a48 = 13773384
    
      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 08 38 c0 45 ce 40 00      02:48:36.423  READ FPDMA QUEUED
      60 c0 30 48 2b d2 40 00      02:48:33.625  READ FPDMA QUEUED
      60 40 28 08 26 d2 40 00      02:48:33.625  READ FPDMA QUEUED
      60 08 80 38 08 ce 40 00      02:48:33.622  READ FPDMA QUEUED
      60 20 20 e8 22 d2 40 00      02:48:33.622  READ FPDMA QUEUED
    
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed: read failure       00%     44412         215099976
    # 2  Short offline       Completed without error       00%     43879         -
    # 3  Short offline       Completed without error       00%     43879         -
    
    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
    

    Content of /etc/mdadm/mdadm.conf

    HOMEHOST <system>
    
    # instruct the monitoring daemon where to send mail alerts
    MAILADDR root
    
    # definitions of existing MD arrays
    
    # This configuration was auto-generated on Sat, 27 Feb 2021 10:22:01 +0000 by mkconf
    ARRAY /dev/md/0  metadata=1.2 UUID=9ad6da06:d589bd84:558e7f4e:9228e557 name=server:0
    

    Output of mdadm --assemble --scan?

    Nothing

    root@myserver ~ # mdadm --assemble --scan
    root@myserver ~ #
    

    Kernel messages

    I just checked, I think this is relevant

    mount: /home: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error
    
    [ 5364.979604] print_req_error: I/O error, dev sda, sector 215099976
    [ 5364.979821] ata1: EH complete
    [ 5368.480319] ata1.00: exception Emask 0x0 SAct 0x8e000400 SErr 0x0 action 0x0
    [ 5368.481281] ata1.00: irq_stat 0x40000008
    [ 5368.482187]
    [ 5368.487877] EXT4-fs (md0): can't read g
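
The SMART error log above prints the raw ATA registers next to each decoded LBA. As a rough sketch of how those two relate, the Python below rebuilds the 28-bit LBA from the SN/CL/CH/DH registers of "Error 64". The function name and the 512-byte logical sector size are assumptions made for illustration; neither comes from the thread.

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine ATA task-file registers into a 28-bit LBA.

    SN holds bits 0-7, CL bits 8-15, CH bits 16-23, and the low nibble
    of DH holds bits 24-27 (the high nibble carries flag bits).
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register values copied from "Error 64" in the SMART error log.
lba = lba28_from_registers(sn=0x48, cl=0x2A, ch=0xD2, dh=0x40)
print(hex(lba), lba)  # 0xd22a48 13773384, matching the log line

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def gib_offset(sector: int) -> float:
    """Byte position of a sector on the disk, expressed in GiB."""
    return sector * SECTOR_BYTES / 2**30


# The error-log LBA, and the sector reported by both the extended
# self-test and the kernel I/O error.
for sector in (lba, 215099976):
    print(sector, f"{gib_offset(sector):.2f} GiB into the disk")
```

Under that sector-size assumption, sector 215099976 sits a little over 102 GiB into sda, which is close to where sda4 ends and sda5 begins going by the rounded sizes in lsblk. The rounded sizes are not precise enough to say which partition it falls in; the exact start sectors from `fdisk -l /dev/sda` would settle that.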
    
  • Did you check the status of the raid before trying to force assemble it? What did mdadm -D /dev/md0 give?

    It doesn't read like mdadm failed to find the array, but rather like it got corrupted. Trying to force anything on a half-assembled array probably makes things worse...

    Stop the array, try to recover the superblock for sda5 (ask google or chatgpt, I have never done that)
    And only then try to reassemble it.

    In general, consider your data gone. Striping data across multiple disks, aka Raid0, comes with exactly that risk of losing it all on a single drive's failure

  • afn Member
    edited September 2025

    tl;dr: At best I can create the array, but can't mount it, details:

    @Falzo said: Did you check the status of the raid before trying to force assemble it? What did mdadm -D /dev/md0 give?

    At first we had the 3 drives, but after messing a bit with it, this is what we have now

    /dev/md0:
               Version : 1.2
            Raid Level : raid0
         Total Devices : 2
           Persistence : Superblock is persistent
    
                 State : inactive
       Working Devices : 2
    
                  Name : server:0
                  UUID : 9ad6da06:d589bd84:558e7f4e:9228e557
                Events : 0
    
        Number   Major   Minor   RaidDevice
    
           -       8       33        -        /dev/sdc1
           -       8       17        -        /dev/sdb1
    

    (ask google or chatgpt,

    yup, already had my little chat with chatgpt and tried a couple of things, it keeps suggesting to do

     sudo mdadm --create /dev/md0   --level=0   --raid-devices=3   --metadata=1.2   --chunk=512K   --assume-clean   --force   /dev/sda5 /dev/sdb1 /dev/sdc1
    

    Which gives

    mdadm: /dev/sda5 appears to contain an ext2fs file system
           size=7706612036K  mtime=Thu Jan  1 01:00:00 1970
    mdadm: /dev/sdb1 appears to be part of a raid array:
           level=raid0 devices=3 ctime= blablabla 2022
    mdadm: /dev/sdc1 appears to be part of a raid array:
           level=raid0 devices=3 ctime= blablabla 2022
    
    Continue creating array? yes
    mdadm: array /dev/md0 started.
    

    at best I can get the raid array "started" then when I mount I get

    #mount -o ro /dev/md0 /mnt/test
    mount: /mnt/test: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error.
    

    And in this situation, when we recreate with --assume-clean and --force, we have:

     #  mdadm -D /dev/md0
    /dev/md0:
               Version : 1.2
         Creation Time : Mon Sep 22 11:20:06 2025
            Raid Level : raid0
            Array Size : 23334265856 (22253.29 GiB 23894.29 GB)
          Raid Devices : 3
         Total Devices : 3
           Persistence : Superblock is persistent
    
           Update Time : Mon Sep 22 11:20:06 2025
                 State : clean
        Active Devices : 3
       Working Devices : 3
        Failed Devices : 0
         Spare Devices : 0
    
            Chunk Size : 512K
    
    Consistency Policy : none
    
                  Name : XXX:0  (local to host XXX)
                  UUID : b430686c:6520ecd3:94237803:0f89e0a3
                Events : 0
    
        Number   Major   Minor   RaidDevice State
           0       8        5        0      active sync   /dev/sda5
           1       8       17        1      active sync   /dev/sdb1
           2       8       33        2      active sync   /dev/sdc1
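
Re-creating an array with `--create --assume-clean` only helps if the chunk size and device order match the original, because those two values decide which member holds each piece of data. The Python sketch below illustrates that mapping for plain striping. It assumes equal-sized members and ignores the per-device data offset; md's raid0 adds extra zones when members differ in size, as they do here, so treat this as a simplified model and not as md's actual code. `locate`, `order_used` and `order_swapped` are names invented for the example.

```python
from typing import NamedTuple


class StripeLocation(NamedTuple):
    member: str         # member device holding the byte
    member_offset: int  # byte offset inside that member's data area


def locate(offset: int, members: list[str], chunk_bytes: int) -> StripeLocation:
    """Map a logical array offset to (member, offset) for simple striping."""
    chunk_index, within_chunk = divmod(offset, chunk_bytes)
    stripe, member_index = divmod(chunk_index, len(members))
    return StripeLocation(members[member_index], stripe * chunk_bytes + within_chunk)


CHUNK = 512 * 1024  # --chunk=512K from the command in the post

# Order passed to mdadm --create in the post.
order_used = ["/dev/sda5", "/dev/sdb1", "/dev/sdc1"]
# Hypothetical alternative order, only to show the effect of a swap.
order_swapped = ["/dev/sdb1", "/dev/sda5", "/dev/sdc1"]

for offset in (0, CHUNK, 2 * CHUNK, 3 * CHUNK + 4096):
    print(offset, locate(offset, order_used, CHUNK).member,
          locate(offset, order_swapped, CHUNK).member)
```

With a swapped order, the first two chunks of every stripe would be read from the wrong disk, so the filesystem on top would see scrambled data even though mdadm reports the array as clean.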
    
  • Andreix Member, Host Rep

    RAID0 ?! Complete reinstall.

    Thanked by 1oijm17
  • [@afn said]

    interesting. especially the last mdadm details suggest that there is nothing wrong with the assembled array.

    the fs on top however now seems borked and gives you that error on mount. did you run e2fsck on the assembled /dev/md0 yet? this would be the next step to try and get the filesystem repaired...

    Thanked by 3afn OhJohn oijm17
  • afn Member
    edited September 2025

    @Andreix said: RAID0 ?! Complete reinstall.

    God, am I glad I didn't follow this advice...

    @Falzo
    I didn't know that utility. It solved my issue. It's back! and now I am backing up everything to a different server

    Just tried it. e2fsck /dev/md0

    It gave some

    Inode 40335229 passes checks, but checksum does not match inode.  Fix? yes
    
    Deleted inode 40335230 has zero dtime.  Fix? yes
    

    etc, it keeps giving these messages for a lot of different inodes

    and then it gave

    Inode 220043289 extent tree (at level 1) could be narrower.  Optimize? yes
    
    Inode 220043337 extent tree (at level 2) could be narrower.  Optimize? yes
    
    Inode 220043415 extent tree (at level 1) could be narrower.  Optimize? yes
    

    It took very long on inode 220043415, then it continued with a lot of those "could be narrower" messages.

    then it kept printing a massive amount of numbers and finally the messages ended with:

    Directories count wrong for group #172784 (0, counted=128).
    Fix? yes
    
    Free inodes count wrong for group #172800 (2048, counted=1619).
    Fix? yes
    
    Directories count wrong for group #172800 (0, counted=50).
    Fix? yes
    
    Free inodes count wrong (364599285, counted=364535939).
    Fix? yes
    
    
    /dev/md0: ***** FILE SYSTEM WAS MODIFIED *****
    /dev/md0: 63357/364599296 files (4.4% non-contiguous), 3003267917/5833566464 blocks
    

    **then it finally worked**

    I just did mount -o ro /dev/md0 /mnt/tempmd0

    And I have my raid array back!!

    when you suggested I try to re-assemble, I tried again, then followed your advice for e2fsck .

    I can't thank you enough @Falzo. You have no idea how grateful I am.
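
One cheap sanity check on a re-created array is whether the filesystem size that e2fsck reports fits the array size that mdadm reports. The Python sketch below does that arithmetic with the numbers posted in this thread. The 4 KiB block size is an assumption (the usual ext4 value for large filesystems), and the inodes-per-group value is inferred from the "(2048, counted=1619)" message, not stated anywhere directly.

```python
ARRAY_SIZE_KIB = 23334265856  # "Array Size" from mdadm -D
FS_BLOCKS = 5833566464        # total blocks in the e2fsck summary
FS_INODES = 364599296         # total inodes in the e2fsck summary
INODES_PER_GROUP = 2048       # inferred from the group #172800 message
BLOCK_KIB = 4                 # assumption: 4 KiB filesystem blocks

# One block bitmap block tracks 8 blocks per byte.
BLOCKS_PER_GROUP = 8 * BLOCK_KIB * 1024

fs_size_kib = FS_BLOCKS * BLOCK_KIB
print("filesystem KiB:", fs_size_kib)
print("array KiB:     ", ARRAY_SIZE_KIB)
print("same size:", fs_size_kib == ARRAY_SIZE_KIB)

# Cross-check the block-size assumption: both ways of counting
# block groups should agree if the block size is really 4 KiB.
groups_from_blocks = -(-FS_BLOCKS // BLOCKS_PER_GROUP)  # ceiling division
groups_from_inodes = FS_INODES // INODES_PER_GROUP
print("groups:", groups_from_blocks, groups_from_inodes)
```

A match only shows that the re-created array has the size the filesystem expects. It says nothing about whether the chunk size or device order is right, or whether file contents survived.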

  • plumberg Veteran, Megathread Squad

    Awesome work @Falzo

    Thanked by 1OhJohn
  • plumberg Veteran, Megathread Squad

    @afn so any upgrade to raid in future? Or YOLO with @Falzo for Raid 0

  • afn Member
    edited September 2025

    @plumberg said: @afn so any upgrade to raid in future? Or YOLO with @Falzo for Raid 0

    Well, raid 0 wasn't that bad... the human who set it up + the human who didn't verify nor restart the server ever + the low frequency of backups are the things that need an upgrade here :sweat_smile:
    I will probably increase the backup sync frequency, and probably launch a resync manually every time I make major changes. And definitely I will reboot any new server after the initial setup (reminder this raid was created and mounted without ever restarting the server :blush: )

    Honestly, I had started to make my peace with losing the data, and I started re-sorting the backup and trying to figure out which files to keep from it. I had already given up, and the backup held almost 99% of what we had; it's just that many folders got cleaned up, moved, renamed, sorted, etc. after we took that old backup. What REALLY counts more than recovering a couple of files is the MASSIVE satisfaction you get from fixing something, and for that I really can't thank @Falzo enough!

    I was literally just one single command away from fixing everything, it was just that e2fsck that Falzo suggested that did the magic for me. It was very simple, but you need to know it exists. I never knew about it. Most people would just say "oh raid 0, forget it" and people on the internet love to lecture you with "maybe next time do more regular backups" like yeah thanks for constructive information. But you rarely find useful suggestions for recovery :(

    There is nothing more fun than pasting some random commands you don't fully understand in your terminal following recommendations from strangers on the internet, right? What can possibly go wrong (hint: a raid 0 array that gets messed up when you reboot your server)

    Thanked by 1plumberg
  • plumberg Veteran, Megathread Squad

    i hear you
    yes, i have also learned something new and would like to implement the reboot test to ensure nothing screws up later.

    i am really glad folks like @Falzo are around here to share their expertise - however simple or complex the thing looks.

    Sometimes, yes, one needs to make peace and move on. But when, at the end of the road, you suddenly see things coming back, it is just pure pleasure to experience.
    Good luck!

    Thanked by 2afn OhJohn
  • Andreix Member, Host Rep

    @afn said:

    @Andreix said: RAID0 ?! Complete reinstall.

    God, am I glad I didn't follow this advice...

    That's a ticking bomb either way...

  • TimboJones Member
    edited September 2025

    Backup frequency should match how often the data changes. The lesson learned should at least have been that daily backups are necessary.

    And then you could have replaced the disk with the bad sectors, made partitions and restored from backup, and not worried about finding silent corruption down the road.

    (I recommend backup with incremental and compression support).

    Thanked by 1oijm17
  • @TimboJones said: And then you could have replaced the bad sector disk, make partitions and restore from backup and not worry about finding silent corruption down the road.

    Thanks for the advice. This is exactly what I am doing. I am not gonna rely on the data from that raid, as there might be corruption. I will use the backup and modify it as necessary.

  • I'd be tailing the log the whole time while copying this thing over if I was going to rely on the data. Having a corrupt media file is probably fine, but having part of a db or binary corrupted would not be good. And you probably won't know it until it is too late. It appears to me you have a disk going out

    Thanked by 1afn
  • @afn happy for you that this helped and you were able to get it back into a usable state.

    as @jperkins mentioned, there is a good possibility that now some files are corrupted. e2fsck checks the filesystem and not files, this can be quite a difference. so when it does corrections, it will check superblocks, counters, descriptors, pointers but not the content of files.
    if there are a lot of corrections like deleting duplicate blocks there will be borked files left behind with a very high probability.

    in general I suggest reading up a bit on the different layers here: physical devices -> soft raid array -> filesystem -> data
    that will help with the next incident, to have a better idea what to check first and in which order to try repairing. sadly there is no easy recipe anyway, it's hard to recover and easy to brick it even more on the way.

    TL;DR; again happy I could help, just look at your recovered files with a bit of doubt or try to verify them.

    Thanked by 3afn OhJohn oijm17
  • afn Member
    edited September 2025

    Thanks for the advice @jperkins and @Falzo.

    I don't intend to use the "recovered files", nor that raid array again. I will only match the files list to the backup files list to identify the difference and restructure my backed up data accordingly. As for recent files missing from the backup, these will be re-uploaded via FTP manually from their original sources when possible, I am not relying at all on the files coming from the raid array. All my files exist somewhere else. Just the folder structures, sorting, etc was primarily what I needed to recover.
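
Matching the file list from the recovered array against the backup's file list comes down to a set difference on relative paths. Below is a minimal Python sketch of that idea. The function names and sample paths are made up for illustration, and in practice `list_paths` would be pointed at the read-only mount and at the backup root.

```python
import os


def list_paths(root: str) -> set[str]:
    """Collect every file and directory under root as a relative path."""
    found = set()
    for directory, subdirs, files in os.walk(root):
        for name in subdirs + files:
            found.add(os.path.relpath(os.path.join(directory, name), root))
    return found


def compare(recovered: set[str], backup: set[str]) -> dict[str, list[str]]:
    """Report paths that exist on only one side."""
    return {
        "only_on_array": sorted(recovered - backup),
        "only_in_backup": sorted(backup - recovered),
    }


# Hypothetical listings: one file was moved into a new folder after the backup.
recovered = {"projects/2025/report.pdf", "projects/archive/old.zip", "media/clip.mp4"}
backup = {"projects/report.pdf", "projects/archive/old.zip", "media/clip.mp4"}

result = compare(recovered, backup)
for side, paths in result.items():
    print(side, paths)
```

A moved or renamed file shows up once on each side, so the two lists together give the restructuring work left to do on the backup. Only names are read, never file contents, which fits the goal of recovering just the folder structure.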

    Thanked by 1Falzo
  • a very high probability.

    Edit: In fact, it's not just some probability. I just confirmed that most of the files I recovered are corrupted: their checksum values no longer match the expected values. But that's totally fine, as my initial purpose was just to get a listing of folders and files.
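
For anyone wanting to run the same kind of check, here is a small Python sketch that compares files against a manifest of expected checksums. The thread does not say which checksum tool or algorithm was used, so SHA-256 and the manifest layout (relative path mapped to hex digest) are assumptions, and the demo files are created in a temporary directory purely for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Return 'ok', 'mismatch' or 'missing' for each manifest entry."""
    report = {}
    for relative, expected in manifest.items():
        target = root / relative
        if not target.is_file():
            report[relative] = "missing"
        elif sha256_of(target) == expected:
            report[relative] = "ok"
        else:
            report[relative] = "mismatch"
    return report


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "good.bin").write_bytes(b"intact data")
    (root / "bad.bin").write_bytes(b"damaged data")
    manifest = {
        "good.bin": hashlib.sha256(b"intact data").hexdigest(),
        "bad.bin": hashlib.sha256(b"original data").hexdigest(),
        "gone.bin": hashlib.sha256(b"never recovered").hexdigest(),
    }
    report = verify(root, manifest)
    print(report)
```

A check like this catches exactly what e2fsck cannot: files whose metadata is consistent again but whose contents changed.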

    Thanked by 3Falzo oijm17 plumberg
  • Falzo Member
    edited September 2025

    In the end, maybe your raid0 really did get messed up. That sda5 was detected as having an ext2 superblock seems weird enough. It would at least be an explanation for the lots of broken files (essentially everything that lost some stripes).

    If you don't really need raid0 for the performance gain, I'd suggest just mounting the disks as single drives, or looking into using LVM to combine them but without striping.

    Thanked by 1akhfa
  • @Falzo said:

    If you don't really need raid0 for the performance gain, I'd suggest to have the disk just as single drives mounted or look into using LVM to combine but without striping.

    Oh, I just learned that by default, when LVM combines disks, it does not stripe. Thank you!
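
To make the difference from striping concrete, the Python sketch below shows how a linear (concatenated) volume maps a logical offset to a disk: each disk covers one contiguous range, one after the other. It is a simplified model, not LVM's real allocator, which works in extents and keeps its own metadata. The function name and the disk sizes are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_linear(offset: int, sizes: list[int]) -> tuple[int, int]:
    """Map a logical offset to (disk index, offset on that disk)."""
    ends = list(accumulate(sizes))  # logical end offset of each disk
    if not 0 <= offset < ends[-1]:
        raise ValueError("offset outside the volume")
    index = bisect_right(ends, offset)
    start = ends[index - 1] if index else 0
    return index, offset - start


# Hypothetical sizes in arbitrary units, smallest first.
sizes = [10, 20, 30]
for offset in (0, 9, 10, 29, 30, 59):
    print(offset, locate_linear(offset, sizes))
```

With concatenation, a run of consecutive blocks usually lives on one disk, whereas striping spreads any file larger than one chunk across several members. A filesystem spanning the whole volume can still be badly damaged when one disk dies, so this reduces how much is lost to a single failure but does not replace backups.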
