corrupt raid 0 config (no superblock)
Hi,
I have a server with 4 drives. Three of them are in raid 0; sdd is not part of the raid 0 array. We put a 100GB root (/) partition on a single disk and set up raid 0 across the 3 drives manually.
sda 8:0 0 7.3T 0 disk
├─sda1 8:1 0 300M 0 part /boot/efi
├─sda2 8:2 0 500M 0 part /boot
├─sda3 8:3 0 4G 0 part [SWAP]
├─sda4 8:4 0 97.7G 0 part /
└─sda5 8:5 0 7.2T 0 part
  └─md0 9:0 0 21.7T 0 raid0 /home
sdb 8:16 0 7.3T 0 disk
└─sdb1 8:17 0 7.3T 0 part
  └─md0 9:0 0 21.7T 0 raid0 /home
sdc 8:32 0 7.3T 0 disk
└─sdc1 8:33 0 7.3T 0 part
  └─md0 9:0 0 21.7T 0 raid0 /home
sdd 8:48 0 7.3T 0 disk
└─sdd1 8:49 0 7.3T 0 part /backup
The server worked fine for years with this configuration until we restarted it recently, and to my surprise it was unable to mount the raid array. The sda disk seems to have some bad sectors but overall works, and the server boots, so I am not sure that's the cause. I believe it's mostly a config error that was never detected, since we never restarted after the initial setup.
When I try
mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sda5
I get
mdadm: /dev/sdb1 is busy - skipping
mdadm: /dev/sdc1 is busy - skipping
mdadm: no recogniseable superblock on /dev/sda5
mdadm: /dev/sda5 has no superblock - assembly aborted
Is there any chance to recover at least the file structure and paths, even partially, just some of them?
I have a backup from a couple of weeks before the incident, but just a day before the incident we did a lot of clean up and changed many things, so I was hoping to recover some of the file paths/filenames if that's possible. I don't need any of the files themselves, just filenames and paths if possible.
Thanks

Comments
Always do a reboot test. It's one of the important lessons I've learned in years of doing devops.
As for recovery, if you can somehow re-assemble the array, try testdisk to recover the data or at least the layout.
Hi,
your favorite search engine is your friend....
Not enough information was provided to help you.
Is the sda drive OK? SMART looks fine and all?
Content of /etc/mdadm/mdadm.conf? (or wherever it's located on your OS)
Output of mdadm --assemble --scan?
Interesting kernel messages during this all?
And so on and so on....
Running raid0 is always what it is. But if it needs to run like this, maybe a filesystem with more resilience against logical errors (like a checksum feature) should be used.
Restoring 20 TB of data is usually something people would like to avoid, even if you had a backup from 1 second ago.
Learned that the hard way
Nothing
I just checked, I think this is relevant
Did you check the status of the raid before trying to force assemble it? What did mdadm -D /dev/md0 give?
It doesn't read like it could not find it, but rather like it got corrupted. Trying to force anything on a half-assembled array probably makes things worse...
Stop the array, try to recover the superblocks for sda5 (ask google or chatgpt, I have never done that)
And only then try to reassemble it.
In general, consider your data gone. Striping data across multiple disks, aka Raid0, comes with exactly that risk of losing it all on a single drive's failure.
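For anyone following the order suggested above (check the status, stop the array, only then reassemble), here is a minimal sketch. It assumes the device names from the first post; the helper names `run`, `inspect_array` and `stop_array` are made up for illustration. By default it only prints the commands, so nothing touches the disks until DRY_RUN=0 is set on purpose.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names are hypothetical, the device names come
# from the first post. DRY_RUN=1 (the default) prints instead of running.
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

inspect_array() {
    # $1 = md device, remaining args = member partitions.
    # All three commands only read state, they do not modify the array.
    local md=$1
    shift
    run cat /proc/mdstat          # what the kernel currently has assembled
    run mdadm --detail "$md"      # state of the (half-)assembled array
    run mdadm --examine "$@"      # per-member superblock, if there is one
}

stop_array() {
    # Release the member devices so they are no longer reported as busy.
    run mdadm --stop "$1"
}

# Example (dry run):
#   inspect_array /dev/md0 /dev/sda5 /dev/sdb1 /dev/sdc1
#   stop_array /dev/md0
```

Keeping the output of the read-only commands before changing anything also gives you something to post when people ask for more information.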
tl;dr: At best I can create the array, but can't mount it, details:
At first we had the 3 drives, but after messing a bit with it, this is what we have now
yup, already had my little chat with chatgpt and tried a couple of things, it keeps suggesting to do
Which gives
at best I can get the raid array "started" then when I mount I get
And in this situation, when we recreate with --assume-clean and --force, we have:
RAID0 ?! Complete reinstall.
interesting. especially the last mdadm details suggest that there is nothing wrong with the assembled array.
the fs on top however now seems borked and gives you that error on mount. did you run e2fsck on the assembled /dev/md0 yet? this would be the next step to try and get the filesystem repaired...
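A cautious way to take that next step is to run e2fsck with -n first, which opens the filesystem read-only and answers "no" to every question, and then look at the exit status. The sketch below decodes that status using the bitmask documented in the e2fsck man page; `describe_fsck_status` is a hypothetical helper name, not part of e2fsprogs.

```shell
#!/usr/bin/env bash
# Sketch: translate an e2fsck exit status into words.
# The bit values are the ones listed in the e2fsck(8) man page;
# the function name is made up for this example.

describe_fsck_status() {
    local rc=$1
    if [ "$rc" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    # The status is a sum of flags, so several lines may be printed.
    (( rc & 1 ))   && echo "errors corrected"
    (( rc & 2 ))   && echo "errors corrected, reboot advised"
    (( rc & 4 ))   && echo "errors left uncorrected"
    (( rc & 8 ))   && echo "operational error"
    (( rc & 16 ))  && echo "usage or syntax error"
    (( rc & 32 ))  && echo "cancelled by user request"
    (( rc & 128 )) && echo "shared library error"
    return 0
}

# Example, read-only check on the assembled array:
#   e2fsck -n /dev/md0
#   describe_fsck_status $?
```

A status of 4 from the -n run means there is damage a real repair run would have to deal with, which is a good moment to make sure you can live with whatever it changes.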
God, am I glad I didn't follow this advice...
@Falzo
I didn't know that utility. It solved my issue. It's back! and now I am backing up everything to a different server
Just tried it.
e2fsck /dev/md0
It gave some
etc, it keeps giving these messages for a lot of different inodes
and then it gave
It took a long time on inode 220043415, then it continued with a lot of those "could be narrower" messages.
then it kept printing a massive amount of numbers and finally the messages ended with:
Then it finally worked.
I just did
mount -o ro /dev/md0 /mnt/tempmd0
And I have my raid array back!!
When you suggested I try to re-assemble, I tried again, then followed your advice for e2fsck.
I can't thank you enough @Falzo. You have no idea how grateful I am.
Awesome work @Falzo
@afn so any upgrade to raid in future? Or YOLO with @Falzo for Raid 0
Well, raid 0 wasn't that bad... the human who set it up + the human who never verified or restarted the server + the low frequency of backups are the things that need an upgrade here.
I will probably increase the backup sync frequency, and probably launch a resync manually every time I make major changes. And I will definitely reboot any new server after the initial setup (reminder: this raid was created and mounted without ever restarting the server).
Honestly, I had started to make my peace with losing the data, and I started re-sorting the backup and trying to figure out which files to keep from it. I had already given up, and the backup was almost 99% of what we already had; it's just that many folders got cleaned up, moved, renamed, sorted, etc. after we took that old backup. What REALLY counts more than recovering a couple of files is the MASSIVE satisfaction you get from fixing something, and for that I really can't thank @Falzo enough!
I was literally just one single command away from fixing everything. It was just that e2fsck that Falzo suggested that did the magic for me. It was very simple, but you need to know it exists. I never knew about it. Most people would just say "oh raid 0, forget it" and people on the internet love to lecture you with "maybe next time do more regular backups" like yeah thanks for constructive information. But you rarely find useful suggestions for recovery.
There is nothing more fun than pasting some random commands you don't fully understand in your terminal following recommendations from strangers on the internet, right? What can possibly go wrong (hint: a raid 0 array that gets messed up when you reboot your server)
i hear you
yes, i have also learned something new and would like to implement the reboot test to ensure nothing screws up later.
i am really glad folks like @Falzo are around here to share their expertise - however simple or complex the thing looks.
Sometimes, yes, one needs to make peace and move on. But suddenly seeing things come back at the end of the road is just pure pleasure to experience.
Good luck!
That's a ticking bomb either way...
Backup frequency should match how often the data changes. The lesson learned should at least have been that daily backups are necessary.
And then you could have replaced the disk with the bad sectors, made the partitions and restored from backup, without worrying about finding silent corruption down the road.
(I recommend backup with incremental and compression support).
Thanks for the advice. This is exactly what I am doing, I am not gonna rely on the data from that raid as there might be corruption. I will use the backup and modify it as necessary.
I'd be tailing the log the whole time copying this thing over if I was going to rely on the data. Having a corrupt media file is probably fine, but having part of a db or binary corrupted would not be good. And you probably won't know it until it is too late. It appears to me you have a disk going out.
@afn happy for you that this helped and you were able to get it back into a usable state.
as @jperkins mentioned, there is a good possibility that now some files are corrupted. e2fsck checks the filesystem and not files, this can be quite a difference. so when it does corrections, it will check superblocks, counters, descriptors, pointers but not the content of files.
if there are a lot of corrections like deleting duplicate blocks there will be borked files left behind with a very high probability.
in general I suggest to read up a bit on the different layers here: physical devices -> soft raid array -> filesystem -> data
it will help with the next incident to have a better idea what to check first and in which order to try repairing. sadly there is no easy recipe anyway, it's hard to recover and easy to brick it even more on the way.
TL;DR: again happy I could help, just look at your recovered files with a bit of doubt or try to verify them.
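One way to "try to verify them" is to compare checksums against a copy you trust. A rough sketch, assuming GNU coreutils and findutils are installed and that a known-good copy of the same tree exists somewhere; `make_manifest`, `verify_against` and the example paths are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build a checksum manifest from a trusted tree and check another
# tree against it. Assumes GNU sha256sum, find, sort and xargs.
# Function names and example paths are made up for illustration.

make_manifest() {
    # Print "hash  ./relative/path" for every file under $1.
    (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum)
}

verify_against() {
    # $1 = tree to check, $2 = ABSOLUTE path to a manifest from make_manifest.
    # Prints only the entries that fail and returns non-zero if any do.
    (cd "$1" && sha256sum --check --quiet "$2")
}

# Example:
#   make_manifest /path/to/known-good-copy > /tmp/manifest.sha256
#   verify_against /mnt/tempmd0 /tmp/manifest.sha256
```

Files that exist only on the recovered array have nothing to be compared against, so this only tells you about files that are also in the trusted copy.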
Thanks for the advice @jperkins and @Falzo.
I don't intend to use the "recovered files", nor that raid array again. I will only match the files list to the backup files list to identify the difference and restructure my backed up data accordingly. As for recent files missing from the backup, these will be re-uploaded via FTP manually from their original sources when possible, I am not relying at all on the files coming from the raid array. All my files exist somewhere else. Just the folder structures, sorting, etc was primarily what I needed to recover.
Edit: In fact, it's not just a probability. I just confirmed that most of the files I recovered are corrupted; their checksum values no longer match the expected values. But that's totally fine, as my initial purpose was just to get a listing of folders and files.
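For matching the file list from the array against the file list from the backup, something like the sketch below would do. It only looks at paths, never at file contents, so the corruption does not matter. `list_paths`, `diff_listings` and the backup path in the example are hypothetical, and file names containing newlines are not handled.

```shell
#!/usr/bin/env bash
# Sketch: compare the relative file paths of two directory trees.
# Helper names are made up for this example; contents are never read.

list_paths() {
    # Print every file path under $1, relative to it, sorted for comm(1).
    (cd "$1" && find . -type f -print | LC_ALL=C sort)
}

diff_listings() {
    # $1 = recovered tree (mounted read-only), $2 = backup tree.
    local a b
    a=$(mktemp)
    b=$(mktemp)
    list_paths "$1" > "$a"
    list_paths "$2" > "$b"
    # comm -23: lines only in the first list; comm -13: only in the second.
    LC_ALL=C comm -23 "$a" "$b" | sed 's/^/only-recovered /'
    LC_ALL=C comm -13 "$a" "$b" | sed 's/^/only-backup /'
    rm -f "$a" "$b"
}

# Example:
#   diff_listings /mnt/tempmd0 /path/to/backup > /tmp/path-differences.txt
```

A file that was moved or renamed shows up twice, once under its new path as only-recovered and once under its old path as only-backup, which is the information needed to restructure the backup.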
In the end, maybe your raid0 really did get messed up. That sda5 was detected as having an ext2 superblock seems weird enough. It would at least be an explanation for the lots of broken files (essentially everything that lost some stripes).
If you don't really need raid0 for the performance gain, I'd suggest just mounting the disks as single drives, or looking into LVM to combine them without striping.
Oh, I just learned that by default LVM does not stripe when it combines disks. Thank you!
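As a rough illustration of the LVM suggestion: the sketch below prints the commands that would combine two partitions into one linear (non-striped) logical volume. lvcreate allocates linearly unless stripes are requested. The volume group and volume names are made up, the partitions are just the ones from the first post, and these commands are destructive when run for real, so the default here is to print only.

```shell
#!/usr/bin/env bash
# Sketch only: names vg_data/lv_home are hypothetical. DRY_RUN=1 (default)
# prints the commands; running them for real WIPES the given partitions.
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

make_linear_volume() {
    # $1 = volume group name, $2 = logical volume name, rest = partitions.
    local vg=$1 lv=$2
    shift 2
    run pvcreate "$@"                        # mark partitions as physical volumes
    run vgcreate "$vg" "$@"                  # pool them into one volume group
    run lvcreate -l 100%FREE -n "$lv" "$vg"  # one linear LV over all free space
    run mkfs.ext4 "/dev/$vg/$lv"             # filesystem on top
}

# Example (dry run):
#   make_linear_volume vg_data lv_home /dev/sdb1 /dev/sdc1
```

A linear volume still spans several disks, so losing one disk can still take the filesystem with it; it just does not interleave every file across all of them the way striping does.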