MegaRAID messed up - Help needed
leapswitch
Patron Provider, Veteran
Hello,
We have a storage server running 10 drives in a "fake JBOD" array on a MegaRAID controller, set up as described here:
http://ehaselwanter.com/en/blog/2012/11/26/MegaRaid-as-fake-JBOD-for-swift/
Two of these drives were configured as an mdadm RAID-1 array mounted as /.
The system had been running fine for 7 months before it was rebooted today for a kernel upgrade.
At first it couldn't find a bootable device, so we set the first drive (one of the two in the mdadm RAID-1) as bootable. However, it just showed a blinking cursor.
Trying to rescue the system shows: No Linux found.
Trying a fresh installation shows: No usable disks found.
How can we make sure it finds the correct drives and array to boot from? Any ideas?
Comments
Are you able to get into the RAID utility? Check the drives and controller.
Sounds like your GRUB is broken or missing for some reason.
I suggest booting the server with http://www.supergrubdisk.org/super-grub2-disk/ and telling it to find any OS. This will boot your OS if one is found; once you're back in, reinstall GRUB.
If it does not find any OS, and your other rescue systems did not find one either, it sounds like there is a piece of the puzzle missing here.
Drives and controller are showing as Optimal in the RAID utility.
Trying this, thank you
@AnthonySmith
http://imgur.com/a/eDh9b
Able to see the kernels using Super Grub2; however, all kernels show "Cannot find root device".
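A "Cannot find root device" error usually means the kernel (or its initramfs) could not resolve whatever the root= argument names, for example a UUID or an md device that was never assembled. The kernel command line is a whitespace-separated list of bare flags and key=value pairs, so a minimal Python sketch like the one below can pull root= out of the linux line shown in the GRUB entry. The function name and the sample line are hypothetical.

```python
import shlex
from typing import Dict, Optional


def parse_kernel_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into a {parameter: value} dict.

    Bare flags such as "ro" or "quiet" map to None. Only the first "="
    separates key from value, so "root=UUID=1234-abcd" keeps "UUID=1234-abcd".
    """
    params: Dict[str, Optional[str]] = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


if __name__ == "__main__":
    # Hypothetical parameters copied from a GRUB menu entry (kernel path dropped).
    params = parse_kernel_cmdline("ro root=UUID=1234-abcd quiet")
    print("root =", params.get("root"))
```

The value can then be compared with what blkid reports from a rescue system; a mismatch, or a root device that only exists after mdadm has assembled the array, could explain the error.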
Maybe there is a failed drive?
No failed drive according to the RAID utility
Do you have the ability to run a rescue system (booting from ramfs or the network) there?
Yes, we have physical access to the server.
Boot the rescue system and check whether you still have the partition table and whether the drives are mountable.
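One quick check from the rescue system is the first sector of each RAID-1 member disk. A valid MBR ends with the bytes 0x55 0xAA, holds four 16-byte partition entries starting at offset 446, and keeps boot loader code in its first 440 bytes. The sketch below assumes MBR (not GPT) partitioning, and the function name is hypothetical. An all-zero boot code area would fit the blinking cursor, but non-zero bytes do not prove GRUB is intact.

```python
import struct
from typing import Dict, List, Tuple

# A few well-known MBR partition type IDs (not an exhaustive list).
PARTITION_TYPES = {
    0x82: "Linux swap",
    0x83: "Linux",
    0x8E: "Linux LVM",
    0xEE: "GPT protective",
    0xFD: "Linux raid autodetect",
}


def inspect_mbr(sector: bytes) -> Dict[str, object]:
    """Summarise a disk's first 512-byte sector, assuming MBR partitioning."""
    if len(sector) < 512:
        raise ValueError("need the full 512-byte first sector")
    partitions: List[Tuple[int, int, int, int]] = []
    for slot in range(4):
        # Partition entries are 16 bytes each, starting at offset 446.
        entry = sector[446 + 16 * slot : 446 + 16 * (slot + 1)]
        ptype = entry[4]  # byte 4: partition type ID
        # Bytes 8-11: first LBA; bytes 12-15: sector count (both little-endian).
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if ptype != 0:  # type 0x00 marks an unused slot
            partitions.append((slot + 1, ptype, start_lba, num_sectors))
    return {
        "boot_signature": sector[510:512] == b"\x55\xaa",
        # All-zero boot code means there is nothing for the BIOS to run.
        "has_boot_code": any(sector[:440]),
        "partitions": partitions,
    }
```

For example, as root: with open("/dev/sda", "rb") as disk: print(inspect_mbr(disk.read(512))). PARTITION_TYPES can be used to label the type IDs it returns.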
If you can run a rescue system with SSH access over the network, shoot me a PM; I can try to fix it ASAP.
Right, so just update the boot target to the other physical drive that contains your root partition. It sounds very much like you had a RAID-1 array booting the server (let's call the drives sda and sdb) and only had GRUB on sda. sda has failed, so you have lost GRUB and you're trying to boot from the same dead drive.
The solution is to point GRUB at the other physical disk, hd1 I guess, or whatever UUID is assigned to the second physical drive.
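On GRUB's device naming: in (hdX,Y) the first number is the disk, counted from 0 in BIOS order, so the second physical disk is hd1. GRUB 2 counts partitions from 1, while GRUB Legacy (used by CentOS 6) counts them from 0, so the same "(hd0,1)" means the first partition under GRUB 2 but the second under GRUB Legacy. The sketch below decodes a device name. The function names are hypothetical, and the Linux-name guess assumes the BIOS disk order matches the kernel's sdX order, which is not guaranteed.

```python
import re
from typing import Optional, Tuple

# Matches GRUB device names such as "(hd1)", "(hd0,1)" or "(hd1,msdos1)".
GRUB_DEVICE_RE = re.compile(r"\(hd(\d+)(?:,(?:msdos|gpt)?(\d+))?\)")


def parse_grub_device(device: str, grub2: bool = True) -> Tuple[int, Optional[int]]:
    """Return (disk index counted from 0, partition number counted from 1 or None)."""
    m = GRUB_DEVICE_RE.fullmatch(device.strip())
    if m is None:
        raise ValueError(f"not a GRUB hd device: {device!r}")
    disk = int(m.group(1))
    if m.group(2) is None:
        return disk, None
    part = int(m.group(2))
    # GRUB 2 counts partitions from 1; GRUB Legacy counts them from 0.
    return disk, (part if grub2 else part + 1)


def guess_linux_name(device: str, grub2: bool = True) -> str:
    """Guess the sdX name, ASSUMING BIOS disk order matches kernel order."""
    disk, part = parse_grub_device(device, grub2)
    if disk > 25:
        raise ValueError("this sketch only handles sda..sdz")
    name = "sd" + chr(ord("a") + disk)
    return name if part is None else f"{name}{part}"
```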
@leapswitch
A somewhat related thing happened to me 2 years back. It may not help you, but you can try updating your BIOS to the latest version, and then the MegaRAID controller's firmware too.
It could be a compatibility problem with the new kernel.
It is not even starting GRUB without a GRUB boot disk, so I don't think that is the issue here.
GRUB is missing. Reinstall GRUB and specify the correct array to load; don't forget it's RAID.
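A common way to avoid this failure mode is to install GRUB to the MBR of every disk in the RAID-1, not just the first, so either disk can boot on its own. The sketch below derives the parent disks from the array's member partitions and builds one install command per disk. grub2-install is the CentOS 7 name; GRUB Legacy systems such as CentOS 6, and Debian-style systems, use grub-install. The member names and helper functions are hypothetical, and the commands would normally be run from a chroot of the installed system.

```python
import re
import shlex
import subprocess
from typing import Iterable, List


def parent_disk(partition: str) -> str:
    """Map a partition name such as "sda1" or "nvme0n1p1" to its whole disk."""
    # NVMe and MMC partitions carry a "p" before the partition number.
    m = re.fullmatch(r"(nvme\d+n\d+|mmcblk\d+)p\d+", partition)
    if m is None:
        # Classic names: sda1 -> sda, vdb2 -> vdb
        m = re.fullmatch(r"([a-z]+)\d+", partition)
    if m is None:
        raise ValueError(f"unrecognised partition name: {partition!r}")
    return m.group(1)


def grub_install_commands(members: Iterable[str], tool: str = "grub2-install") -> List[List[str]]:
    """Build one install command per distinct disk backing the RAID-1 members."""
    disks = sorted({parent_disk(p) for p in members})
    return [[tool, f"/dev/{disk}"] for disk in disks]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # only show what would run
        else:
            subprocess.run(cmd, check=True)  # stop at the first failure
```

For example, run_all(grub_install_commands(["sda1", "sdb1"])) prints "grub2-install /dev/sda" and "grub2-install /dev/sdb" without running anything; passing dry_run=False actually executes them.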
If you have a logical volume / volume group (VG), then you have only lost system files. You can install a fresh OS using a custom partition layout and keep the LVM/VG. This way a fresh system is installed on your server and you only lose the data on the system/root filesystem.
Agreed, it may not help.
I was just giving a wild hint, as it happened to me on an IBM x3600 server a couple of years ago. It would not boot anything at all; I just updated the BIOS from the recovery console and it worked flawlessly (still in service to date). Updating, or just re-flashing, the BIOS is worth a try.
An update: my team has been able to get it running from a rescue disk (not the CentOS rescue, which is still not able to detect the disks), and we are taking a backup of all data before continuing.
Good news. Which rescue disk made that possible?
One of our techs had it available locally. I will put it up for download later.
CentOS 6 is unable to see any disks, but CentOS 7 sees all disks.
An update, and a thank you to everyone who chimed in:
A CentOS 7 install on the OS drives worked, and we were able to import all disk safes back into R1Soft. Everything is working well now.