Is this a French thing or just OVHCloud shenanigans?

  1. You have Proxmox 7.4 running on an OVHCloud dedicated server.
  2. Suddenly your server crashes one night and you get an error on boot: Unable to locate IOAPIC for GSI -1
  3. You boot into rescue and check lsblk

Everything looks fine. You can mount/chroot the data and the LVM with:

ls /dev/mapper/*

mount /dev/mapper/lvm-data /mnt

and the system with:

mount /dev/mapper/controller /mnt

Okay, proceed to unmount. You can take a backup if needed.
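
For anyone following along, a backup from rescue mode could look roughly like this (the mount point and the remote host below are placeholders, not from my actual setup):

mount -o ro /dev/mapper/lvm-data /mnt
rsync -aAXH --numeric-ids /mnt/ user@backuphost:/backups/ovh-server/   # copy everything off-server
umount /mnt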

Proceed to open a ticket:

Hi, my server crashed recently, and I get this error on boot: Unable to locate IOAPIC for GSI -1

I have checked that both disks show up and mount normally in rescue mode.

I have important data on the server, so I'm only asking what this error is, how to solve it, and how to boot back into the current system.

The server is running default Proxmox VE 7.4 template.

OVHCloud asks permission to check the issue via ticket.

No issue with the hardware... Okay, back to square one: reboot into rescue and start taking backups.

All partitions gone after OVH intervention.

Did they just nuke every fucking thing there was, when I specifically asked them to just check it? Or was there some kind of French-English language barrier? Or am I just missing something here, and all the data is fine even though all partitions are now gone from lsblk?


Comments

  • stefeman Member
    edited September 2023

    I'm not even mad. This is so dumb it's almost amazing. Question is, can this be salvaged? Any advice is appreciated, lmao.

  • stefeman Member
    edited September 2023

    Or maybe I'm just a fucking idiot myself. lsmod shows an empty response.

    Something is wrong with this rescue mode.

    Trying to enable mdraid, LVM, and device-mapper with modprobe next.
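
    Roughly what I'm about to try (the module names are the usual generic ones, assuming the rescue kernel actually ships them):

    modprobe dm_mod    # device-mapper, which LVM sits on top of
    modprobe raid0     # mdraid personalities used by this box
    modprobe raid1
    vgchange -ay       # then activate any LVM volume groups that get detected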

  • This is an unmanaged dedicated server; they do not boot into your OS to check stuff. The most they'll do is check the hardware from rescue mode.

  • jackb Member, Host Rep

    pvscan
    vgscan
    lvscan
    mdadm --assemble --scan

    These are four commands you'll probably want to look up if you're not already familiar with them.
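
    A rough sequence from rescue once the modules are loaded (nothing OVH-specific here, just the generic tools):

    mdadm --assemble --scan   # assemble any software RAID arrays found in the superblocks
    pvscan                    # look for LVM physical volumes
    vgscan                    # look for volume groups
    lvscan                    # list logical volumes and whether they are active
    vgchange -ay              # activate the volume groups so /dev/mapper/* shows up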

  • stefeman Member
    edited September 2023

    They had an outdated rescue image loaded after the intervention, without the correct modules. Everything shows up again now.

    I will take backups of the qcow2 disk image and then try to figure out what's wrong with the boot error: Unable to locate IOAPIC for GSI -1

  • @jackb said:
    pvscan
    vgscan
    lvscan
    mdadm --assemble --scan

    These are four commands you'll probably want to look up if you're not already familiar with them.

    Any ideas how to proceed? xD

  • jackb Member, Host Rep

    @stefeman said:
    Any ideas how to proceed? xD

    Have you found your root and boot partitions?

    If so, mount root to /mnt and boot to /mnt/boot, then follow the link below to chroot into your system.

    https://superuser.com/a/417004

    Then reinstall grub following the instructions for your particular OS; the exact steps typically vary per distro.
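
    A minimal sketch of that, assuming the root filesystem turns out to be /dev/md126 and /boot is /dev/md127 (adjust to whatever lsblk actually shows on your box):

    mount /dev/md126 /mnt          # root filesystem
    mount /dev/md127 /mnt/boot     # boot partition
    mount -t proc /proc /mnt/proc
    mount --rbind /sys /mnt/sys
    mount --rbind /dev /mnt/dev
    chroot /mnt /bin/bash          # now grub can be reinstalled from inside the system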

  • mount /dev/md126 /mnt

    ls /mnt
    bin dev home lib32 libx32 media opt root sbin sys usr
    boot etc lib lib64 lost+found mnt proc run srv tmp var

  • cat /mnt/etc/fstab
    UUID=766bffc2-b1f4-440c-8d38-525cc3f739cd / ext4 defaults 0 1
    UUID=941fa5a7-979d-4486-bb9b-823f313c64f4 /boot ext4 defaults 0 0
    LABEL=EFI_SYSPART /boot/efi vfat defaults 0 1
    UUID=95d9af38-6faf-4b2d-99f9-46a3ce548de1 /var/lib/vz ext4 defaults 0 0
    UUID=8d444ae4-9813-46ca-a816-3e387f5c09d7 swap swap defaults 0 0
    UUID=682703b2-a54d-46ea-bf14-13905848b9ed swap swap defaults 0 0

  • stefeman Member
    edited September 2023

    mount /dev/md127 /mnt/boot
    mount /dev/nvme1n1p1 /mnt/boot/efi

    This should be it for mounting the boot partitions.

  • No errors so far:

    mount -t proc /proc /mnt/proc/
    mount --rbind /sys /mnt/sys/
    mount --rbind /dev /mnt/dev/

    Also no errors so far.

    chroot /mnt

    Now I'm in the chroot.

  • Did you recently update/change the Linux kernel? It seems to be a kernel issue.

    Thanked by 1stefeman
  • Maounique Host Rep, Veteran

    I would be very careful about tickets mentioning disks, if I were you. They might swap the disks first and check them later, if that.
    You never know how they will react; the randomness is very high with those people.

    Thanked by 1stefeman
  • stefeman Member
    edited September 2023

    @Val said:
    Did you recently update/change the Linux kernel? It seems to be a kernel issue.

    Possibly. If it was via a normal update/upgrade, how can I check this? (see the sketch at the end of this post)

    Also,

    ls /etc/pve/

    is empty..

    ls /var/lib/pve-cluster/
    config.db config.db-shm config.db-wal
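
    A quick way to check for a recent kernel change from the chroot, assuming the stock Debian apt/dpkg logs are still in place (nothing Proxmox-specific beyond the pve-kernel package name):

    grep -i 'pve-kernel\|linux-image' /var/log/dpkg.log           # recent kernel package installs/upgrades
    zgrep -i 'pve-kernel' /var/log/apt/history.log* 2>/dev/null   # apt history, including rotated logs
    ls -lt /boot/vmlinuz-*                                        # which kernel images exist and when they were written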

  • jackb Member, Host Rep

    @stefeman said:
    No errors so far:

    mount -t proc /proc /mnt/proc/
    mount --rbind /sys /mnt/sys/
    mount --rbind /dev /mnt/dev/

    Also no errors so far.

    chroot /mnt

    now im at chroot

    Perfect. Reinstall grub and try a reboot.
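
    For a UEFI Debian/Proxmox install that usually boils down to something like this from inside the chroot (the ESP path comes from your fstab; the bootloader id is an assumption):

    grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=proxmox
    update-grub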

    Thanked by 1stefeman
  • @jackb said:

    @stefeman said:
    No errors so far:

    mount -t proc /proc /mnt/proc/
    mount --rbind /sys /mnt/sys/
    mount --rbind /dev /mnt/dev/

    Also no errors so far.

    chroot /mnt

    now im at chroot

    Perfect. Reinstall grub and try a reboot.

    root@rescue-customer-ca:/# update-grub
    Generating grub configuration file ...
    Found linux image: /boot/vmlinuz-5.15.108-1-pve
    Found initrd image: /boot/initrd.img-5.15.108-1-pve
    /usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
    /usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
    /usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
    Warning: os-prober will not be executed to detect other bootable partitions.
    Systems on them will not be added to the GRUB boot configuration.
    Check GRUB_DISABLE_OS_PROBER documentation entry.
    Adding boot menu entry for UEFI Firmware Settings ...
    done

  • stefeman Member
    edited September 2023

    Could you tell me the exact commands I should try?

    I tried:

    adding "noapic" to the GRUB_CMDLINE_LINUX_DEFAULT variable in /etc/default/grub to disable the APIC

    But I get the above stuff.

    root@rescue-customer-ca:/# update-grub
    Generating grub configuration file ...
    Found linux image: /boot/vmlinuz-5.15.108-1-pve
    Found initrd image: /boot/initrd.img-5.15.108-1-pve
    /usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
    /usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
    /usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
    Warning: os-prober will not be executed to detect other bootable partitions.
    Systems on them will not be added to the GRUB boot configuration.
    Check GRUB_DISABLE_OS_PROBER documentation entry.
    Adding boot menu entry for UEFI Firmware Settings ...
    done
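
    For reference, the change in /etc/default/grub is just this one line (assuming the default value was "quiet"), followed by update-grub to regenerate the config:

    GRUB_CMDLINE_LINUX_DEFAULT="quiet noapic"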

  • -rw------- 1 root root 40K Sep 20 12:18 /var/lib/pve-cluster/config.db

    I probably have my configs right here, since /etc/pve is empty.

  • stefeman Member
    edited September 2023

    @stefeman said:
    May you tell me the exact commands I could try?

    I tried:

    add "noapic" on the variable GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub to disable APIC

    But I get the above stuff.

    root@rescue-customer-ca:/# update-grub
    Generating grub configuration file ...
    Found linux image: /boot/vmlinuz-5.15.108-1-pve
    Found initrd image: /boot/initrd.img-5.15.108-1-pve
    /usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
    /usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
    /usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
    Warning: os-prober will not be executed to detect other bootable partitions.
    Systems on them will not be added to the GRUB boot configuration.
    Check GRUB_DISABLE_OS_PROBER documentation entry.
    Adding boot menu entry for UEFI Firmware Settings ...
    done

    I assume this works as intended.

    Will try to reboot now.

    The config files should be automatically mounted by pve-cluster at boot.

  • stefeman Member
    edited September 2023

    Currently it boots this far and gets stuck:

    The cursor dot at the bottom keeps flashing though, and I don't have RAID6, so I wonder why it's saying that.

    I also tried removing the noapic grub command, but the old GSI error comes back.

  • @stefeman said:
    mount /dev/md127 /mnt/boot
    mount /dev/nvme1n1p1 /mnt/boot/efi

    This should be it for mounting the boot partitions.

    I wonder if the mount /dev/nvme1n1p1 /mnt/boot/efi command was something I was not supposed to run, as the /dev/md127 (/boot) partition already contained an efi folder inside.

  • jackb Member, Host Rep
    edited September 2023

    @stefeman said:

    @jackb said:

    @stefeman said:
    No errors so far:

    mount -t proc /proc /mnt/proc/
    mount --rbind /sys /mnt/sys/
    mount --rbind /dev /mnt/dev/

    Also no errors so far.

    chroot /mnt

    now im at chroot

    Perfect. Reinstall grub and try a reboot.

    root@rescue-customer-ca:/# update-grub
    Generating grub configuration file ...
    Found linux image: /boot/vmlinuz-5.15.108-1-pve
    Found initrd image: /boot/initrd.img-5.15.108-1-pve
    /usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
    /usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
    /usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..

    I've had this error before from degraded/unsynced mdadm arrays. Check the contents of /proc/mdstat and see if your arrays are intact.

  • @jackb said:

    @stefeman said:

    @jackb said:

    @stefeman said:
    No errors so far:

    mount -t proc /proc /mnt/proc/
    mount --rbind /sys /mnt/sys/
    mount --rbind /dev /mnt/dev/

    Also no errors so far.

    chroot /mnt

    now im at chroot

    Perfect. Reinstall grub and try a reboot.

    root@rescue-customer-ca:/# update-grub
    Generating grub configuration file ...
    Found linux image: /boot/vmlinuz-5.15.108-1-pve
    Found initrd image: /boot/initrd.img-5.15.108-1-pve
    /usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
    /usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
    /usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..

    I've got this error before from degraded/unsynced mdadm arrays. Check the contents of /proc/mdstat and see if your arrays are intact.

    root@rescue:~# cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
    md126 : active raid1 nvme0n1p2[0]
          1046528 blocks super 1.2 [2/1] [U_]
    
    md127 : active raid0 nvme0n1p3[0] nvme1n1p3[1]
          20953088 blocks super 1.2 512k chunks
    
  • jackb Member, Host Rep

    @stefeman said:
    root@rescue:~# cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
    md126 : active raid1 nvme0n1p2[0]
    1046528 blocks super 1.2 [2/1] [U_]

    md127 : active raid0 nvme0n1p3[0] nvme1n1p3[1]
    20953088 blocks super 1.2 512k chunks

    Sync the degraded arrays and then retry the grub reinstall from chroot as before.

    Thanked by 1stefeman
  • @jackb said:

    @stefeman said:
    root@rescue:~# cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
    md126 : active raid1 nvme0n1p2[0]
    1046528 blocks super 1.2 [2/1] [U_]

    md127 : active raid0 nvme0n1p3[0] nvme1n1p3[1]
    20953088 blocks super 1.2 512k chunks

    Sync the degraded arrays and then retry the grub reinstall from chroot as before.

    Any idea how? xD

  • jackb Member, Host Rep
    edited September 2023

    @stefeman said:
    Any idea how? xD

    You need to find which missing partition belongs in /dev/md126. Then mdadm --manage /dev/md126 --add <partition>

    I'd hazard a guess at nvme1n1p2, but you should confirm the size matches nvme0n1p2 / that the partition table of nvme0n1 matches nvme1n1.
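
    A rough sequence, assuming nvme1n1p2 really is the missing mirror half (check the layouts first):

    lsblk -o NAME,SIZE,TYPE /dev/nvme0n1 /dev/nvme1n1   # confirm the two partition layouts match
    mdadm --manage /dev/md126 --add /dev/nvme1n1p2      # re-add the missing member
    cat /proc/mdstat                                    # then watch the resync progress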

    Thanked by 1stefeman
  • Or just back up and reinstall?
    Have you also checked the state of both disks with smartctl?
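
    For NVMe drives that check would look something like this (assuming smartmontools, and optionally nvme-cli, are available in the rescue image):

    smartctl -a /dev/nvme0n1        # full SMART/health report for the first drive
    smartctl -a /dev/nvme1n1        # and for the second
    nvme smart-log /dev/nvme0n1     # alternative view via nvme-cli, if installed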

  • jackb Member, Host Rep

    @Val said:
    Or just backup and reinstall?
    Have you checked with smartctl the state of both disks as well?

    Honestly this sort of problem is recoverable most of the time. The procedure to recover it is worth learning for anyone using a bare metal system.

    Thanked by 2Val stefeman
  • @jackb said:

    @Val said:
    Or just backup and reinstall?
    Have you checked with smartctl the state of both disks as well?

    Honestly this sort of problem is recoverable most of the time. The procedure to recover it is worth learning for anyone using a bare metal system.

    I rebooted into recovery one final time; it seems like the lsblk output is different every time.

    cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
    md126 : active raid1 nvme0n1p2[0]
          1046528 blocks super 1.2 [2/1] [U_]
    
    md127 : active raid0 nvme0n1p3[0] nvme1n1p3[1]
          20953088 blocks super 1.2 512k chunks
    
    unused devices: <none>
    

    Would mdadm --manage /dev/md126 --add <partition> still apply?

    And thanks to everyone for the support so far.

  • I'm fairly sure that right now, md126 is the boot array and it needs to be on both drives.
