Proxmox problem - pvedaemon died and cannot be restarted.
First of all, unfortunately: when I tried to paste the logs and other command output here, I must have triggered some LowEndTalk security filter and my post was blocked, so I put the command output and logs on pastebin.
I have a problem with Proxmox and kindly ask for your help. Yesterday I started a backup of the VMs and it got stuck at some point; I think it was because my storage VPS at zxhost had an outage at the same time. Today I cancelled the backup through the GUI (I simply clicked the Stop button in the backup detail window), and that is when the problem started. I can no longer use the Proxmox VE GUI because it keeps showing my node as offline, but all VPSes and services on it work fine, and I can still SSH to both the node and the VMs.
First, for clarity, here is the pveversion --verbose output:
After some time searching the Proxmox forum for a solution and debugging, I noticed that the pvedaemon process is down and I cannot start it again. When I try, I get:
and systemctl status pvedaemon.service gives me the following:
so I assumed that something was still bound to the pvedaemon port, and I tried:
and then killed the PIDs listed there with kill -9 15218 and kill -9 15220, but no luck: those processes are still alive. I also tried pkill, with the same result.
I read somewhere on the Proxmox forum that pvedaemon runs on port 83, so I checked it with:
and as you can see, a pvedaemon worker is listed there as well; however, when I check its status I get:
# pvedaemon status
stopped
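To see which processes still hold the port, the listening socket can be traced through /proc without lsof or netstat. A minimal sketch, assuming a Linux host; it defaults to port 85, which the Proxmox VE documentation gives as pvedaemon's listen address on 127.0.0.1 (pass another port number as the first argument if yours differs). Forked workers inherit the parent's listening socket, so several PIDs can show up for one port:

```python
#!/usr/bin/env python3
"""Sketch: show which processes hold a listening TCP socket on a port (Linux)."""
import os
import sys

TCP_LISTEN = "0A"  # state code for LISTEN in /proc/net/tcp and /proc/net/tcp6


def listening_inodes(table_text, port):
    """Return inode numbers of LISTEN sockets bound to `port`."""
    inodes = set()
    for line in table_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 10:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)  # port is hex-encoded
        if fields[3] == TCP_LISTEN and local_port == port:
            inodes.add(fields[9])
    return inodes


def pids_for_inodes(inodes, proc_root="/proc"):
    """Map socket inodes to PIDs via the /proc/<pid>/fd symlinks."""
    wanted = set("socket:[{}]".format(inode) for inode in inodes)
    pids = set()
    if not wanted:
        return pids
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            for fd in os.listdir(fd_dir):
                if os.readlink(os.path.join(fd_dir, fd)) in wanted:
                    pids.add(int(entry))
                    break
        except OSError:  # process exited meanwhile, or no permission
            continue
    return pids


def read_comm(pid, proc_root="/proc"):
    """Short process name from /proc/<pid>/comm ('?' if it is gone)."""
    try:
        with open(os.path.join(proc_root, str(pid), "comm")) as f:
            return f.read().strip()
    except OSError:
        return "?"


if __name__ == "__main__":
    # 85 = documented pvedaemon port; override with the first argument
    port = int(sys.argv[1]) if len(sys.argv) > 1 and sys.argv[1].isdigit() else 85
    inodes = set()
    for table in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(table) as f:
                inodes |= listening_inodes(f.read(), port)
        except OSError:
            pass
    for pid in sorted(pids_for_inodes(inodes)):
        print(pid, read_comm(pid))
```

It reads /proc/<pid>/comm rather than cmdline, so it only reports the short process name.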
At this point I am lost. I also tried restarting pvestatd and pveproxy; both restarted fine, although pveproxy took a long while. Can anyone help me?
In case it makes any difference, I have only 2 VPSes on this node, and both are KVM-based.
The pvedaemon workers are stuck in D state, so you can't kill them; they won't go away and they still block the port. Just restart the whole server.
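Processes in D state (uninterruptible sleep) are waiting inside the kernel, typically on I/O, and a pending SIGKILL is only acted on once that wait finishes, which is why kill -9 had no visible effect. A small sketch that lists such processes straight from /proc, assuming a Linux host; the wchan column shows the kernel function each task is sleeping in, which often points at NFS or other storage code:

```python
#!/usr/bin/env python3
"""Sketch: list processes in uninterruptible sleep (state "D") on Linux."""
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the content of /proc/<pid>/stat.

    comm sits in parentheses and may contain spaces or ')', so the state
    letter is taken from just after the LAST ')' rather than by splitting.
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren].strip())
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 2]  # layout is "pid (comm) S ..."
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """Return sorted (pid, comm) pairs for every process currently in state D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):  # vanished or malformed
            continue
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)


def wait_channel(pid, proc_root="/proc"):
    """Kernel function the task is sleeping in, from /proc/<pid>/wchan."""
    try:
        with open(os.path.join(proc_root, str(pid), "wchan")) as f:
            return f.read().strip() or "?"
    except OSError:
        return "?"


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, comm in d_state_processes():
        print(pid, comm, wait_channel(pid))
```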
Should I shut down the VMs first? Also, would you recommend upgrading Proxmox to version 5?
Hmm, yeah, a proper shutdown of the VMs can't be the worst idea. Proxmox 5 isn't bad, but I'd first get your system up and running normally before messing around any further.
From what I've seen so far, the Proxmox upgrade process works quite well. I already have Proxmox 5 running on one box (installed from scratch on top of Debian, though), and I can't point to any obvious differences besides the kernel and such.
When this happens I resort to unorthodox methods. One of them is to rename or move the executable. Restart, and it won't be able to launch; then you go and see what is on the socket.
It may be that it is being started from 2 locations. Ubuntu is a master at this after upgrades: they change locations and init scripts, and at times both try to start. I don't keep tabs on where Proxmox puts its executables, but that may be the case here too. I only do installations on top of Debian.
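To test the "started from 2 locations" theory, here is a sketch that lists every copy of an executable along PATH, whereas which (and Python's shutil.which) stops at the first match. The default name pvedaemon is only an example. It covers PATH lookups only, not absolute paths hard-coded in init scripts or systemd units, so those still need checking by hand:

```python
#!/usr/bin/env python3
"""Sketch: find every PATH directory that contains an executable with a given name."""
import os
import sys


def all_executables(name, path=None):
    """Return every executable called `name` along PATH, in lookup order.

    Unlike shutil.which(), this does not stop at the first hit, so duplicate
    copies (e.g. /usr/bin vs /usr/local/bin) become visible.
    """
    path = os.environ.get("PATH", "") if path is None else path
    hits = []
    seen = set()
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        real = os.path.realpath(candidate)
        if real in seen:
            continue  # same file reached twice, e.g. via /bin -> /usr/bin
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            seen.add(real)
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "pvedaemon"
    for hit in all_executables(name):
        print(hit)
```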
My guess is that the processes simply got stuck in D state (uninterruptible sleep) after his storage died; depending on how it was mounted into the system, I'd guess NFS...
Proxmox then most probably couldn't end the processes properly when stopping the backup was requested. Hence pvedaemon also got stuck, blocking the port and preventing a simple restart...
A good read on process states, especially uninterruptible ones: https://stackoverflow.com/questions/1475683/linux-process-states
What I can say is that unstable NFS is a problem with Proxmox. The "easy" fix is to shut down all VMs from the CLI and reboot the hardware node.
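A minimal sketch of the shutdown-before-reboot step, assuming the qm CLI on the node and the default `qm list` column layout (VMID NAME STATUS ...). Run it with --dry-run to only print the planned commands, or --execute to run them; the 180-second timeout is an arbitrary choice:

```python
#!/usr/bin/env python3
"""Sketch: shut down all running KVM guests via `qm` before rebooting the node."""
import subprocess
import sys


def running_vmids(qm_list_output):
    """VMIDs whose STATUS column reads 'running' in `qm list` output.

    Assumes the default layout (VMID NAME STATUS ...) and names without spaces.
    """
    vmids = []
    for line in qm_list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "running":
            vmids.append(int(fields[0]))
    return vmids


def shutdown_commands(vmids, timeout=180):
    """One `qm shutdown` per guest; --timeout bounds the wait in seconds."""
    return [["qm", "shutdown", str(vmid), "--timeout", str(timeout)] for vmid in vmids]


if __name__ == "__main__" and sys.argv[1:2] in (["--dry-run"], ["--execute"]):
    listing = subprocess.check_output(["qm", "list"], universal_newlines=True)
    for cmd in shutdown_commands(running_vmids(listing)):
        print(" ".join(cmd))
        if sys.argv[1] == "--execute":
            subprocess.call(cmd)  # qm shutdown normally waits for the guest (up to --timeout)
```

Only reboot once `qm list` shows every guest as stopped.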
Yes, you are right, it is an NFS share. For the future, can you recommend a more stable method of backing up the VMs? Could I use rsync somehow, or maybe WebDAV? OK, I can reboot the node once or twice, but I can't know when the backup will die next time. Fortunately, apart from full VM backups I also do off-site DirectAdmin account backups to 3 different locations, so it's not a huge issue if everything gets totally messed up, but recovering would definitely be a time-consuming process.
Seriously, you can mount anything and nominate the directory.
Back up to local storage and then rsync it to an external location? That should be more stable than NFS. I have a few Proxmox boxes with NFS shares between them. Fortunately, the connections have been very stable lately, and I can't remember the last time a backup "crashed". If you monitor your hardware node, you can usually see an unrealistic load when NFS or the backup dies, so you can react then.
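On Linux, tasks in D state count toward the load average, so a hung NFS mount often shows up as a load far beyond what the CPUs are actually doing. A rough sketch of such a check (for example from cron), assuming a Linux host; the threshold of 2x the CPU count is an arbitrary assumption to tune:

```python
#!/usr/bin/env python3
"""Sketch: flag a suspicious load spike, e.g. tasks piling up on a dead NFS mount."""
import os


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def load_is_unrealistic(one_minute, cpu_count, factor=2.0):
    """True when the 1-minute load exceeds `factor` x CPU count (arbitrary threshold)."""
    return one_minute > factor * cpu_count


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        one, _, _ = parse_loadavg(f.read())
    cpus = os.cpu_count() or 1
    if load_is_unrealistic(one, cpus):
        print("WARNING: load {:.2f} on {} CPUs; check for D-state tasks / stale NFS".format(one, cpus))
```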
I'd like to do as you suggest, but I don't know how. Is there some way to trigger a script after the backup completes, something like a backup_post.sh that runs after the whole backup? Otherwise I have no idea when to run rsync; I am quite new to Proxmox. I see something like:
#script: FILENAME
in /etc/vzdump.conf, so I assume that could be it? However, I cannot find any information about this setting or what the script should look like. Could it be a simple bash script containing a list of commands to execute after the backup, or do I need to use some specific variables?
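That setting is the right place: the script option in /etc/vzdump.conf names an executable hook that vzdump runs at several phases of a job, passing the phase name (job-start, backup-end, job-end and so on) as the first argument and details such as DUMPDIR in the environment. pve-manager ships a Perl example, vzdump-hook-script.pl, typically under /usr/share/doc/pve-manager/examples/. A plain bash script works too, as long as it is executable. Here is a Python sketch of the "back up locally, then rsync" idea that only acts on job-end; the script path, the rsync destination and SSH key authentication to that target are all hypothetical assumptions:

```python
#!/usr/bin/env python3
"""Sketch of a vzdump hook: rsync the local dump directory after the whole job ends.

Assumed setup (hypothetical, adjust to taste):
  /etc/vzdump.conf:  script: /usr/local/bin/vzdump-rsync-hook.py   (chmod +x)
  RSYNC_DEST below is a made-up off-site target reachable via SSH keys.
"""
import os
import subprocess
import sys

RSYNC_DEST = "backup@offsite.example.com:/srv/pve-dumps/"  # hypothetical target


def rsync_command(dumpdir, dest=RSYNC_DEST):
    """Build the rsync call; the trailing '/' on the source copies its contents."""
    return ["rsync", "-a", "--partial", dumpdir.rstrip("/") + "/", dest]


def handle(argv, env):
    """Return the command to run for this hook invocation, or None to do nothing."""
    phase = argv[1] if len(argv) > 1 else ""
    if phase != "job-end":  # ignore per-VM phases and job-abort
        return None
    dumpdir = env.get("DUMPDIR")
    if not dumpdir:
        return None
    return rsync_command(dumpdir)


if __name__ == "__main__":
    cmd = handle(sys.argv, os.environ)
    if cmd:
        sys.exit(subprocess.call(cmd))  # pass rsync's exit status back to vzdump
```

Acting only on job-end means rsync runs once, after all guests have been dumped, instead of once per VM.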
Like @Maounique already said, you can mount anything to a directory and point your storage to that dir; you could use sshfs, for instance. Of course, an SSH connection might die during the backup process too, so there is probably no guarantee with any of these methods. Maybe test it on purpose to see what happens?
Or try GlusterFS if you want to use it directly. I've had no issues with it so far (though I know that underneath it's partially NFS too).
After all, having a stale D-state process left over may simply be bad luck.