Decrease Fail Over time in Proxmox HA

thisisjaymehta · November 2025

Hi,

I have a two-node Proxmox cluster with a test VM on it that has both replication and HA enabled.

The VM disks replicate at a set interval, and whenever a node goes down, HA moves its VM to the other node and starts it.

But failover is taking ~5 minutes, which is slow in my opinion.

I have already tried a hardware watchdog instead of the software watchdog that Proxmox HA uses by default, but it made no difference.

Both nodes are connected via LAN.

Any pointers/suggestions on how to speed up the Failover process?

mw · November 2025

/etc/pve/datacenter.cfg:
migration: type=insecure

hades_corps · November 2025

Create a blank VM on the 2nd node, set it to boot from the replicated disks, set your monitor to spin this VM up when the main one fails, AND stop this one when the main is online.


WyvernCo · November 2025

Proxmox docs do say to expect it to take a couple minutes:

ha-manager has typical error detection and failover times of about 2 minutes, so you can get no more than 99.999% availability.

But I don't know how to speed up from 5 minutes to 2 minutes.
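For a rough sense of what the quoted figure implies, here is a small arithmetic sketch. It assumes a 365-day year and treats every failover as a fixed block of downtime; the function names are made up for illustration and are not part of any Proxmox tooling.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year allowed at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def failovers_within_budget(availability_percent: float,
                            failover_minutes: float) -> int:
    """How many failovers of the given length fit into one year's budget."""
    budget = downtime_budget_minutes(availability_percent)
    return int(budget // failover_minutes)
```

At 99.999% the yearly budget is about 5.26 minutes, so two 2-minute failovers fit, but only one 5-minute failover does.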

thisisjaymehta · November 2025

@mw said:
/etc/pve/datacenter.cfg:
migration: type=insecure

This helped me save ~15-20 sec

thisisjaymehta · November 2025

@hades_corps said: Create a blank VM on the 2nd node, set it to boot from the replicated disks, set your monitor to spin this VM up when the main one fails, AND stop this one when the main is online.

I was hoping to make Proxmox's HA failover faster rather than write my own script to monitor and start/stop VMs. But it's a last resort that I might fall back on if I cannot get Proxmox's failover to be fast enough.
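For reference, the decision logic of such a monitor script could look roughly like the sketch below. It is only an illustration of the idea quoted above, not a tested failover tool: the class name, the threshold of 3 missed checks, and the action strings are all assumptions. The caller would still have to map the actions to real commands (for example `qm start` / `qm stop` on the second node), and nothing here fences the failed node, so both VMs could briefly run at the same time.

```python
class StandbyController:
    """Decides when to start or stop a standby VM based on health checks."""

    def __init__(self, fail_threshold: int = 3):
        # Number of consecutive failed checks before the standby is started.
        self.fail_threshold = fail_threshold
        self.consecutive_failures = 0
        self.standby_running = False

    def observe(self, main_is_up: bool) -> str:
        """Feed one health-check result; returns the action to take.

        Returns 'start_standby', 'stop_standby', or 'none'.
        """
        if main_is_up:
            self.consecutive_failures = 0
            if self.standby_running:
                # Main VM is back, so the standby is no longer needed.
                self.standby_running = False
                return "stop_standby"
            return "none"

        self.consecutive_failures += 1
        if (not self.standby_running
                and self.consecutive_failures >= self.fail_threshold):
            self.standby_running = True
            return "start_standby"
        return "none"
```

Requiring several consecutive failures avoids starting the standby on a single dropped ping, at the cost of a slower reaction.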

onidel · November 2025

The cluster should take at most 2 minutes to declare a node offline and start recovering the VM. Recovery adds another 5-10 s, but that is just starting the VM. If the VM is slow to boot, that adds further time.

Is your 5 minutes the time it takes for the VM to be available on the other node, or the time it takes for the VM to come online (booted and responding to ping)?
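To pin down the second number (time until the VM answers pings again), a small timing script run from a third machine is enough. The sketch below is an assumption-laden example: `ping_once` shells out to the Linux `ping` binary with `-c 1 -W 1`, and `measure_outage` is a made-up helper name. Its resolution is limited by the polling interval.

```python
import subprocess
import time


def ping_once(host: str) -> bool:
    """Send one ICMP echo with the system ping (Linux: -c count, -W timeout in s)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def measure_outage(probe, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Wait until probe() fails, then return seconds until it succeeds again."""
    # Phase 1: the VM is still reachable; wait for the first failed probe.
    while probe():
        sleep(interval)
    went_down = clock()
    # Phase 2: the VM is unreachable; wait for the first successful probe.
    while not probe():
        sleep(interval)
    return clock() - went_down
```

Usage would be something like `measure_outage(lambda: ping_once("192.0.2.10"))`, where the address is a placeholder for the VM's IP. Start it, then power off the active node; the returned value covers detection, recovery, and guest boot together.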

thisisjaymehta · November 2025

I think it's working like that now: it takes ~2.5 minutes on average for the VM to start responding to pings. Earlier it was taking more than 3-4 minutes.

@onidel said: the cluster should take max 2 minutes to declare a node is offline and start recovering the VM

But can we further reduce the time the cluster takes to declare a node offline?

onidel · November 2025

@thisisjaymehta said:
I think it's working like that now: it takes ~2.5 minutes on average for the VM to start responding to pings. Earlier it was taking more than 3-4 minutes.

But can we further reduce the time the cluster takes to declare a node offline?

It is hard-coded in Proxmox. You could find and replace the value if you would like, but a shorter fence delay or watchdog timeout is riskier, because the node might be able to recover on its own and you would end up with 2 VMs running at the same time.

TimboJones · December 2025

The solution is shared storage, at the expense of budget and complexity.
