$100 if you fix this Proxmox VM issue

nanankcornering · April 2025

@mw said: Feedback from the provider is that this seems to be a hardware compatibility problem. On Monday the NIC is being swapped and I'll update ya'll on what happens

curious, which NIC are you using?

also, have you tried running tcpdump from the VM side? do you see any traffic there?

cybertech · April 2025

so who got paid?

JohnMiller92 · April 2025

@cybertech said:
so who got paid?

trump, wrong account

JohnMiller92 · April 2025

> The vendor confirmed the VM's IP shows up in the ARP table but it cannot ping anything, not even the gateway

> This works, the VM now has networking.

> Wish it was a valid solution

where the hell is dewlance and his magic carpet. wtf kind of jiggery-pokery is this 💀

hades_corps · April 2025

@JohnMiller92 said:

> The vendor confirmed the VM's IP shows up in the ARP table but it cannot ping anything, not even the gateway

> This works, the VM now has networking.

> Wish it was a valid solution

where the hell is dewlance and his magic carpet. wtf kind of jiggery-pokery is this 💀

I think OP wants to connect this VM directly to vmbr0 so that the VM has access to the entire IP with all its ports!? As opposed to the usual approach, where you create another bridge, masquerade/SNAT the outgoing traffic, and use nftables to forward the ports. It could save a few ms by not going through firewall/forwarding/congestion control, and/or give the VM user all the ports without pre-approval!? Did I get this right?

Though I think that @mw could try adding the other IPs to vmbr0 first, before assigning them to the VM.
auto vmbr0:1
iface vmbr0:1 inet static
    address <IP2>/23
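
For anyone who wants to script that stanza for several extra IPs, here is a minimal Python sketch. It only renders the text and checks that the address parses; the function name `alias_stanza` and the 198.51.100.10 address are placeholders of mine, not anything from OP's setup.

```python
import ipaddress


def alias_stanza(bridge: str, index: int, cidr: str) -> str:
    """Render an ifupdown alias stanza such as vmbr0:1 for one extra IP."""
    # ip_interface raises ValueError if the address or prefix is malformed
    iface = ipaddress.ip_interface(cidr)
    name = f"{bridge}:{index}"
    return "\n".join([
        f"auto {name}",
        f"iface {name} inet static",
        f"    address {iface.with_prefixlen}",
    ])


if __name__ == "__main__":
    # Placeholder documentation address, stands in for <IP2>
    print(alias_stanza("vmbr0", 1, "198.51.100.10/23"))
```

The output still has to be pasted into /etc/network/interfaces by hand and the interface brought up; the sketch does not touch the system.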

Falzo · April 2025

Though I think that @mw could try to add the other IPs to vmbr0 first before assign it to the VM.
auto vmbr0:1
iface vmbr0:1 inet static
address <IP2>/23

That's more or less the same as the routed solution I already suggested. It is a working fix indeed as requested, but OP decided it's not matching his untold definitions of 'working'.
so he ignores it and refuses to pay the bounty. Says a lot...

hades_corps · April 2025

@Falzo said:

Though I think that @mw could try to add the other IPs to vmbr0 first before assign it to the VM.
auto vmbr0:1
iface vmbr0:1 inet static
address <IP2>/23

That's more or less the same as the routed solution I already suggested. It is a working fix indeed as requested, but OP decided it's not matching his untold definitions of 'working'.
so he ignores it and refuses to pay the bounty. Says a lot...

I see. I was under the impression that creating another bridge would give the VM an external IP but all the ports would still need to be forwarded, while vmbr0:x would give the VM both.

tentor · April 2025

@hades_corps said: so that this VM can have access to the entire IP with all the port!?

A routed setup allows this. However, for some reason, OP needs the link layer to be visible from within the virtual machine.

@hades_corps said: As oppose to normally you would create another bridge then masquerate/SNAT for out going network, and nftable to forward the ports.

This isn't a routed setup, and I don't think it's the normal use case when you have a dedicated public IPv4 address for each VM.

@hades_corps said: It could save a few ms not going through firewall/forwarding/congestion control;

A few ms is an overestimate; forwarding in Linux doesn't take THAT much time.

Rubben · April 2025

have you tried restarting it?

see? done. give me $100 bucks for luxembourg horse penis VPS

mw · April 2025

@nanankcornering said:

@mw said: Feedback from the provider is that this seems to be a hardware compatibility problem. On Monday the NIC is being swapped and I'll update ya'll on what happens

curious, which NIC are you using?

also, have you tried running tcpdump from the VM side? do you see any traffic there?

I haven't had access to the server in days, but on Monday I'll ask the provider for the old and new ones so I know in future

mw · April 2025

Just wanted to give an update as I still don't have the box back from the provider: If you're sending me a PM to reach out for help, I appreciate it but won't be able to try your suggestions until it's back

ehab · April 2025

@mw , did @Falzo get his money already?

what r u waiting for!

mw · April 2025

I just clocked I opened this thread on the 6th... I told the provider if there's no resolution by EOB tomorrow I don't want to continue waiting

mw · April 2025

provider was kind enough to give me a breakdown of whats been happening and i thought it was interesting enough to share:

we did the following things:

Diagnose driver issues nic
Update nic drivers
Attempt different proxmox versions
Attempt other virtualizing software (works fine)
Tested on network stack in lab
Swapped motherboard, cpu, chassis, psu, disks, riser
Ordered proxmox license and had contact in detail with them regarding this bug
Spoke to external proxmox expert and tried to troubleshoot/debug this with them
Contacted other contacts in the same market trying to diagnose the issue
Tested on different rack in datacenter
Tested on different motherboard brand
Tested with different cpu/memory generation
Attempted multiple os's within the vms

i'm surprised after all this it's still sorta just not working

given the above, and that it doesn't actually matter when this gets fixed anymore, perhaps i stick around just so i can share with everyone what the eventual fix ends up being.

it might help someone down the line with this very opaque problem

PineappleM · April 2025

They expended an extraordinary amount of resources to troubleshoot this, including a lot of money. Hats off to them for at least trying!

mw · April 2025

@PineappleM said:
They expended an extraordinary amount of resources to troubleshoot this, including a lot of money. Hats off to them for at least trying!

i just wonder whether this very specific config of hardware + their network stack + proxmox just doesnt get along since other virt software was said to work
(isnt proxmox just debian?)

very curious situation. i wonder if any provider has ever seen this before, or perhaps this is just a cursed order since this box has been replaced twice now and this is the third "version"

the first was an asrock board + 7950X that just died suddenly

the second a replacement supermicro board + 7950X that died, came back for a day, then died again

provider then offered a swap to supermicro + 9950X and thats how i ended up here

perhaps my AMD puts made Lisa Su condemn me

Falzo · April 2025

Did the same subnet work with bridged networking on the mentioned boxes before, or is it a new one?

All that fuss about trying to find that issue within the server seems... desperate. At this point the problem is 99.9% on the other end of the network cable. Maybe some typo in some config which makes it specific for that subnet or so.

And btw. that another hypervisor works does not mean anything... maybe that one uses routed instead of bridged.

mw · April 2025

@Falzo said:
Did the same subnet work with bridged networking on the mentioned boxes before, or is it a new one?

Yip, worked just fine, which is why I initially thought I was doing something wrong. In fact when we had it replaced, we kept our disks so we booted up in the same state as it was when it died, but networking was cooked which I initially thought was because of some corruption/similar. So we decided to just nuke and reinstall, which is when I opened this thread thinking someone could quickly spot the issue.

All that fuss about trying to find that issue within the server seems... desperate. At this point the problem is 99.9% on the other end of the network cable. Maybe some typo in some config which makes it specific for that subnet or so.

It doesn’t work on our prefix or the provider's, and the provider has said that they tested this in their lab, which I assume means a third prefix likely used only for testing. That should rule out the prefix configuration being the issue.

And btw. that another hypervisor works does not mean anything... maybe that one uses routed instead of bridged.

Could be, but the provider has emphasised that the issue appears to be specific to Proxmox:

it is an issue with how proxmox handles the hardware, in combination with the nic and firmware

Falzo · April 2025

If it would be a specific combination with the nic, why didn't it work with a different one? Shouldn't that rule out the nic and its drivers being the problem?

I agree if it worked before and nothing has been changed in the network setup on the prefixes, it's only understandable to look for the culprit in the server. But I find it always hard to believe that such issues should come from some very specific combo.

Guess the only way to be very sure would be to recreate an old system and verify that this would still be working...

..or just use routed networking and send me the money.

cmeerw · April 2025

@mw said:
provider was kind enough to give me a breakdown of whats been happening and i thought it was interesting enough to share:

we did the following things:

Diagnose driver issues nic
Update nic drivers
Attempt different proxmox versions
Attempt other virtualizing software (works fine)
Tested on network stack in lab
Swapped motherboard, cpu, chassis, psu, disks, riser
Ordered proxmox license and had contact in detail with them regarding this bug
Spoke to external proxmox expert and tried to troubleshoot/debug this with them
Contacted other contacts in the same market trying to diagnose the issue
Tested on different rack in datacenter
Tested on different motherboard brand
Tested with different cpu/memory generation
Attempted multiple os's within the vms

i'm surprised after all this it's still sorta just not working

given the above, and that it doesn't actually matter when this gets fixed anymore, perhaps i stick around just so i can share with everyone what the eventual fix ends up being.

it might help someone down the line with this very opaque problem

What I find very interesting here is that it's a lot of swap this/swap that with just works/doesn't work, but still no tcpdump (or similar) to check where the network packets get lost (I mean something like: does the correct ARP packet get sent? Is it visible on the host network interface? Is it visible on the bridge? Does an ARP reply get sent? How far does that get?)
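
A rough sketch of that hop-by-hop check in Python, in case it helps whoever ends up running the captures. All names here are assumptions for illustration: `tap100i0` follows Proxmox's tap<vmid>i<n> naming for a VM with ID 100, `eno1` stands in for the physical NIC, and the observations in the example are made up, not data from this thread.

```python
from typing import Dict, List, Optional

# Capture points ordered from the guest outward (assumed names, see above).
OUTBOUND_PATH: List[str] = ["guest eth0", "tap100i0", "vmbr0", "eno1"]
# A reply coming back from the gateway crosses the same points in reverse.
INBOUND_PATH: List[str] = list(reversed(OUTBOUND_PATH))


def tcpdump_command(iface: str) -> str:
    """Build a capture command for one host interface."""
    # -n: no name resolution, -e: show MAC addresses, filter: ARP and ICMP only
    return f"tcpdump -n -e -i {iface} arp or icmp"


def first_gap(seen: Dict[str, bool], path: List[str]) -> Optional[str]:
    """Return the first capture point on `path` where the packet was NOT seen."""
    for point in path:
        if not seen.get(point, False):
            return point
    return None  # seen everywhere along the path


if __name__ == "__main__":
    for iface in ("tap100i0", "vmbr0", "eno1"):
        print(tcpdump_command(iface))

    # Hypothetical observations: the ARP request is seen at every point,
    # the reply only arrives at the physical NIC.
    request_seen = {point: True for point in OUTBOUND_PATH}
    reply_seen = {"eno1": True}
    print("request lost at:", first_gap(request_seen, OUTBOUND_PATH))
    print("reply lost at:", first_gap(reply_seen, INBOUND_PATH))
```

The idea is just to note, per capture point, whether the ARP request and the reply show up, and to report the first point where each one disappears. That narrows "works/doesn't work" down to one hop.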

mw · April 2025

@cmeerw said:

@mw said:
provider was kind enough to give me a breakdown of whats been happening and i thought it was interesting enough to share:

we did the following things:

Diagnose driver issues nic
Update nic drivers
Attempt different proxmox versions
Attempt other virtualizing software (works fine)
Tested on network stack in lab
Swapped motherboard, cpu, chassis, psu, disks, riser
Ordered proxmox license and had contact in detail with them regarding this bug
Spoke to external proxmox expert and tried to troubleshoot/debug this with them
Contacted other contacts in the same market trying to diagnose the issue
Tested on different rack in datacenter
Tested on different motherboard brand
Tested with different cpu/memory generation
Attempted multiple os's within the vms

i'm surprised after all this it's still sorta just not working

given the above, and that it doesn't actually matter when this gets fixed anymore, perhaps i stick around just so i can share with everyone what the eventual fix ends up being.

it might help someone down the line with this very opaque problem

What I find very interesting here is that it's a lot of swap this/swap that with just works/doesn't work, but still no tcpdump (or similar) to check where the network packets get lost (I mean something like: does the correct ARP packet get sent? Is it visible on the host network interface? Is it visible on the bridge? Does an ARP reply get sent? How far does that get?)

and

@Falzo said:
If it would be a specific combination with the nic, why didn't it work with a different one? Shouldn't that rule out the nic and its drivers being the problem?

I agree if it worked before and nothing has been changed in the network setup on the prefixes, it's only understandable to look for the culprit in the server. But I find it always hard to believe that such issues should come from some very specific combo.

Guess the only way to be very sure would be to recreate an old system and verify that this would still be working...

from the provider:

yes, we're still having the same issue with another NIC, not exactly identical but similar. It seems like outbound packets go out, but inbound ones don't come in. I will provide a more detailed report tomorrow once I've debugged that part further.

perhaps they’re just keeping details to layman terms and mentioning the technical specifics isn’t useful to me? not sure, i’m just forwarding as received

..or just use routed networking and send me the money.

i wish routed worked for us, it would have saved two weeks of this. ill ask the provider for an update in a few hours and let you guys know what they say

mw · April 2025

they essentially swapped every component of the server and tested with another NIC, with the help of industry professionals and Proxmox themselves, yet something is still wrong

Lisa Su, please forgive me…

xvps · April 2025

we did the following things:

Diagnose driver issues nic
Update nic drivers
Attempt different proxmox versions
Attempt other virtualizing software (works fine)
Tested on network stack in lab
Swapped motherboard, cpu, chassis, psu, disks, riser
Ordered proxmox license and had contact in detail with them regarding this bug
Spoke to external proxmox expert and tried to troubleshoot/debug this with them
Contacted other contacts in the same market trying to diagnose the issue
Tested on different rack in datacenter
Tested on different motherboard brand
Tested with different cpu/memory generation
Attempted multiple os's within the vms

i'm surprised after all this it's still sorta just not working

I'm not. 😄

My best guess is that the bug entered the system during installation, and the tech guys didn’t reinstall the OS each time they replaced a hardware component.

Have you tried something as simple as testing with MemTest86?

mw · April 2025

@xvps said:

we did the following things:

Diagnose driver issues nic
Update nic drivers
Attempt different proxmox versions
Attempt other virtualizing software (works fine)
Tested on network stack in lab
Swapped motherboard, cpu, chassis, psu, disks, riser
Ordered proxmox license and had contact in detail with them regarding this bug
Spoke to external proxmox expert and tried to troubleshoot/debug this with them
Contacted other contacts in the same market trying to diagnose the issue
Tested on different rack in datacenter
Tested on different motherboard brand
Tested with different cpu/memory generation
Attempted multiple os's within the vms

i'm surprised after all this it's still sorta just not working

I'm not. 😄

My best guess is that the bug entered the system during installation, and the tech guys didn’t reinstall the OS each time they replaced a hardware component.

i did 6-7 reinstalls myself, and im p sure the provider also reinstalled since they mentioned testing on different disks and different hypervisors

Have you tried something as simple as testing with MemTest86?

testing what exactly? all components appear to have been swapped

Falzo · April 2025

Is there any hardware environment where this is working as expected with proxmox? Like the one you had before.
I am still not convinced that in between nothing changed on the providers end. Like some updates/upgrades in their routers/switches whatever...

ehab · April 2025

@mw wheres Falzo's money?

xvps · April 2025

@mw said:

@xvps said:

we did the following things:

Diagnose driver issues nic
Update nic drivers
Attempt different proxmox versions
Attempt other virtualizing software (works fine)
Tested on network stack in lab
Swapped motherboard, cpu, chassis, psu, disks, riser
Ordered proxmox license and had contact in detail with them regarding this bug
Spoke to external proxmox expert and tried to troubleshoot/debug this with them
Contacted other contacts in the same market trying to diagnose the issue
Tested on different rack in datacenter
Tested on different motherboard brand
Tested with different cpu/memory generation
Attempted multiple os's within the vms

i'm surprised after all this it's still sorta just not working

I'm not. 😄

My best guess is that the bug entered the system during installation, and the tech guys didn’t reinstall the OS each time they replaced a hardware component.

i did 6-7 reinstalls myself, and im p sure the provider also reinstalled since they mentioned testing on different disks and different hypervisors

Have you tried something as simple as testing with MemTest86?

testing what exactly? all components appear to have been swapped

If I’m right, it really doesn’t matter how many times you reinstall on the same hardware — the result will be the same.

The crucial difference is whether the provider reinstalled the OS each time they replaced a hardware component, rather than just booting the system to see if it worked. If not, they could easily miss a faulty component.

MemTest86 is an efficient way to test memory for faults, and faulty memory can cause weird errors like this.

But feel free to believe that everything has been tested properly and that there’s some kind of magic behind the issue.

mw · April 2025

@Falzo said:
Is there any hardware environment where this is working as expected with proxmox? Like the one you had before.
I am still not convinced that in between nothing changed on the providers end. Like some updates/upgrades in their routers/switches whatever...

Before, yes the 7950X before it died

Now, yes the other two dozen dedis running Proxmox in the same rack using the same prefix

mw · April 2025

@xvps said:

@mw said:

@xvps said:

we did the following things:

Diagnose driver issues nic
Update nic drivers
Attempt different proxmox versions
Attempt other virtualizing software (works fine)
Tested on network stack in lab
Swapped motherboard, cpu, chassis, psu, disks, riser
Ordered proxmox license and had contact in detail with them regarding this bug
Spoke to external proxmox expert and tried to troubleshoot/debug this with them
Contacted other contacts in the same market trying to diagnose the issue
Tested on different rack in datacenter
Tested on different motherboard brand
Tested with different cpu/memory generation
Attempted multiple os's within the vms

i'm surprised after all this it's still sorta just not working

I'm not. 😄

My best guess is that the bug entered the system during installation, and the tech guys didn’t reinstall the OS each time they replaced a hardware component.

i did 6-7 reinstalls myself, and im p sure the provider also reinstalled since they mentioned testing on different disks and different hypervisors

Have you tried something as simple as testing with MemTest86?

testing what exactly? all components appear to have been swapped

But feel free to believe that everything has been tested properly and that there’s some kind of magic behind the issue.

who pissed in your oatmeal...? relax

Falzo · April 2025

@mw said:

@Falzo said:
Is there any hardware environment where this is working as expected with proxmox? Like the one you had before.
I am still not convinced that in between nothing changed on the providers end. Like some updates/upgrades in their routers/switches whatever...

Before, yes the 7950X before it died

Now, yes the other two dozen dedis running Proxmox in the same rack using the same prefix

Interesting. Time to get creative and clone one of the working Proxmox servers then (expecting them not to be on the very latest versions of the kernel and co).
