$100 if you fix this Proxmox VM issue

124 Comments

  • @mw said: Feedback from the provider is that this seems to be a hardware compatibility problem. On Monday the NIC is being swapped and I'll update y'all on what happens

    Curious, what NIC are you using?

    Also, have you tried tcpdump-ing from the VM side? Do you see any traffic on it?

  • so who got paid?

  • @cybertech said:
    so who got paid?

    trump, wrong account

  • JohnMiller92 Member
    edited April 13

    > The vendor confirmed the VM's IP shows up in the ARP table but it cannot ping anything, not even the gateway

    > This works, the VM now has networking.

    > Wish it was a valid solution

    where the hell is dewlance and his magic carpet. wtf kind of jiggery-pokery is this 💀

  • hades_corps Member
    edited April 13

    @JohnMiller92 said:

    > The vendor confirmed the VM's IP shows up in the ARP table but it cannot ping anything, not even the gateway

    > This works, the VM now has networking.

    > Wish it was a valid solution

    where the hell is dewlance and his magic carpet. wtf kind of jiggery-pokery is this 💀

    I think OP wants to set up this VM connected directly to vmbr0 so that this VM can have access to the entire IP with all the ports!? As opposed to normally, where you would create another bridge, then masquerade/SNAT for outgoing traffic and use nftables to forward the ports. It could save a few ms not going through firewall/forwarding/congestion control; and/or give the VM user all the ports without pre-approval!? Did I get this right?

    Though I think that @mw could try to add the other IPs to vmbr0 first before assigning them to the VM.
    auto vmbr0:1
    iface vmbr0:1 inet static
    address <IP2>/23

  • Falzo Member

    Though I think that @mw could try to add the other IPs to vmbr0 first before assigning them to the VM.
    auto vmbr0:1
    iface vmbr0:1 inet static
    address <IP2>/23

    That's more or less the same as the routed solution I already suggested. It is indeed a working fix as requested, but OP decided it doesn't match his unstated definition of 'working',
    so he ignores it and refuses to pay the bounty. Says a lot...

  • @Falzo said:

    Though I think that @mw could try to add the other IPs to vmbr0 first before assigning them to the VM.
    auto vmbr0:1
    iface vmbr0:1 inet static
    address <IP2>/23

    That's more or less the same as the routed solution I already suggested. It is indeed a working fix as requested, but OP decided it doesn't match his unstated definition of 'working',
    so he ignores it and refuses to pay the bounty. Says a lot...

    I see. I was under the impression that creating another bridge would give the VM an external IP but all the ports would still need to be forwarded, while vmbr0:x would give the VM both.

  • tentor Member, Host Rep

    @hades_corps said: so that this VM can have access to the entire IP with all the ports!?

    A routed setup allows this. However, for whatever reason, OP needs the link layer to be visible from within the virtual machine.

    @hades_corps said: As opposed to normally, where you would create another bridge, then masquerade/SNAT for outgoing traffic and use nftables to forward the ports.

    That isn't a routed setup, and I don't think it's the normal use case when you have a dedicated public IPv4 address for each VM.

    @hades_corps said: It could save a few ms not going through firewall/forwarding/congestion control;

    A few ms is an overestimate; forwarding in Linux doesn't take THAT much time.

    Thanked by 1hades_corps
  • Rubben Member

    have you tried restarting it?

    see? done. give me $100 bucks for luxembourg horse penis VPS

    Thanked by 2sillycat beanman109
  • mw Member

    @nanankcornering said:

    @mw said: Feedback from the provider is that this seems to be a hardware compatibility problem. On Monday the NIC is being swapped and I'll update y'all on what happens

    Curious, what NIC are you using?

    Also, have you tried tcpdump-ing from the VM side? Do you see any traffic on it?

    I haven't had access to the server in days, but on Monday I'll ask the provider for the old and new ones so I know in future

  • mw Member

    Just wanted to give an update as I still don't have the box back from the provider: if you're sending me a PM to reach out with help, I appreciate it but won't be able to try your suggestions until it's back

  • ehab Member

    @mw , did @Falzo get his money already?

    what r u waiting for!

  • mw Member

    I just clocked I opened this thread on the 6th... I told the provider if there's no resolution by EOB tomorrow I don't want to continue waiting

  • mw Member
    edited April 17

    provider was kind enough to give me a breakdown of whats been happening and i thought it was interesting enough to share:

    we did the following things:

    Diagnose driver issues nic
    Update nic drivers
    Attempt different proxmox versions
    Attempt other virtualizing software (works fine)
    Tested on network stack in lab
    Swapped motherboard, cpu, chassis, psu, disks, riser
    Ordered proxmox license and had contact in detail with them regarding this bug
    Spoke to external proxmox expert and tried to troubleshoot/debug this with them
    Contacted other contacts in the same market trying to diagnose the issue
    Tested on different rack in datacenter
    Tested on different motherboard brand
    Tested with different cpu/memory generation
    Attempted multiple os's within the vms

    i'm surprised after all this it's still sorta just not working

    given the above, and that it doesn't actually matter when this gets fixed anymore, perhaps i stick around just so i can share with everyone what the eventual fix ends up being.

    it might help someone down the line with this very opaque problem

  • PineappleM Member
    edited April 18

    They expended an extraordinary amount of resources to troubleshoot this, including a lot of money. Hats off to them for at least trying!

    Thanked by 1mrTom
  • mw Member

    @PineappleM said:
    They expended an extraordinary amount of resources to troubleshoot this, including a lot of money. Hats off to them for at least trying!

    i just wonder whether this very specific config of hardware + their network stack + proxmox just doesnt get along since other virt software was said to work
    (isnt proxmox just debian?)

    very curious situation. i wonder if any provider has ever seen this before, or perhaps this is just a cursed order since this box has been replaced twice now and this is the third "version"

    the first was an asrock board + 7950X that just died suddenly

    the second a replacement supermicro board + 7950X that died, came back for a day, then died again

    provider then offered a swap to supermicro + 9950X and thats how i ended up here

    perhaps my AMD puts made Lisa Su condemn me

  • Falzo Member

    Did the same subnet work with bridged networking on the mentioned boxes before, or is it a new one?

    All that fuss about trying to find that issue within the server seems... desperate. At this point the problem is 99.9% on the other end of the network cable. Maybe some typo in some config which makes it specific for that subnet or so.

    And btw. that another hypervisor works does not mean anything... maybe that one uses routed instead of bridged.

  • mw Member

    @Falzo said:
    Did the same subnet work with bridged networking on the mentioned boxes before, or is it a new one?

    Yip, worked just fine, which is why I initially thought I was doing something wrong. In fact when we had it replaced, we kept our disks so we booted up in the same state as it was when it died, but networking was cooked which I initially thought was because of some corruption/similar. So we decided to just nuke and reinstall, which is when I opened this thread thinking someone could quickly spot the issue.

    All that fuss about trying to find that issue within the server seems... desperate. At this point the problem is 99.9% on the other end of the network cable. Maybe some typo in some config which makes it specific for that subnet or so.

    It doesn’t work on our prefix or on the provider’s, and the provider has said that they tested this in their lab, which I assume means a third prefix likely only used for testing, which must rule out the prefix configuration being the issue.

    And btw. that another hypervisor works does not mean anything... maybe that one uses routed instead of bridged.

    Could be, but the provider has emphasised that the issue appears to be specific to Proxmox:

    it is an issue with how proxmox handles the hardware, in combination with the nic and firmware

  • Falzo Member

    If it were a specific combination with the NIC, why didn't it work with a different one? Shouldn't that rule out the NIC and its drivers being the problem?

    I agree if it worked before and nothing has been changed in the network setup on the prefixes, it's only understandable to look for the culprit in the server. But I find it always hard to believe that such issues should come from some very specific combo.

    Guess the only way to be very sure would be to recreate an old system and verify that this would still be working...

    ..or just use routed networking and send me the money.

    Thanked by 1ehab
  • cmeerw Member

    @mw said:
    provider was kind enough to give me a breakdown of whats been happening and i thought it was interesting enough to share:

    we did the following things:

    Diagnose driver issues nic
    Update nic drivers
    Attempt different proxmox versions
    Attempt other virtualizing software (works fine)
    Tested on network stack in lab
    Swapped motherboard, cpu, chassis, psu, disks, riser
    Ordered proxmox license and had contact in detail with them regarding this bug
    Spoke to external proxmox expert and tried to troubleshoot/debug this with them
    Contacted other contacts in the same market trying to diagnose the issue
    Tested on different rack in datacenter
    Tested on different motherboard brand
    Tested with different cpu/memory generation
    Attempted multiple os's within the vms

    i'm surprised after all this it's still sorta just not working

    given the above, and that it doesn't actually matter when this gets fixed anymore, perhaps i stick around just so i can share with everyone what the eventual fix ends up being.

    it might help someone down the line with this very opaque problem

    What I find very interesting here is that it's a lot of swap this/swap that with just works/doesn't work, but still no tcpdump (or similar) to check where the network packets get lost (I mean something like: does the correct ARP packet get sent? Is it visible on the host network interface? Is it visible on the bridge? Does an ARP reply get sent? How far does that get?)
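
    For anyone following along, a hop-by-hop capture along those lines could look roughly like the sketch below. It only prints the tcpdump commands to run in separate terminals while pinging the gateway from inside the VM. The VM ID (100), the uplink name (eno1) and the Proxmox firewall being off on the VM's NIC are assumptions for illustration, not details from this thread.

```shell
#!/bin/sh
# Sketch only: print one tcpdump command per hop between the VM and the wire,
# so the same ARP/ICMP exchange can be watched at each point.
# Assumptions (not from the thread): VM ID 100, first virtual NIC (index 0),
# physical uplink eno1, Proxmox firewall disabled on the VM NIC (with it
# enabled, extra fwbr/fwpr/fwln devices sit between the tap and the bridge).

capture_points() {
    vmid="$1"    # Proxmox VM ID
    uplink="$2"  # physical NIC attached to the bridge
    bridge="$3"  # bridge the VM is connected to, e.g. vmbr0

    # Proxmox names a VM's tap device tap<VMID>i<NIC index>.
    for dev in "tap${vmid}i0" "$bridge" "$uplink"; do
        # -e prints link-layer (MAC) headers, -n skips name resolution,
        # and the filter keeps only ARP and ICMP.
        echo "tcpdump -e -n -i $dev 'arp or icmp'"
    done
}

capture_points 100 eno1 vmbr0
```

    The first device in the list where the ARP request or its reply stops showing up is the place to dig further.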

  • mw Member
    edited April 18

    @cmeerw said:

    @mw said:
    provider was kind enough to give me a breakdown of whats been happening and i thought it was interesting enough to share:

    we did the following things:

    Diagnose driver issues nic
    Update nic drivers
    Attempt different proxmox versions
    Attempt other virtualizing software (works fine)
    Tested on network stack in lab
    Swapped motherboard, cpu, chassis, psu, disks, riser
    Ordered proxmox license and had contact in detail with them regarding this bug
    Spoke to external proxmox expert and tried to troubleshoot/debug this with them
    Contacted other contacts in the same market trying to diagnose the issue
    Tested on different rack in datacenter
    Tested on different motherboard brand
    Tested with different cpu/memory generation
    Attempted multiple os's within the vms

    i'm surprised after all this it's still sorta just not working

    given the above, and that it doesn't actually matter when this gets fixed anymore, perhaps i stick around just so i can share with everyone what the eventual fix ends up being.

    it might help someone down the line with this very opaque problem

    What I find very interesting here is that it's a lot of swap this/swap that with just works/doesn't work, but still no tcpdump (or similar) to check where the network packets get lost (I mean something like: does the correct ARP packet get sent? Is it visible on the host network interface? Is it visible on the bridge? Does an ARP reply get sent? How far does that get?)

    and

    @Falzo said:
    If it were a specific combination with the NIC, why didn't it work with a different one? Shouldn't that rule out the NIC and its drivers being the problem?

    I agree if it worked before and nothing has been changed in the network setup on the prefixes, it's only understandable to look for the culprit in the server. But I find it always hard to believe that such issues should come from some very specific combo.

    Guess the only way to be very sure would be to recreate an old system and verify that this would still be working...

    from the provider:

    yes we're still having the same issue with another NIC, not exactly identical but similar, it seems like outbound packets will go out, but inbound won't go in, I will provide a more detailed report tomorrow once I've debugged that part further.

    perhaps they’re just keeping details to layman terms and mentioning the technical specifics isn’t useful to me? not sure, i’m just forwarding as received

    ..or just use routed networking and send me the money.

    i wish routed worked for us, it would have saved two weeks of this. ill ask the provider for an update in a few hours and let you guys know what they say

  • mw Member

    they essentially swapped every component of the server and tested with another NIC, with the help of industry professionals and Proxmox themselves, yet something is still wrong

    Lisa Su, please forgive me…

  • xvps Member

    we did the following things:

    Diagnose driver issues nic
    Update nic drivers
    Attempt different proxmox versions
    Attempt other virtualizing software (works fine)
    Tested on network stack in lab
    Swapped motherboard, cpu, chassis, psu, disks, riser
    Ordered proxmox license and had contact in detail with them regarding this bug
    Spoke to external proxmox expert and tried to troubleshoot/debug this with them
    Contacted other contacts in the same market trying to diagnose the issue
    Tested on different rack in datacenter
    Tested on different motherboard brand
    Tested with different cpu/memory generation
    Attempted multiple os's within the vms

    i'm surprised after all this it's still sorta just not working

    I'm not. 😄

    My best guess is that the bug entered the system during installation, and the tech guys didn’t reinstall the OS each time they replaced a hardware component.

    Have you tried something as simple as testing with MemTest86?

  • mw Member

    @xvps said:

    we did the following things:

    Diagnose driver issues nic
    Update nic drivers
    Attempt different proxmox versions
    Attempt other virtualizing software (works fine)
    Tested on network stack in lab
    Swapped motherboard, cpu, chassis, psu, disks, riser
    Ordered proxmox license and had contact in detail with them regarding this bug
    Spoke to external proxmox expert and tried to troubleshoot/debug this with them
    Contacted other contacts in the same market trying to diagnose the issue
    Tested on different rack in datacenter
    Tested on different motherboard brand
    Tested with different cpu/memory generation
    Attempted multiple os's within the vms

    i'm surprised after all this it's still sorta just not working

    I'm not. 😄

    My best guess is that the bug entered the system during installation, and the tech guys didn’t reinstall the OS each time they replaced a hardware component.

    i did 6-7 reinstalls myself, and im p sure the provider also reinstalled since they mentioned testing on different disks and different hypervisors

    Have you tried something as simple as testing with MemTest86?

    testing what exactly? all components appear to have been swapped

  • Falzo Member

    Is there any hardware environment where this is working as expected with Proxmox? Like the one you had before.
    I am still not convinced that in between nothing changed on the provider's end. Like some updates/upgrades in their routers/switches, whatever...

  • ehab Member

    @mw wheres Falzo's money?

    Thanked by 1mw
  • xvps Member

    @mw said:

    @xvps said:

    we did the following things:

    Diagnose driver issues nic
    Update nic drivers
    Attempt different proxmox versions
    Attempt other virtualizing software (works fine)
    Tested on network stack in lab
    Swapped motherboard, cpu, chassis, psu, disks, riser
    Ordered proxmox license and had contact in detail with them regarding this bug
    Spoke to external proxmox expert and tried to troubleshoot/debug this with them
    Contacted other contacts in the same market trying to diagnose the issue
    Tested on different rack in datacenter
    Tested on different motherboard brand
    Tested with different cpu/memory generation
    Attempted multiple os's within the vms

    i'm surprised after all this it's still sorta just not working

    I'm not. 😄

    My best guess is that the bug entered the system during installation, and the tech guys didn’t reinstall the OS each time they replaced a hardware component.

    i did 6-7 reinstalls myself, and im p sure the provider also reinstalled since they mentioned testing on different disks and different hypervisors

    Have you tried something as simple as testing with MemTest86?

    testing what exactly? all components appear to have been swapped

    If I’m right, it really doesn’t matter how many times you reinstall on the same hardware — the result will be the same.

    The crucial difference is whether the provider reinstalled the OS each time they replaced a hardware component, rather than just booting the system to see if it worked. If not, they could easily miss a faulty component.

    MemTest86 is an efficient way to test memory for faults, and faulty memory can cause weird errors like this.

    But feel free to believe that everything has been tested properly and that there’s some kind of magic behind the issue.

  • mw Member

    @Falzo said:
    Is there any hardware environment where this is working as expected with Proxmox? Like the one you had before.
    I am still not convinced that in between nothing changed on the provider's end. Like some updates/upgrades in their routers/switches, whatever...

    Before, yes the 7950X before it died

    Now, yes the other two dozen dedis running Proxmox in the same rack using the same prefix

    Thanked by 1Falzo
  • mw Member

    @xvps said:

    @mw said:

    @xvps said:

    we did the following things:

    Diagnose driver issues nic
    Update nic drivers
    Attempt different proxmox versions
    Attempt other virtualizing software (works fine)
    Tested on network stack in lab
    Swapped motherboard, cpu, chassis, psu, disks, riser
    Ordered proxmox license and had contact in detail with them regarding this bug
    Spoke to external proxmox expert and tried to troubleshoot/debug this with them
    Contacted other contacts in the same market trying to diagnose the issue
    Tested on different rack in datacenter
    Tested on different motherboard brand
    Tested with different cpu/memory generation
    Attempted multiple os's within the vms

    i'm surprised after all this it's still sorta just not working

    I'm not. 😄

    My best guess is that the bug entered the system during installation, and the tech guys didn’t reinstall the OS each time they replaced a hardware component.

    i did 6-7 reinstalls myself, and im p sure the provider also reinstalled since they mentioned testing on different disks and different hypervisors

    Have you tried something as simple as testing with MemTest86?

    testing what exactly? all components appear to have been swapped

    But feel free to believe that everything has been tested properly and that there’s some kind of magic behind the issue.

    who pissed in your oatmeal...? relax

  • Falzo Member

    @mw said:

    @Falzo said:
    Is there any hardware environment where this is working as expected with Proxmox? Like the one you had before.
    I am still not convinced that in between nothing changed on the provider's end. Like some updates/upgrades in their routers/switches, whatever...

    Before, yes the 7950X before it died

    Now, yes the other two dozen dedis running Proxmox in the same rack using the same prefix

    Interesting. Time to get creative and clone one of the working Proxmox servers then (expecting them not to be on the very latest version of the kernel and co).
