New on LowEndTalk? Please Register and read our Community Rules.
All new Registrations are manually reviewed and approved, so a short delay after registration may occur before your account becomes active.
Comments
I can tell your fortune from coffee scraps or palm, but I can't tell you anything about server issues without a proper supply of logs.
Could be god.
Maybe ... at the bottom of the sea now
Stick to 199 VMs. Problem solved.
It depends on how all these 200 VMs consume bandwidth at the same time, as you are limited by the network port speed.
200?
More than 20 VMs on a single node makes me feel sick.
Get a good network card if you don't have one already. We had issues on one node before when some clients were running bridges and interfaces of their own inside the VMs (Docker, VPN etc.). The card solved the stability and lagging issues.
A true low-end provider. Love the fact that you have 200 VMs running on a single node.
Can you share more statistics or specs of the node? Really interested. Maybe run a jab test?
Haha, the node is a dual Epyc 7542 with 2TB of RAM. Actually it happens once we have over 100 VMs on it: the more VMs, the more packet loss. We think the problem is the bridge and want to try Open vSwitch. Any thoughts?
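If you do want to test Open vSwitch, a minimal sketch looks like this. The bridge and interface names (ovsbr0, enp1s0, tap101i0) are placeholders, not your actual config:

```shell
# Install OVS and create a bridge (Debian/Ubuntu package name assumed)
apt install openvswitch-switch
ovs-vsctl add-br ovsbr0

# Attach the physical uplink (interface name is an assumption)
ovs-vsctl add-port ovsbr0 enp1s0

# Each VM's tap device then attaches to ovsbr0 instead of the Linux bridge
ovs-vsctl add-port ovsbr0 tap101i0
```

You'd still need to point QEMU/Virtualizor at the new bridge, so test on a spare node first.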
When you say packet loss, are you referring to packet loss on the WAN, or LAN? If it's the WAN, have you ensured your uplink isn't congested? what network card do you have in the server?
It's on the LAN, the uplink is good.
It's a Mellanox ConnectX-3 on 10G.
That's an amazing machine you got there. Beefy as F.
Anyway, haven't experienced that issue before, but if CPU, IO and RAM are all fine in terms of load and assuming you are not maximizing your port, then it might be the LAN card? Try using another card. Shakib also mentioned that.
@Francisco might have experience on this.
I don't think it's the NIC. There must be a limit somewhere. It happens exactly when it hits 100 VMs.
Well, how many bridges are you running?
"199 is the magic number"
takes note
Maybe some limit? open files, network interface limit?
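A few generic places worth checking when problems start at a fixed VM count (these are stock Linux checks, not specific to this setup):

```shell
ulimit -n                                   # per-process open-file limit
cat /proc/sys/fs/file-max                   # system-wide file handle limit
cat /proc/sys/net/core/netdev_max_backlog   # RX backlog queue depth
bridge link | wc -l                         # how many ports sit on the bridge(s)
sysctl net.bridge.bridge-nf-call-iptables   # is bridged traffic hitting netfilter?
```

If that last sysctl is 1, every bridged frame traverses iptables/ebtables, which scales badly with lots of VMs and rules.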
A true low-end provider puts 500 of these on RAID 1 HDDs without cache.
We narrowed it down. The problem seems to be ebtables: it slows down and can't filter this many packets at once. We may try nftables. Any suggestions?
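nftables has a bridge family that can replace per-VM ebtables rules. A hedged sketch of a simple anti-spoof rule (table/chain names, tap name, and MAC are all placeholders for illustration):

```shell
# Create a bridge-family table and a forward chain
nft add table bridge filter
nft add chain bridge filter forward '{ type filter hook forward priority 0; policy accept; }'

# Drop frames from a VM's tap whose source MAC isn't the assigned one
nft add rule bridge filter forward iifname "tap101i0" ether saddr != 52:54:00:aa:bb:cc drop
```

The big win is that nftables can fold hundreds of per-VM rules into a single set or verdict-map lookup, so rule count stops scaling linearly with VM count.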
what virtualization platform do you use?
You need a kernel-bypass packet-processing framework like DPDK. I think Open vSwitch supports DPDK. Also, found this link https://support.mellanox.com/s/article/mellanox-dpdk
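For what it's worth, enabling DPDK in OVS looks roughly like this. This is a sketch under assumptions: the bridge name, port name, and PCI address are placeholders, and you'd need an OVS build compiled with DPDK support:

```shell
# Tell OVS to initialize DPDK (requires hugepages configured beforehand)
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true

# A DPDK-backed bridge uses the userspace (netdev) datapath
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

# Bind the NIC to OVS-DPDK by PCI address (address is a placeholder)
ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \
    options:dpdk-devargs=0000:01:00.0
```

Note that Mellanox NICs use a bifurcated driver model with DPDK, so the setup differs a bit from Intel cards; the Mellanox link above covers that.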
QEMU, on the Virtualizor CP.