New on LowEndTalk? Please Register and read our Community Rules.
All new Registrations are manually reviewed and approved, so a short delay after registration may occur before your account becomes active.
Comments
I can tell your fortune from coffee scraps or palm, but I can't tell you anything about server issues without a proper supply of logs.
Could be god.
Maybe ... at the bottom of the sea now
Stick to 199 VMs. Problem solved.
It depends on how all these 200 VMs consume bandwidth at the same time, as you are limited by the network port speed.
200?
More than 20 VMs on a single node makes me feel sick.
Get a good network card if you don't have one already. We had issues on one node before when some clients were running bridges and interfaces of their own inside the VMs (Docker, VPN etc.). The card solved the stability and lagging issues.
A true low-end provider. Love the fact that you have 200 VMs running on a single node.
Can you share more statistics or specs of the node? Really interested. Maybe run a jab test?
Haha, the node is a dual Epyc 7542 with 2TB of RAM. Actually it happens once we have over 100 VMs on it: the more VMs, the more packet loss. We think the problem is the bridge and want to try Open vSwitch. Any thoughts?
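If you do want to test Open vSwitch, a minimal sketch looks like this. The bridge and interface names (ovsbr0, enp1s0, tap101i0) are placeholders, not your actual config:

```shell
# Install OVS and create a bridge (Debian/Ubuntu package name assumed)
apt install openvswitch-switch
ovs-vsctl add-br ovsbr0

# Attach the physical uplink (interface name is an assumption)
ovs-vsctl add-port ovsbr0 enp1s0

# Each VM's tap device then attaches to ovsbr0 instead of the Linux bridge
ovs-vsctl add-port ovsbr0 tap101i0
```

You'd still need to point QEMU/Virtualizor at the new bridge, so test on a spare node first.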
When you say packet loss, are you referring to packet loss on the WAN, or LAN? If it's the WAN, have you ensured your uplink isn't congested? what network card do you have in the server?
It's on the LAN, the uplink is good.
It's a Mellanox ConnectX-3 on 10G.
That's an amazing machine you got there. Beefy as F.
Anyway, haven't experienced that issue before, but if CPU, IO and RAM are all fine in terms of load and assuming you are not maximizing your port, then it might be the LAN card? Try using another card. Shakib also mentioned that.
@Francisco might have experience on this.
I don't think it's the NIC. There must be a limit somewhere. It happens exactly when it hits 100 VMs.
Well, how many bridges are you running?
"199 is the magic number"
takes note
Maybe some limit? open files, network interface limit?
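A few generic places worth checking when problems start at a fixed VM count (these are stock Linux checks, not specific to this setup):

```shell
ulimit -n                                   # per-process open-file limit
cat /proc/sys/fs/file-max                   # system-wide file handle limit
cat /proc/sys/net/core/netdev_max_backlog   # RX backlog queue depth
bridge link | wc -l                         # how many ports sit on the bridge(s)
sysctl net.bridge.bridge-nf-call-iptables   # is bridged traffic hitting netfilter?
```

If that last sysctl is 1, every bridged frame traverses iptables/ebtables, which scales badly with lots of VMs and rules.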
A true low-end provider puts 500 of these on RAID 1 HDDs without cache.
We narrowed it down. The problem seems to be ebtables: it slows down and can't filter this many packets at once. We may try nftables. Any suggestions?
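nftables has a bridge family that can replace per-VM ebtables rules. A hedged sketch of a simple anti-spoof rule (table/chain names, tap name, and MAC are all placeholders for illustration):

```shell
# Create a bridge-family table and a forward chain
nft add table bridge filter
nft add chain bridge filter forward '{ type filter hook forward priority 0; policy accept; }'

# Drop frames from a VM's tap whose source MAC isn't the assigned one
nft add rule bridge filter forward iifname "tap101i0" ether saddr != 52:54:00:aa:bb:cc drop
```

The big win is that nftables can fold hundreds of per-VM rules into a single set or verdict-map lookup, so rule count stops scaling linearly with VM count.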
what virtualization platform do you use?
You need a kernel-bypass packet-processing framework like DPDK. I think Open vSwitch supports DPDK. Also, found this link https://support.mellanox.com/s/article/mellanox-dpdk
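For what it's worth, enabling DPDK in OVS looks roughly like this. This is a sketch under assumptions: the bridge name, port name, and PCI address are placeholders, and you'd need an OVS build compiled with DPDK support:

```shell
# Tell OVS to initialize DPDK (requires hugepages configured beforehand)
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true

# A DPDK-backed bridge uses the userspace (netdev) datapath
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

# Bind the NIC to OVS-DPDK by PCI address (address is a placeholder)
ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \
    options:dpdk-devargs=0000:01:00.0
```

Note that Mellanox NICs use a bifurcated driver model with DPDK, so the setup differs a bit from Intel cards; the Mellanox link above covers that.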
QEMU, on the Virtualizor CP.