kvm host on almalinux 9 dropped packets

Hello!

I have a KVM host node on AlmaLinux 9 with Virtualizor.

Packet loss starts at random moments. Traffic at those moments can be as low as 30 Mbit/s or as high as 150 Mbit/s, so the problem does not seem to depend on the amount of traffic.

I updated the kernel to the latest one from ELRepo.

I updated the i40e driver from https://github.com/intel/ethernet-linux-i40e

I also tried changing the sysctl settings. They are currently:

net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1

fs.file-max = 65536
net.netfilter.nf_conntrack_max = 1048576
net.nf_conntrack_max = 1048576

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

net.core.netdev_max_backlog = 250000
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 131072 16777216
net.ipv4.tcp_fastopen = 3

net.core.netdev_budget = 25000
net.ipv4.tcp_mtu_probing = 1
net.ipv4.tcp_max_tw_buckets = 600000
net.ipv4.tcp_max_syn_backlog = 600000
net.ipv4.tcp_sack = 0
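
One way to see whether netdev_max_backlog and netdev_budget are the limits being hit is to read /proc/net/softnet_stat. The sketch below is only an illustration (the function names are made up for this example). It assumes the usual layout of one row per CPU with hexadecimal columns, where the second column counts packets dropped because the backlog queue was full and the third counts time_squeeze events (the softirq ran out of budget or time with work still pending).

```python
from pathlib import Path


def parse_softnet_stat(text):
    """Parse /proc/net/softnet_stat; every column is a hexadecimal counter."""
    rows = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) < 3:
            continue
        rows.append({
            "row": index,  # usually matches the CPU number, but offline CPUs are skipped
            "processed": int(fields[0], 16),
            "dropped": int(fields[1], 16),       # backlog queue was full
            "time_squeeze": int(fields[2], 16),  # budget or time ran out with work left
        })
    return rows


def rows_with_pressure(rows):
    """Keep only rows that show backlog drops or time_squeeze events."""
    return [r for r in rows if r["dropped"] or r["time_squeeze"]]


if __name__ == "__main__":
    path = Path("/proc/net/softnet_stat")
    if path.exists():
        for row in rows_with_pressure(parse_softnet_stat(path.read_text())):
            print(row)
```

If both columns stay at zero while packet loss is happening, raising these two sysctls further is unlikely to help.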

I built irqbalance from source (version 1.9.0).

I raised txqueuelen to 10000 on all interfaces and changed the MTU to 5000.

ethtool -i eno1

driver: i40e
version: 2.26.8
firmware-version: 3.31 0x80000cd9 1.1747.0
expansion-rom-version:
bus-info: 0000:60:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

ifconfig

eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 5000
        ether 3c:ec:ef:a0:f8:9c  txqueuelen 10000  (Ethernet)
        RX packets 26187939764  bytes 21613611194137 (19.6 TiB)
        RX errors 1619921  dropped 2347665748  overruns 0  frame 0
        TX packets 23537096937  bytes 20723141882148 (18.8 TiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
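
To put the eno1 counters in proportion, the RX dropped count can be compared with the RX packet count. This is a small sketch (the helper name is made up) that parses the two RX lines exactly as ifconfig printed them above. With these numbers, roughly 9% of received packets were counted as dropped since the counters were last reset.

```python
import re


def rx_drop_ratio(ifconfig_text):
    """Return RX dropped / RX packets from ifconfig output for one interface."""
    packets = int(re.search(r"RX packets (\d+)", ifconfig_text).group(1))
    dropped = int(re.search(r"RX errors \d+\s+dropped (\d+)", ifconfig_text).group(1))
    return dropped / packets


# The two RX lines for eno1, copied from the output above.
sample = """\
        RX packets 26187939764  bytes 21613611194137 (19.6 TiB)
        RX errors 1619921  dropped 2347665748  overruns 0  frame 0
"""

print(f"RX drop ratio: {rx_drop_ratio(sample):.2%}")
```

The ratio is cumulative, so it says nothing about when the drops happened. Sampling it at intervals would be needed to tie it to the loss episodes.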

ifconfig for one VPS interface, as an example:

viifv1193: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::fc16:3eff:feef:50ee  prefixlen 64  scopeid 0x20<link>
        ether fe:16:3e:ef:50:ee  txqueuelen 10000  (Ethernet)
        RX packets 2410349  bytes 388340121 (370.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1692786688  bytes 118112058934 (110.0 GiB)
        TX errors 0  dropped 1418299 overruns 0  carrier 0  collisions 0

dropwatch -l kas

5724 drops at __init_scratch_end+1119d3b2 (0xffffffffc0b9d3b2) [software]
12 drops at __init_scratch_end+1119d3b2 (0xffffffffc0b9d3b2) [software]
1 drops at ip6_mc_input+270 (0xffffffffac4b34f0) [software]
11 drops at __init_scratch_end+1119d3b2 (0xffffffffc0b9d3b2) [software]
1 drops at ip6_mc_input+270 (0xffffffffac4b34f0) [software]
12 drops at __init_scratch_end+1119d3b2 (0xffffffffc0b9d3b2) [software]
1 drops at ip6_mc_input+270 (0xffffffffac4b34f0) [software]
24 drops at __init_scratch_end+1119d3b2 (0xffffffffc0b9d3b2) [software]
2 drops at ip6_mc_input+270 (0xffffffffac4b34f0) [software]
12 drops at __init_scratch_end+1119d3b2 (0xffffffffc0b9d3b2) [software]
1 drops at ip6_mc_input+270 (0xffffffffac4b34f0) [software]
12 drops at __init_scratch_end+1119d3b2 (0xffffffffc0b9d3b2) [software]
1 drops at ip6_mc_input+270 (0xffffffffac4b34f0) [software]
36 drops at __init_scratch_end+1119d3b2 (0xffffffffc0b9d3b2) [software]
1 drops at ip6_mc_input+270 (0xffffffffac4b34f0) [software]
2 drops at ip_rcv_finish_core.constprop.0+1d7 (0xffffffffac3f9f17) [software]
12 drops at __init_scratch_end+1119d3b2 (0xffffffffc0b9d3b2) [software]
1 drops at ip6_mc_input+270 (0xffffffffac4b34f0) [software]
2 drops at tcp_v4_rcv+80 (0xffffffffac430c80) [software]
13 drops at __init_scratch_end+1119d3b2 (0xffffffffc0b9d3b2) [software]
1 drops at ip6_mc_input+270 (0xffffffffac4b34f0) [software]
12 drops at __init_scratch_end+1119d3b2 (0xffffffffc0b9d3b2) [software]
1 drops at ip6_mc_input+270 (0xffffffffac4b34f0) [software]
12 drops at __init_scratch_end+1119d3b2 (0xffffffffc0b9d3b2) [software]
1 drops at ip6_mc_input+270 (0xffffffffac4b34f0) [software]
13 drops at __init_scratch_end+1119d3b2 (0xffffffffc0b9d3b2) [software]
1 drops at ip6_mc_input+270 (0xffffffffac4b34f0) [software]

eu-addr2line -f -k 0xffffffffc0b9d3b2
tun_net_xmit
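
The dropwatch output is easier to read when the counts are summed per drop location. The sketch below (the regex and function name are illustrative) groups lines by symbol and address. It keeps the address in the key because __init_scratch_end+... is an unresolved module address, the one that eu-addr2line maps to tun_net_xmit above. Drops in tun_net_xmit appear consistent with the TX dropped counter on the viifv* tap interfaces.

```python
import re
from collections import Counter

# Matches lines such as:
#   12 drops at ip6_mc_input+270 (0xffffffffac4b34f0) [software]
DROP_LINE = re.compile(r"^\s*(\d+) drops at (\S+?)\+[0-9a-f]+ \((0x[0-9a-f]+)\)")


def total_drops_by_location(dropwatch_text):
    """Sum drop counts per (symbol, address) pair."""
    totals = Counter()
    for line in dropwatch_text.splitlines():
        match = DROP_LINE.match(line)
        if match:
            count, symbol, address = match.groups()
            totals[(symbol, address)] += int(count)
    return totals


# A few lines copied from the dropwatch output above.
sample = """\
5724 drops at __init_scratch_end+1119d3b2 (0xffffffffc0b9d3b2) [software]
12 drops at __init_scratch_end+1119d3b2 (0xffffffffc0b9d3b2) [software]
1 drops at ip6_mc_input+270 (0xffffffffac4b34f0) [software]
2 drops at ip_rcv_finish_core.constprop.0+1d7 (0xffffffffac3f9f17) [software]
2 drops at tcp_v4_rcv+80 (0xffffffffac430c80) [software]
"""

for (symbol, address), count in total_drops_by_location(sample).most_common():
    print(f"{count:>8}  {symbol} ({address})")
```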

dmesg and messages show no errors, only "HTB: quantum of class 10001 is big. Consider r2q change."

I can't find the cause of the packet loss. Has anyone run into this problem and can suggest a solution?

Comments

  • tentor Member, Host Rep
    edited October 2024

    @Netify said: RX errors 1619921

    This should not happen at all. Can you show ethtool -S eno1?

    rx_errors

    Total number of bad packets received on this network device. This counter must include events counted by rx_length_errors, rx_crc_errors, rx_frame_errors and other errors not otherwise counted.
    
  • ethtool -S eno1:

    NIC statistics:
         rx_packets: 26205850611
         tx_packets: 23553964510
         rx_bytes: 21629429891407
         tx_bytes: 20739691723156
         rx_errors: 1620466
         tx_errors: 0
         rx_dropped: 2351373149
         tx_dropped: 0
         collisions: 0
         rx_length_errors: 0
         rx_crc_errors: 0
         rx_unicast: 24648285503
         tx_unicast: 23553009332
         rx_multicast: 93357327
         tx_multicast: 761492
         rx_broadcast: 3815579106
         tx_broadcast: 194424
         rx_unknown_protocol: 0
         tx_linearize: 0
         tx_force_wb: 306291
         tx_busy: 0
         tx_stopped: 11810
         rx_alloc_fail: 0
         rx_pg_alloc_fail: 0
         rx_cache_reuse: 78590569232
         tx-0.packets: 330155907
         tx-0.bytes: 290108487439
         rx-0.packets: 367900097
         rx-0.bytes: 300674570329
         rx-0.xdp.pass: 0
         rx-0.xdp.drop: 0
         rx-0.xdp.tx: 0
         rx-0.xdp.unknown: 0
         rx-0.xdp.redirect: 0
         rx-0.xdp.redirect_fail: 0
         tx-1.packets: 308497665
         tx-1.bytes: 270639527599
         rx-1.packets: 302286670
         rx-1.bytes: 269126523155
         rx-1.xdp.pass: 0
         rx-1.xdp.drop: 0
         rx-1.xdp.tx: 0
         rx-1.xdp.unknown: 0
         rx-1.xdp.redirect: 0
         rx-1.xdp.redirect_fail: 0
         tx-2.packets: 325785358
         tx-2.bytes: 289565400435
         rx-2.packets: 312357771
         rx-2.bytes: 269494815006
         rx-2.xdp.pass: 0
         rx-2.xdp.drop: 0
         rx-2.xdp.tx: 0
         rx-2.xdp.unknown: 0
         rx-2.xdp.redirect: 0
         rx-2.xdp.redirect_fail: 0
    
    

    The full output is at https://pastebin.com/Ap7buUek

  • tentor Member, Host Rep

    @Netify see this:

    port.rx_csum_bad: 1620466
    

    I guess the issue is at the physical layer. Please check that the ports on both the NIC and the switch are fine, as well as the cable.

  • emgh Member, Megathread Squad
    edited October 2024

    @tentor our guru

  • @tentor said: I guess the issue is at the physical layer. Please check that the ports on both the NIC and the switch are fine, as well as the cable.

    We changed the cable and the port, but it did not affect the problem.

  • vsys_host Member, Patron Provider

    @Netify said:

    @tentor said: I guess the issue is at the physical layer. Please check that the ports on both the NIC and the switch are fine, as well as the cable.

    We changed the cable and the port, but it did not affect the problem.

    After changing the port and cable, is rx_errors still increasing, or is only rx_dropped increasing during packet loss?
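
One way to answer this is to capture `ethtool -S eno1` twice, before and during a loss episode, and list only the counters that grew. The sketch below is a rough illustration with made-up function names. The sample values are the rx_errors and rx_dropped figures that appear earlier in the thread (ifconfig first, ethtool -S later), used here only as two example snapshots.

```python
def parse_ethtool_stats(text):
    """Turn `ethtool -S` output into a {counter_name: value} dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(": ")
        if sep and value.isdigit():
            stats[name] = int(value)
    return stats


def increased(before, after):
    """Return {counter: delta} for counters that grew between snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


# Example snapshots built from values posted earlier in the thread.
earlier = """\
NIC statistics:
     rx_errors: 1619921
     rx_dropped: 2347665748
     tx_errors: 0
"""
later = """\
NIC statistics:
     rx_errors: 1620466
     rx_dropped: 2351373149
     tx_errors: 0
"""

for name, delta in increased(parse_ethtool_stats(earlier), parse_ethtool_stats(later)).items():
    print(f"{name}: +{delta}")
```

In practice the two snapshots would come from saved output files, and the port.* counters in the full output (such as port.rx_csum_bad) would be included in the comparison.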

  • @vsys_host said: After changing the port and cable, is rx_errors still increasing, or is only rx_dropped increasing during packet loss?

    Unfortunately, I can't say for sure. It seems that both values increased.

    Yesterday I rebooted the server into the default AlmaLinux kernel (not the one from ELRepo).

    Now there are no errors on the host interfaces:

    eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 5000
            RX packets 1364580517  bytes 974542368902 (907.6 GiB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 1039997351  bytes 1045932782764 (974.1 GiB)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    
    
    viifbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 5000
            RX packets 334674207  bytes 18290964723 (17.0 GiB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 12816505  bytes 944801110 (901.0 MiB)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    

    but there are still drops on the VPS interface:

    viifv1185: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 5000
            ether fe:16:3e:0f:93:26  txqueuelen 10000  (Ethernet)
            RX packets 1824967  bytes 1758177039 (1.6 GiB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 324532263  bytes 20464029390 (19.0 GiB)
    !!!        TX errors 0  dropped 24443 overruns 0  carrier 0  collisions 0
    

    When I ping the host node or a VPS, packet loss is observed:

    5 packets transmitted, 2 received, 60% packet loss, time 5001ms
    rtt min/avg/max/mdev = 35.636/35.663/35.691/0.190 ms
    
  • vsys_host Member, Patron Provider

    So, it does not look like a hardware issue at the moment. The basic checks to run on both the host and the virtual machines while packet loss is happening are: no single CPU core is loaded above 70%, SI (soft interrupts) is no higher than 50%, and the conntrack table is not full. Beyond that, try disabling conntrack (temporarily, if possible), setting the same MTU on all interfaces, and disabling offloads on the NICs.
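
The per-core and conntrack checks above can be scripted. This is a minimal sketch, not a finished tool: it takes two /proc/stat snapshots one second apart, reports cores above the 70% busy or 50% softirq thresholds mentioned above, and prints conntrack usage if the nf_conntrack sysctl files exist. The function names are made up for the example.

```python
import time
from pathlib import Path


def parse_proc_stat(text):
    """Map 'cpuN' to its list of jiffy counters (per-core lines only)."""
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            cores[parts[0]] = [int(x) for x in parts[1:]]
    return cores


def core_load(before, after):
    """Return {core: (busy_pct, softirq_pct)} between two snapshots."""
    result = {}
    for core, new in after.items():
        old = before.get(core)
        if old is None:
            continue
        delta = [n - o for n, o in zip(new, old)]
        # Fields: user nice system idle iowait irq softirq steal (guest is part of user).
        total = sum(delta[:8])
        if total <= 0:
            continue
        idle = delta[3] + delta[4]
        result[core] = (100 * (total - idle) / total, 100 * delta[6] / total)
    return result


if __name__ == "__main__":
    stat = Path("/proc/stat")
    if stat.exists():
        first = parse_proc_stat(stat.read_text())
        time.sleep(1)
        second = parse_proc_stat(stat.read_text())
        for core, (busy, softirq) in core_load(first, second).items():
            if busy > 70 or softirq > 50:
                print(f"{core}: busy {busy:.0f}%, softirq {softirq:.0f}%")
    count = Path("/proc/sys/net/netfilter/nf_conntrack_count")
    limit = Path("/proc/sys/net/netfilter/nf_conntrack_max")
    if count.exists() and limit.exists():
        print(f"conntrack: {count.read_text().strip()} / {limit.read_text().strip()}")
```

Running it on the host and inside an affected VPS during a loss episode covers both sides of the check.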

  • During packet loss, some cores are loaded at 100% (2-4 out of 88).
    Load average is 15-25, sometimes 30.

    Disabling offloads on the NICs: done, but this also had no effect.

    I reduced the queue count from 88 with "ethtool -L eno1 combined 30". It is a little better now, but that does not solve the problem: the packet loss counters are still growing.

    What else can I try?
