Tx Unit Hang
Hello, guys...
I have a question.
My routing server gets a Tx Unit Hang every couple of days. It drives me CRAZY!
Nothing I do solves it. I also tried changing the NIC. It only happens on one port, enp1s0f1. The other one, enp1s0f0, is fine. So it seems to be a driver problem on my Debian 12.
DCTL.ENABLE for one or more queues not cleared within the polling period
[260134.136672] ixgbe 0000:01:00.1: primary disable timed out
[260134.358982] ixgbe 0000:01:00.1 enp1s0f1: detected SFP+: 6
[260134.508466] ixgbe 0000:01:00.1 enp1s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[260134.612626] ixgbe 0000:01:00.1 enp1s0f1: Detected Tx Unit Hang
Tx Queue <8>
TDH, TDT <0>, <7>
next_to_use <7>
next_to_clean <0>
tx_buffer_info[next_to_clean]
time_stamp <103df3385>
jiffies <103df339f>
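A note on reading the dump above: time_stamp and jiffies are printed in hex, so their difference gives how long the oldest uncleaned descriptor had been queued when the hang was reported. The sketch below just does that subtraction. The HZ value is an assumption (250 is common on Debian kernels, check CONFIG_HZ on your own system), and the variable names are only for illustration.

```shell
#!/usr/bin/env bash
# Values copied from the Tx Unit Hang dump (the kernel prints them in hex).
time_stamp=0x103df3385
jiffies=0x103df339f

# ASSUMPTION: the kernel tick rate. 250 is typical for Debian kernels,
# but verify with: grep 'CONFIG_HZ=' /boot/config-$(uname -r)
HZ=250

# Age of the oldest pending descriptor, in jiffies and in milliseconds.
delta=$(( jiffies - time_stamp ))
ms=$(( delta * 1000 / HZ ))

echo "oldest pending descriptor: ${delta} jiffies, about ${ms} ms at HZ=${HZ}"
```

With HZ=250 this comes out to roughly the gap between the "NIC Link is Up" line and the "Detected Tx Unit Hang" line in the log. Together with TDH <0> and TDT <7>, it suggests the hardware had not fetched any of the 7 descriptors queued since the link came back up.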
I've tried:
ethtool -K ethX tso gso gro rx-checksumming tx-checksumming scatter-gather tx-esp-segmentation tx-udp-segmentation tx-gso-partial tx-gre-segmentation tx-gre-csum-segmentation off tx-ipxip4-segmentation tx-ipxip6-segmentation tx-udp_tnl-segmentation tx-udp_tnl-csum-segmentation off
With no LUCK. I also tried changing transceivers. It's a dual-port NIC: the affected port has a FIBER cable connected, and the other one has a standard Ethernet cable. So it really looks like a driver issue.
I think the NIC is an Intel® 82599ES, but lspci -v also lists an X520 subsystem.
I'm using the latest Debian 12 kernel.
I can't upgrade firmware due to kernel incompatibility.
Should I try another OS? Maybe Rocky Linux? Or use VMware with a VM?
VMware seems to have perfect support for these drivers.
https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=io&productid=20208
Ty everyone. I will wait for your opinions.
Comments
Have you ruled out an intentional outage so law enforcement can generate SSL certificates for your hosts?
Looks like a popular issue: https://forum.proxmox.com/threads/ixgbe-driver-hang-up-detected-tx-unit-hang-tx-queue.120328/
UPD: there are some reports that disabling LRO helps (I don't see lro among your disabled offloads)
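For reference, a minimal sketch of checking and switching off LRO with ethtool. It assumes the interface name enp1s0f1 from the original post; the IFACE and STATUS variables are only for illustration. The setting does not survive a reboot, so it would have to be reapplied from your network configuration.

```shell
#!/bin/sh
# ASSUMPTION: interface name taken from the original post; adjust as needed.
IFACE="enp1s0f1"

if command -v ethtool >/dev/null 2>&1 && [ -e "/sys/class/net/$IFACE" ]; then
    # Lower-case -k only reads the current offload settings.
    ethtool -k "$IFACE" | grep large-receive-offload
    # Upper-case -K changes them (needs root).
    if ethtool -K "$IFACE" lro off; then
        STATUS="applied"
    else
        STATUS="failed"
    fi
else
    echo "ethtool or $IFACE not available, nothing changed"
    STATUS="skipped"
fi
echo "LRO change: $STATUS"
```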
Thanks. I see that Intel sucks! lol.
I already tried disabling LRO, and almost all NIC offloads are OFF...
What is weird is that it only happens on the fiber port. The Ethernet port is always OK.
Maybe I will get an X720 and replace this card with it.
Try a Mellanox, they're dirt cheap on eBay.