Linux Network Performance Tuning

Hi, is there a formula or calculator for tuning the values below based on round-trip time (ms)?

net.core.rmem_max =
net.core.wmem_max =
net.ipv4.tcp_rmem =
net.ipv4.tcp_wmem =
net.ipv4.tcp_window_scaling = 1

Comments

  • stoned Member
    edited February 2023

    I've not seen a ready-made calculator.

    Parameters core/rmem_default and core/wmem_default are the default receive and send tcp buffer sizes, while core/rmem_max and core/wmem_max are the maximum receive and send buffer sizes that can be set using setsockopt(), in bytes.

    Parameters ipv4/tcp_rmem and ipv4/tcp_wmem are the amount of memory in bytes for read (receive) and write (transmit) buffers per open socket. Each contains three numbers: the minimum, default, and maximum values.

    Parameter tcp_mem is the amount of memory in 4096-byte pages totaled across all TCP applications. It contains three numbers: the minimum, pressure, and maximum. The pressure is the threshold at which TCP will start to reclaim buffer memory to move memory use down toward the minimum. You want to avoid hitting that threshold.
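
    If you just want to see what a box is currently using before touching anything, the running values can be read back with sysctl; a quick check using the standard parameter names above:

    # Read the current buffer-related settings before changing anything
    sysctl net.core.rmem_default net.core.wmem_default
    sysctl net.core.rmem_max net.core.wmem_max
    sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem net.ipv4.tcp_mem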

    Increase the default and maximum for tcp_rmem and tcp_wmem on servers and clients when they are on either a 10 Gbps LAN with latency under 1 millisecond, or communicating over high-latency low-speed WANs. In those cases their TCP buffers may fill and limit throughput, because the TCP window size can't be made large enough to handle the delay in receiving ACKs from the other end.

    Then, for tcp_mem, set it to twice the maximum value of tcp_[rw]mem multiplied by the maximum number of running network applications, divided by 4096 bytes per page.
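
    As a quick worked example of that rule of thumb (the 16 MiB maximum and the 200 applications are simply the numbers used in the block further down):

    # max(tcp_wmem) = 16777216 bytes, ~200 concurrent applications
    # pages = 16777216 * 2 * 200 / 4096
    echo $(( 16777216 * 2 * 200 / 4096 ))    # prints 1638400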

    Increase rmem_max and wmem_max so they are at least as large as the third values of tcp_rmem and tcp_wmem.

    Calculate the bandwidth delay product, the total amount of data in transit on the wire, as the product of the bandwidth in bytes per second multiplied by the round-trip delay time in seconds. A 1 Gbps LAN with 2 millisecond round-trip delay means 125 Mbytes per second times 0.002 seconds or 250 kbytes.

    If you don't have buffers this large on the hosts, senders have to stop sending and wait for an acknowledgement, meaning that the network pipe isn't kept full and we're not using the full bandwidth. Increase the buffer sizes as the bandwidth delay product increases. However, be careful. The bandwidth delay product is the ideal, although you can't really measure how it fluctuates. If you provide buffers significantly larger than the bandwidth delay product for connections outbound from your network edge, you are just contributing to buffer congestion across the Internet without making things any faster for yourself.
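
    To answer the original question directly, the whole thing boils down to that bandwidth-delay product. Here is a minimal sketch of it as a shell calculation, where the link speed and RTT are assumptions you would replace with your own numbers:

    #!/bin/sh
    # Rough buffer sizing from bandwidth and RTT (both values assumed)
    BW_MBIT=1000    # link speed in Mbit/s
    RTT_MS=200      # round-trip time in milliseconds

    # BDP in bytes = bandwidth (bytes per second) * RTT (seconds)
    BDP=$(( BW_MBIT * 1000000 / 8 * RTT_MS / 1000 ))
    echo "Bandwidth-delay product: $BDP bytes"

    # Size the maximums at roughly the BDP (not wildly above it, per the
    # caution above); the minimum/default fields are left at common values
    echo "net.core.rmem_max = $BDP"
    echo "net.core.wmem_max = $BDP"
    echo "net.ipv4.tcp_rmem = 4096 87380 $BDP"
    echo "net.ipv4.tcp_wmem = 4096 65536 $BDP"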

    # The following are suggested on IBM's
    # High Performance Computing page
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.core.rmem_default = 16777216
    net.core.wmem_default = 16777216
    net.ipv4.tcp_rmem = 4096 87380 16777216
    net.ipv4.tcp_wmem = 4096 87380 16777216
    
    # This server might have 200 clients simultaneously, so:
    #   max(tcp_wmem) * 2 * 200 / 4096
    net.ipv4.tcp_mem = 1638400 1638400 1638400
    
    # Disable TCP SACK (TCP Selective Acknowledgement),
    # DSACK (duplicate TCP SACK), and FACK (Forward Acknowledgement)
    net.ipv4.tcp_sack = 0
    net.ipv4.tcp_dsack = 0
    net.ipv4.tcp_fack = 0
    
    # Disable the gradual speed increase that's useful
    # on variable-speed WANs but not for us
    net.ipv4.tcp_slow_start_after_idle = 0 
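
    If you want to try values like these, one way (assuming root) is to test them live with sysctl -w and, once you're happy, persist them in a file under /etc/sysctl.d/ so they survive a reboot:

    # Test a single value on the running kernel (not persistent)
    sysctl -w net.core.rmem_max=16777216

    # Persist: put the lines above into e.g. /etc/sysctl.d/90-net-tuning.conf
    # (the file name is just an example), then reload everything
    sysctl --system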
    

    The benefits of TCP window scaling seem clear and you might be tempted to enable it everywhere. However, there are circumstances where TCP window scaling will actually cause more problems than it solves, especially when the network is unreliable or slow.

    In this case, it’s better to have a smaller window; otherwise, there will be too many retransmissions, and the performance of the TCP connection can drop dramatically—to the point where the vast majority of the traffic consists of retransmissions. Given that the network is slow and/or unreliable to begin with, you can understand that this situation will be detrimental to the performance of the TCP connection.
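
    For completeness, window scaling itself is just another sysctl, so it can be switched off on a path that genuinely behaves worse with large windows; it is on by default on modern kernels and should normally stay on:

    # 0 disables TCP window scaling entirely, 1 re-enables it (the default)
    sysctl -w net.ipv4.tcp_window_scaling=0
    sysctl -w net.ipv4.tcp_window_scaling=1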

    What exactly are you trying to achieve? What is the quality of your network?

    Sources:
    https://cromwell-intl.com/
    https://dl.acm.org/doi/abs/10.1145/1823844.1823848 - An argument for increasing TCP's initial congestion window
    https://www.site24x7.com/learn/linux/tcp-window-scaling.html

    Thanked by: ehab, maverick
  • rm_ IPv6 Advocate, Veteran

    Just make sure you set bbr and don't bother about the rest.

    net.ipv4.tcp_congestion_control=bbr

    Unless you are trying to achieve the impossible with some really crappy conditions (such as China), the defaults should be fine.
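
    If you go that route, here is a quick way to see what the kernel offers and to switch; the fq qdisc pairing is a common recommendation alongside BBR rather than something specific to this thread:

    # List the congestion control algorithms available on this kernel
    sysctl net.ipv4.tcp_available_congestion_control

    # Switch to BBR; fq is the queueing discipline commonly paired with it
    sysctl -w net.ipv4.tcp_congestion_control=bbr
    sysctl -w net.core.default_qdisc=fq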

    Thanked by: stoned, let_rocks, Shot2
  • For network stuff you should pick the proper OS: BSD. Then just bbr it.

  • @stoned said:
    What exactly are you trying to achieve? What is the quality of your network?


    I'm currently using @RansomIT's network from Sydney with a dedicated server via a third party. The routes are horrible, so the latency to LA has increased to around 200 ms and I was getting 12 MBps. They gave my host the following values:

    net.core.rmem_max = 134217728
    net.core.wmem_max = 134217728
    net.ipv4.tcp_rmem = 4096 87380 67108864
    net.ipv4.tcp_wmem = 4096 65536 67108864
    net.ipv4.tcp_window_scaling = 1

    However, although I was able to saturate the 1 Gbps link to that one test provider, everything else that is close in RTT is slow.
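
    For what it's worth, a quick sanity check of what those maximums cover, taking the 200 ms and 1 Gbps figures above as the assumed path:

    # BDP for roughly 200 ms RTT at 1 Gbps
    echo $(( 1000 * 1000000 / 8 * 200 / 1000 ))    # 25000000 bytes (~24 MiB)
    # The configured maximums (67108864 bytes = 64 MiB) sit well above that,
    # so a single flow shouldn't be window-limited on this path.

    Per-connection behaviour (cwnd, rtt, retransmits) can be inspected with ss -ti if you want to see where the slower, closer destinations are actually losing out.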
