Need good ideas for analyzing L7 network slowness when L3/L4 are OK.

OhJohn · February 15

So I have this backend server that serves several frontend/edge servers, and for some reason a trivial HEAD health check takes about 2 seconds round trip from most edge servers. But ping tests (L3) and iperf tests are OK, and so are mtr TCP tests (L4).

From within the same DC (run from a different server there), the same L7 health check takes about 25ms.

How do I find out why L7 is so slow compared to L3/L4?

OhJohn · February 15

OK, so I brought the RTT down from over 2 seconds to 54ms by running a VPN between the edge and the backend.

Looks like the DC heavily rate-limits one (standard) port on their network...

Hosting_b2b · February 15

@OhJohn said:
OK, so I brought the RTT down from over 2 seconds to 54ms by running a VPN between the edge and the backend.

Looks like the DC heavily rate-limits one (standard) port on their network...

Hey! My guess is that the problem isn't the server but the network equipment between your endpoints, which is slowing down standard web traffic. Since the latency disappears through a VPN, this means that filters or data center security systems (DPI) are "slowing down" packets on standard ports 80 and 443.

Most likely, the network is forcibly inspecting the contents of your requests, which is creating an extra two seconds of latency.

OhJohn · February 15

@Hosting_b2b said: this means that filters or data center security systems (DPI) are "slowing down" packets on standard ports 80 and 443.

Most likely, the network is forcibly inspecting the contents of your requests, which is creating an extra two seconds of latency.

Yep, that's what I'm guessing at the moment as well (though I've never worked with switches/routers beyond small home devices).

At the moment I'm seeing this not on the ports you mentioned but on a typical alternative to one of them.

If they start rate-limiting/inspecting my VPN port as well, I'm out.

OhJohn · February 15

The only thing I don't get is why an mtr -P xxx -T xxx.xxx.xxx.xxx doesn't show the problem.

Hosting_b2b · February 15

@OhJohn said:

@Hosting_b2b said: this means that filters or data center security systems (DPI) are "slowing down" packets on standard ports 80 and 443.

Most likely, the network is forcibly inspecting the contents of your requests, which is creating an extra two seconds of latency.

Yep, that's what I'm guessing at the moment as well (though I've never worked with switches/routers beyond small home devices).

At the moment I'm seeing this not on the ports you mentioned but on a typical alternative to one of them.

If they start rate-limiting/inspecting my VPN port as well, I'm out.

That makes sense. If switching to a different common port doesn't change anything, it really looks like some kind of traffic limiting or inspection on their end, not just a port issue. The fact that a VPN solves the problem is a pretty strong indication that direct traffic is being handled differently. And yes... if they start limiting the VPN as well, then something is clearly happening at the network level, and you'll probably need to talk to your ISP.

Hosting_b2b · February 15

@OhJohn said:
The only thing I don't get is why an mtr -P xxx -T xxx.xxx.xxx.xxx doesn't show the problem.

mtr -T -P only checks how quickly the TCP handshake to that port gets answered.

It doesn't test the HTTP request itself. So the connection setup may be fine while the slowdown only appears after the connection is established, when data transfer begins. (That's my first conclusion, but it needs to be checked further.)
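To see where the 2 seconds actually go, a minimal sketch using only the Python standard library can time the TCP handshake separately from the wait for the first response byte of a plain-HTTP HEAD request. If connect_ms is small (as mtr -T suggests) but ttfb_ms is around 2 seconds from the edge servers, the delay happens after the handshake, once the request is sent. The function name time_head and the three-phase split are illustrative assumptions, not an established tool; curl's -w variables %{time_connect} and %{time_starttransfer} give comparable numbers.

```python
import socket
import time


def time_head(host: str, port: int = 80, path: str = "/", timeout: float = 10.0) -> dict:
    """Time a plain-HTTP HEAD request in phases (sketch: no TLS, no redirects)."""
    host_header = host if port == 80 else f"{host}:{port}"
    t0 = time.perf_counter()
    # Phase 1: TCP three-way handshake, roughly what `mtr -T -P port` probes see.
    sock = socket.create_connection((host, port), timeout=timeout)
    t_connect = time.perf_counter()
    try:
        request = (
            f"HEAD {path} HTTP/1.1\r\n"
            f"Host: {host_header}\r\n"
            "Connection: close\r\n"
            "\r\n"
        ).encode("ascii")
        sock.sendall(request)
        # Phase 2: wait for the first byte of the response (time to first byte).
        data = sock.recv(1)
        t_first_byte = time.perf_counter()
        # Phase 3: read until the end of the response headers (a HEAD response
        # has no body) or until the server closes the connection.
        while b"\r\n\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
        t_done = time.perf_counter()
    finally:
        sock.close()
    return {
        "status": data.split(b"\r\n", 1)[0].decode("latin-1"),
        "connect_ms": (t_connect - t0) * 1000,
        "ttfb_ms": (t_first_byte - t_connect) * 1000,
        "total_ms": (t_done - t0) * 1000,
    }
```

Running print(time_head("xxx.xxx.xxx.xxx", 80)) once from an edge server and once from a server in the same DC gives a baseline to compare the two phases against.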

OpaqueRegistrant · February 15

You can run mtr against both the HTTP port and the VPN port. If you get different routes, but each route is consistent for its port, that's an almost certain sign of connection tampering. Not all criminal providers are careless enough to let it show up in a traceroute, so an identical route isn't proof that there's no tampering.
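Comparing two mtr runs by eye gets tedious, so here is a small sketch that pulls the hop list out of mtr's report output (for example from mtr -r -n -T -P <port> <host>, once per port) and reports the first hop where the two paths diverge. It assumes the usual report layout where each hop line contains "|--" followed by the host; hops_from_mtr_report and first_divergence are hypothetical helper names, and unresponsive hops ("???") are treated as matching anything.

```python
from __future__ import annotations


def hops_from_mtr_report(report: str) -> list[str]:
    """Extract the host column from `mtr --report` text output.

    Assumes hop lines of the form "  3.|-- 203.0.113.1  0.0%  10 ...";
    header lines (Start:, HOST:) are skipped because they lack "|--".
    """
    hops = []
    for line in report.splitlines():
        if "|--" in line:
            rest = line.split("|--", 1)[1].split()
            hops.append(rest[0] if rest else "???")
    return hops


def first_divergence(a: list[str], b: list[str]) -> int | None:
    """Return the 0-based index of the first hop where two paths differ,
    or None if they match. Unresponsive hops ("???") match anything."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and "???" not in (x, y):
            return i
    if len(a) != len(b):
        # One path is a prefix of the other: they differ where the shorter ends.
        return min(len(a), len(b))
    return None
```

As noted above, a None result only means the visible paths agree; it does not rule out tampering.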

yoursunny · February 15

Assuming your application runs over TCP and iperf3 TCP looks fine, one difference between the HEAD request and iperf3 TCP is that the HEAD request needs a new TCP connection and a TLS handshake, while iperf3 measures sustained throughput after the TCP connection is established.

If your firewall or DDoS protection is aggressive or sensitive, TCP connection establishment for the HEAD request can run into trouble, while the existing connection used by iperf3 TCP is less affected.
If your CPU is weak, the TLS handshake for the HEAD request can take a long time, but it isn't involved in iperf3 TCP.
If the network has a lower MTU than what's set on the server and client, PMTU discovery or fragmentation on the TCP connection for the HEAD request can cause slowdowns, while the existing connection used by iperf3 TCP is less affected.
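One way to separate connection-setup cost from per-request cost is to time repeated HEAD requests on fresh connections versus one reused keep-alive connection. This is a sketch using Python's http.client for plain HTTP (http.client.HTTPSConnection would be the TLS analogue), assuming the backend supports HTTP/1.1 keep-alive; head_timings is a hypothetical helper. If fresh connections are slow but requests on a reused connection (after the first) are fast, the delay is probably tied to connection establishment; if every request is slow either way, something is more likely acting on the requests themselves.

```python
from __future__ import annotations

import http.client
import time


def head_timings(host: str, port: int = 80, path: str = "/", n: int = 5,
                 reuse: bool = False, timeout: float = 10.0) -> list[float]:
    """Return per-request HEAD latencies in milliseconds.

    reuse=False opens a fresh TCP connection for every request;
    reuse=True sends all requests over one keep-alive connection,
    so only the first timing includes the TCP handshake.
    """
    timings = []
    shared = http.client.HTTPConnection(host, port, timeout=timeout) if reuse else None
    try:
        for _ in range(n):
            if reuse:
                conn = shared
            else:
                conn = http.client.HTTPConnection(host, port, timeout=timeout)
            start = time.perf_counter()
            conn.request("HEAD", path)
            resp = conn.getresponse()
            resp.read()  # empty for HEAD, but completes the response so the connection can be reused
            timings.append((time.perf_counter() - start) * 1000)
            if not reuse:
                conn.close()
    finally:
        if shared is not None:
            shared.close()
    return timings
```

Comparing statistics.median(head_timings(host)) with statistics.median(head_timings(host, reuse=True)[1:]) (dropping the first reused timing, which includes the handshake) gives a quick read on which side the delay falls.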

OhJohn · February 15

@yoursunny actually there is no TLS involved; it's just a plain HTTP HEAD request. Also, those HEAD requests are fast (around 10 to 15ms) when sent from a different server in the same DC on the common port. And they're fast when run through the VPN from elsewhere, so the server itself isn't the problem.

@OpaqueRegistrant I checked with mtr -P (port) -T on both ports; both take the same route.

So to me it really looks like what @Hosting_b2b described: some kind of rate limiting or other tinkering with that common port on the network, either by the DC or by its upstreams (the latter seems unlikely).

cmeerw · February 15

@OhJohn said:
So to me it really looks like what @Hosting_b2b described: some kind of rate limiting or other tinkering with that common port on the network, either by the DC or by its upstreams (the latter seems unlikely).

Just use tcpdump on both ends then to figure out what's going on.

OhJohn · February 15

Oops, there's another thing I forgot: the VPN uses UDP while normal HTTP is plain TCP, so that might be a difference as well.

OhJohn · February 15

@cmeerw said: Just

I have to admit that this "just" is exactly my problem here. While I can certainly run tcpdump (I would do a tcpdump -tulpen | grep xxx.xxx.xxx.xxx) and see the packets, how do I figure out where the problem is from there? I'm not a big fan of reading tcpdump output but would love to learn...

yoursunny · February 15

@OhJohn said:
I have to admit that this "just" is exactly my problem here. While I can certainly run tcpdump (I would do a tcpdump -tulpen | grep xxx.xxx.xxx.xxx) and see the packets, how do I figure out where the problem is from there? I'm not a big fan of reading tcpdump output but would love to learn...

tcpdump -pni IFNAME -w 1.pcap "port 80"

Open the file in Wireshark and look at the timestamps, packet loss, retransmissions, MSS adjustments, ICMP errors related to the flow, the "expert info" on the TCP flow, etc.
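If you'd rather script a first pass over the capture before opening Wireshark, here is a minimal standard-library sketch that reads the classic pcap format tcpdump -w normally writes and flags two of the symptoms mentioned above: long silences within a connection and repeated segments that look like retransmissions (including SYN retransmissions, which would point at connection setup). Assumptions: the capture was taken on an Ethernet interface (not -i any, which uses a different link type), traffic is untagged IPv4, and the 0.5 s gap threshold is arbitrary. The function names are hypothetical, and the heuristics are rough compared to Wireshark's TCP analysis.

```python
import socket
import struct


def read_pcap(path):
    """Yield (timestamp, frame_bytes) from a classic libpcap file (pcapng is not handled)."""
    with open(path, "rb") as f:
        data = f.read()
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    # 0xa1b23c4d marks nanosecond timestamps; 0xa1b2c3d4 marks microseconds.
    divisor = 1e9 if magic in (b"\x4d\x3c\xb2\xa1", b"\xa1\xb2\x3c\x4d") else 1e6
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError(f"link type {linktype}: this sketch only handles Ethernet (1)")
    off = 24
    while off + 16 <= len(data):
        sec, frac, incl_len, _orig_len = struct.unpack(endian + "IIII", data[off:off + 16])
        off += 16
        yield sec + frac / divisor, data[off:off + incl_len]
        off += incl_len


def tcp_segments(path):
    """Yield (ts, src, sport, dst, dport, seq, flags, payload_len) for IPv4/TCP frames."""
    for ts, frame in read_pcap(path):
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue  # not untagged IPv4 (VLAN-tagged and IPv6 frames are skipped)
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4
        if ip[9] != 6 or len(ip) < ihl + 20:
            continue  # not TCP, or TCP header truncated
        (total_len,) = struct.unpack("!H", ip[2:4])
        tcp = ip[ihl:]
        sport, dport, seq = struct.unpack("!HHI", tcp[:8])
        doff = (tcp[12] >> 4) * 4
        # Use IP total length so Ethernet padding is not counted as payload.
        yield (ts, socket.inet_ntoa(ip[12:16]), sport, socket.inet_ntoa(ip[16:20]),
               dport, seq, tcp[13], total_len - ihl - doff)


def analyze(path, gap_threshold=0.5):
    """Flag likely retransmissions and long per-connection silences (rough heuristics)."""
    seen = set()
    last_ts = {}
    findings = []
    for ts, src, sport, dst, dport, seq, flags, plen in tcp_segments(path):
        conv = tuple(sorted([(src, sport), (dst, dport)]))  # both directions together
        if conv in last_ts and ts - last_ts[conv] > gap_threshold:
            findings.append(f"{ts:.6f} gap of {ts - last_ts[conv]:.3f}s on {src}:{sport} <-> {dst}:{dport}")
        last_ts[conv] = ts
        syn = bool(flags & 0x02)
        if plen > 0 or syn:
            key = (src, sport, dst, dport, seq, plen, syn)
            if key in seen:
                kind = "SYN" if syn else "data"
                findings.append(f"{ts:.6f} {kind} retransmission {src}:{sport} -> {dst}:{dport} seq={seq}")
            seen.add(key)
    return findings
```

Something like for line in analyze("1.pcap"): print(line) on the file from the command above gives a list of timestamps worth jumping to in Wireshark.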

cmeerw · February 15

@OhJohn said:

@cmeerw said: Just

I have to admit that this "just" is exactly my problem here. While I can certainly run tcpdump (I would do a tcpdump -tulpen | grep xxx.xxx.xxx.xxx) and see the packets, how do I figure out where the problem is from there? I'm not a big fan of reading tcpdump output but would love to learn...

I'd do something like tcpdump -npi intf host xxx.xxx.xxx.xxx and port yyyy on both sides (so you only get the relevant packets; maybe also add -v), and then compare the two sides. Mainly look out for missing packets (and retransmissions).
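Once per-packet records have been extracted from both captures (with tshark or any pcap reader), the comparison itself can be scripted. This sketch takes two lists of (timestamp, key) pairs, where key identifies a TCP segment independently of where it was captured, e.g. (src, sport, dst, dport, seq, payload_len, syn_flag). It assumes no NAT and no sequence-number rewriting between the hosts (some firewalls randomize TCP sequence numbers, which would break the matching). compare_captures is a hypothetical helper; it reports segments that left the sender more often than they arrived, segments that arrived without having been sent (a possible sign of a middlebox), and the spread of transit times, which is meaningful even though the two clocks aren't synchronized.

```python
from collections import Counter


def compare_captures(sent, received):
    """Compare (timestamp, key) lists captured on the sending and receiving side.

    Returns (lost, unexpected, spread):
      lost       - {key: count} of segments seen leaving more often than arriving
      unexpected - {key: count} of segments arriving more often than seen leaving
      spread     - max minus min transit time, in the capture's time units, for
                   segments seen exactly once on each side (None if there are none)
    """
    sent_count = Counter(k for _, k in sent)
    recv_count = Counter(k for _, k in received)
    lost = {k: n - recv_count[k] for k, n in sent_count.items() if n > recv_count[k]}
    unexpected = {k: n - sent_count[k] for k, n in recv_count.items() if n > sent_count[k]}
    # Clocks on the two hosts are not synchronized, so absolute transit times
    # include an unknown constant offset; their spread (max - min) cancels it.
    sent_ts = {k: ts for ts, k in sent if sent_count[k] == 1}
    recv_ts = {k: ts for ts, k in received if recv_count[k] == 1}
    transit = [recv_ts[k] - sent_ts[k] for k in sent_ts if k in recv_ts]
    spread = max(transit) - min(transit) if transit else None
    return lost, unexpected, spread
```

A large spread with nothing lost would suggest some packets are being held up along the way, while non-empty lost or unexpected results point at drops or injected/rewritten traffic.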
