Hurricane Electric - is this packet loss normal?

proofofsteak Member
edited March 8 in General

I'm migrating data between a US server and a Hetzner Germany server, and I noticed the connection dropping every couple of gigabytes or so.

I ran a few MTRs and saw this. It seems there is massive packet loss on traffic routed through HE?

I'm not a networking/IP guy at all, but surely this can't be normal? Any idea what is going on? Is this typical HE performance?

I've included a Cogent route at the bottom for comparison; it shows no loss.

HE Route #1 [USA Utah > Hetzner DE, big loss]

HOST: xx62                                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- original.server.location                0.0%   200    1.1   1.1   0.6   8.5   0.8
  2.|-- core-100ge0-8-1-5.slc01.fiberstate.com  0.0%   200    2.1   1.4   0.3  26.3   2.9
  3.|-- e0-2.switch1.slc4.he.net               63.0%   200    1.4   3.3   1.1  30.2   4.1
  4.|-- port-channel9.core2.den1.he.net        92.0%   200   13.1  15.8  12.8  41.5   7.1
  5.|-- port-channel8.core2.oma1.he.net        79.0%   200   23.4  24.1  22.8  39.6   2.7
  6.|-- 100ge0-69.core2.chi1.he.net            83.5%   200   32.2  33.9  32.0  44.9   3.6
  7.|-- port-channel13.core3.chi1.he.net       89.5%   200   31.2  40.8  31.2  90.2  14.0
  8.|-- port-channel1.core2.nyc4.he.net        86.0%   200   61.2  50.8  47.9  61.4   4.2
  9.|-- port-channel20.core3.lon2.he.net       93.0%   200  114.9 119.6 114.9 142.9   8.6
 10.|-- port-channel4.core1.ams1.he.net        11.5%   200  120.3 122.5 119.5 199.1   8.7
 11.|-- ???                                    100.0   200    0.0   0.0   0.0   0.0   0.0
 12.|-- core5.fra.hetzner.com                   0.0%   200  143.9 141.8 141.3 147.2   0.5
 13.|-- core24.fsn1.hetzner.com                 0.0%   200  156.4 151.6 151.2 156.4   0.5
 14.|-- ex9k1.dc14.fsn1.hetzner.com             0.0%   200  153.4 152.2 151.3 167.2   2.1
 15.|-- final.destination.fsnl.hetzner.com      0.0%   200  162.2 151.9 151.3 162.2   0.8

HE Route #2 [USA Utah > Hetzner DE, big loss]

HOST: xx62                                      Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- original.server.location                   0.0%   200    1.0   1.4   0.7  23.1   2.2
  2.|-- core-100ge0-8-1-5.slc01.fiberstate.com     0.0%   200    0.7   1.6   0.3  37.6   4.0
  3.|-- e0-2.switch1.slc4.he.net                  58.5%   200    1.4   3.9   1.4  37.6   5.0
  4.|-- port-channel9.core2.den1.he.net           91.0%   200   12.9  17.8  12.8  47.0  10.4
  5.|-- port-channel8.core2.oma1.he.net           77.5%   200   23.9  23.8  22.7  33.5   1.8
  6.|-- 100ge0-69.core2.chi1.he.net               87.0%   200   32.6  32.9  31.9  38.8   1.4
  7.|-- port-channel13.core3.chi1.he.net          92.5%   200   31.2  34.2  31.0  49.4   4.9
  8.|-- port-channel4.core3.nyc4.he.net           88.5%   200   50.8  57.0  48.0  87.9  11.3
  9.|-- port-channel20.core3.lon2.he.net          85.5%   200  129.1 128.6 114.7 251.0  28.2
 10.|-- port-channel4.core1.ams1.he.net            4.0%   200  119.9 122.6 119.4 191.3   8.6
 11.|-- ???                                       100.0   200    0.0   0.0   0.0   0.0   0.0
 12.|-- core5.fra.hetzner.com                      0.0%   200  138.2 138.3 137.8 142.9   0.5
 13.|-- core11.nbg1.hetzner.com                    0.0%   200  138.4 138.9 138.0 167.3   2.8
 14.|-- spine16.cloud1.nbg1.hetzner.com           34.0%   200  1403. 1187. 928.8 1476.  97.1
 15.|-- spine2.cloud1.nbg1.hetzner.com             0.0%   200  139.0 142.7 138.4 246.7  11.8
 16.|-- ???                                       100.0   200    0.0   0.0   0.0   0.0   0.0
 17.|-- 13102.your-cloud.host                      0.0%   200  140.5 140.5 140.1 150.4   0.8
 18.|-- static.some.ip.here.clients.your-server.de 0.0%   200  138.2 138.4 138.0

HE Route #1 [USA Utah > Romania, big loss]

HOST: xx62                                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- original.server.ip                      0.0%   200    1.3   1.1   0.7   7.9   0.6
  2.|-- core-100ge0-8-1-5.slc01.fiberstate.com  0.0%   200    0.7   1.4   0.3  34.2   3.8
  3.|-- e0-2.switch1.slc4.he.net               63.5%   200    2.1   3.8   1.4  22.5   4.3
  4.|-- port-channel9.core2.den1.he.net        92.0%   200   14.1  13.0  11.8  21.4   2.6
  5.|-- port-channel8.core2.oma1.he.net        79.5%   200   22.0  23.9  21.7  35.2   3.1
  6.|-- 100ge0-69.core2.chi1.he.net            81.5%   200   47.3  33.7  30.9  59.9   5.5
  7.|-- port-channel13.core3.chi1.he.net       94.0%   200   60.0  41.8  30.0  60.0  10.0
  8.|-- port-channel1.core2.nyc4.he.net        84.5%   200   47.4  50.9  46.8  71.5   6.7
  9.|-- port-channel20.core3.lon2.he.net       86.5%   200  123.3 121.1 113.5 158.1  11.0
 10.|-- ???                                    100.0   200    0.0   0.0   0.0   0.0   0.0
 11.|-- 10.0.240.146                            0.0%   200  156.8 157.1 156.5 166.8   1.2
 12.|-- 10.0.240.214                            0.0%   200  154.1 154.4 153.4 168.3   1.7
 13.|-- 10.0.245.74                             0.0%   200  156.2 156.5 155.6 170.7   1.7
 14.|-- 92.80.113.70                            0.0%   200  156.2 157.8 155.9 177.2   2.6
 15.|-- maybe.calins.basement                   0.0%   200  156.6 157.6 155.7 170.4   2.1     

Cogent #1 [USA Utah > Scaleway FR, no loss]

HOST: xx62                                            Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- original.server.ip                               0.0%   200    1.1   1.2   0.7   5.9   0.7
  2.|-- core-100ge0-8-1-5.slc01.fiberstate.com           0.0%   200    1.1   1.3   0.3  31.1   2.8
  3.|-- ???                                             100.0   200    0.0   0.0   0.0   0.0   0.0
  4.|-- be3917.rcr51.b056940-0.slc01.atlas.cogentco.com  0.0%   200    1.5   1.3   1.0   2.0   0.2
  5.|-- be2257.ccr32.slc01.atlas.cogentco.com            0.0%   200    2.3   2.2   1.8   2.9   0.2
  6.|-- be3038.ccr22.den01.atlas.cogentco.com            0.0%   200   12.7  13.5  12.1  80.4   7.6
  7.|-- be3036.ccr22.mci01.atlas.cogentco.com            0.0%   200   23.8  25.0  23.4 102.0   8.1
  8.|-- be2832.ccr42.ord01.atlas.cogentco.com            0.0%   200   35.4  36.4  35.1 102.8   6.8
  9.|-- be2718.ccr22.cle04.atlas.cogentco.com            0.0%   200   42.0  43.1  41.7 109.9   6.6
 10.|-- be2879.ccr22.alb02.atlas.cogentco.com            0.0%   200   52.9  52.6  52.2  54.1   0.2
 11.|-- be3600.ccr32.bos01.atlas.cogentco.com            0.0%   200   56.1  56.5  55.7  78.9   2.5
 12.|-- be2101.ccr42.lon13.atlas.cogentco.com            0.0%   200  118.0 120.5 117.7 181.0   9.4
 13.|-- be12489.ccr42.par01.atlas.cogentco.com           0.0%   200  128.6 130.6 128.0 185.3   9.7
 14.|-- be3184.ccr31.par04.atlas.cogentco.com            0.0%   200  129.7 130.3 128.4 177.9   7.1
 15.|-- be3750.rcr21.b022890-0.par04.atlas.cogentco.com  0.0%   200  129.4 129.1 128.8 129.7   0.2
 16.|-- online.demarc.cogentco.com                       0.0%   200  126.0 125.6 125.2 126.2   0.2
 17.|-- 51.158.8.183                                     0.0%   200  129.0 129.2 128.9 130.2   0.2
 18.|-- 51.158.8.5                                       0.0%   200  125.3 125.4 125.1 125.8   0.1
 19.|-- final.destination.5.the.movie                    0.0%   200  125.8 125.2 124.9 127.3   0.3

Comments

  • Hybula Member, Patron Provider

    As long as there is no packet loss at the last hop (your destination), everything is OK. Network operators usually give ICMP a low priority on their routers, which causes this.

  • OhJohn Member

    ? None of the MTRs you present here show any packet loss. You should probably run long MTRs (>3000 cycles) to catch any real packet loss, but your reports here (with 200 cycles) show 0.0% packet loss at the destination.

  • proofofsteak Member

    Ah ok, so it only matters if there is packet loss on the last line of an MTR?

  • @proofofsteak said:
    Ah ok, so it only matters if there is packet loss on the last line of an MTR?

    Correct.
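
    To make that check concrete, here is a minimal sketch that runs a long MTR and pulls out the one loss figure that matters, the final hop's. A sketch only: it assumes a Linux mtr with the standard -r/-w/-c flags (which usually needs root or a setuid mtr-packet), and the hostname in the usage comment is a placeholder.

      import re
      import subprocess

      def destination_loss(host: str, cycles: int = 3000) -> float:
          """Run a long mtr in report mode and return the loss% at the
          final hop, the only figure that reflects real end-to-end loss."""
          report = subprocess.run(
              ["mtr", "-r", "-w", "-c", str(cycles), host],
              capture_output=True, text=True, check=True,
          ).stdout
          # Hop lines look like: " 12.|-- core5.fra.hetzner.com  0.0% ..."
          hops = [ln for ln in report.splitlines() if re.match(r"\s*\d+\.\|", ln)]
          return float(hops[-1].split()[2].rstrip("%"))

      # Intermediate hops may show 60-90% "loss" from ICMP de-prioritisation;
      # only a non-zero number from this check points at genuine loss:
      # print(destination_loss("your.hetzner.server.example"))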

  • emgh Member

    @Hybula said:
    As long as there is no packet loss at the last hop (your destination), everything is OK. Network operators usually give ICMP a low priority on their routers, which causes this.

    This. You'll see the same thing in most routes.

  • lowenduser1 Member

    @emgh said:

    @Hybula said:
    As long as there is no packet loss at the last hop (your destination), everything is OK. Network operators usually give ICMP a low priority on their routers, which causes this.

    This. You'll see the same thing in most routes.

    It's somewhat newish and a horribly confusing practice. Fuckery by throttling or outright blocking ICMP is explicitly noted in the RFC as bad practice. The legitimate cases here are congestion, floods, or whatever other kind of bad weather, so one could say it is a sign of degradation once you see more than zero loss in the middle of a route.

  • AllHost_Rep Member, Patron Provider
    edited March 8

    @lowenduser1 said:

    @emgh said:

    @Hybula said:
    As long as there is no packet loss at the last hop (your destination), everything is OK. Network operators usually give ICMP a low priority on their routers, which causes this.

    This. You'll see the same thing in most routes.

    It's somewhat newish and a horribly confusing practice. Fuckery by throttling or outright blocking ICMP is explicitly noted in the RFC as bad practice. The legitimate cases here are congestion, floods, or whatever other kind of bad weather, so one could say it is a sign of degradation once you see more than zero loss in the middle of a route.

    No. Protecting the control plane is very far from a new practice. Confusing? Sure, but that stems from end-users trying to perform diagnostics on topics they're unfamiliar with.

    You could punt millions of ICMP packets through a forwarding plane without issue; you can't say the same for ICMP to the control plane on the same router.
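
    A toy simulation shows why a policed control plane looks like heavy loss in an MTR while forwarding is untouched. All rates below are made-up illustration values, not anything HE actually configures:

      import random

      def apparent_loss(probes=200, interval=1.0, background_pps=50.0,
                        policer_pps=10.0, bucket=20.0):
          """Token-bucket policer for ICMP generation on a router's control
          plane. Forwarded traffic never touches this bucket, so the 'loss'
          it creates is invisible to real data flows."""
          # Our mtr probes (one per second) compete with background ICMP
          # demand (other traceroutes, pings, TTL expiries) at the same hop.
          events = [(i * interval, True) for i in range(probes)]
          t = 0.0
          while t < probes * interval:
              t += random.expovariate(background_pps)
              events.append((t, False))
          tokens, last, answered = bucket, 0.0, 0
          for t, ours in sorted(events):
              tokens = min(bucket, tokens + (t - last) * policer_pps)
              last = t
              if tokens >= 1.0:        # a token is available: reply generated
                  tokens -= 1.0
                  if ours:
                      answered += 1
          return 100.0 * (1.0 - answered / probes)

      print(f"apparent loss at a policed hop: {apparent_loss():.1f}%")
      # ~80% "loss" at this hop, with zero forwarded packets dropped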

  • lowenduser1 Member
    edited March 8

    @AllHost_Rep said:

    @lowenduser1 said:

    @emgh said:

    @Hybula said:
    As long as there is no packet loss at the last hop (your destination), everything is OK. Network operators usually give ICMP a low priority on their routers, which causes this.

    This. You'll see the same thing in most routes.

    It's somewhat newish and a horribly confusing practice. Fuckery by throttling or outright blocking ICMP is explicitly noted in the RFC as bad practice. The legitimate cases here are congestion, floods, or whatever other kind of bad weather, so one could say it is a sign of degradation once you see more than zero loss in the middle of a route.

    No. Protecting the control plane is very far from a new practice. Confusing? Sure, but that stems from end-users trying to perform diagnostics on topics they're unfamiliar with.

    You could punt millions of ICMP packets through a forwarding plane without issue; you can't say the same for ICMP to the control plane on the same router.

    That's the equivalent of seeing smoke above the train station even though they deliver you to your destination on time. Even when everyone says it's fine, the smoke is there. De-prioritizing diagnostic protocols is a last resort as per the spec.

  • PureVoltage Member, Patron Provider

    As others said, no packet loss at the endpoint means nothing bad is happening.

    As others said, ICMP is typically rate-limited and dropped by a lot of networking equipment.

    Thanked by: emgh, proofofsteak
  • AllHost_Rep Member, Patron Provider
    edited March 8

    @lowenduser1 said:
    That's the equivalent of seeing smoke above the train station even though they deliver you to your destination on time. Even when everyone says it's fine, the smoke is there. De-prioritizing diagnostic protocols is a last resort as per the spec.

    Far from it. If you want to draw analogies then it'd be the equivalent of the train driver not responding to every passenger unnecessarily knocking at his door whilst the train is in motion, because the driver is focused on getting the train to the destination on time :smile:

    If you truly believe there should be no restrictions on ICMP requests to the control-plane then I've nothing further to add to this. I just hope individuals reading your posts decide to fact check for themselves.

    Source & experience: Currently work for a Tier 1 in network security, and yourself?

    Thanked by: emgh, SLMob, SagnikS
  • lowenduser1 Member

    @AllHost_Rep said:

    @lowenduser1 said:
    That's the equivalent of seeing smoke above the train station even though they deliver you to your destination on time. Even when everyone says it's fine, the smoke is there. De-prioritizing diagnostic protocols is a last resort as per the spec.

    Far from it. If you want to draw analogies then it'd be the equivalent of the train driver not responding to every passenger unnecessarily knocking at his door whilst the train is in motion, because the driver is focused on getting the train to the destination on time :smile:

    If you truly believe there should be no restrictions on ICMP requests to the control-plane then I've nothing further to add to this. I just hope individuals reading your posts decide to fact check for themselves.

    Source & experience: Currently work for a Tier 1 in network security, and yourself?

    Nah, it's an ancient-ass protocol that doesn't scale properly to meet the spec in current use cases. It suffers from limitations made in a different age. Sure, it's efficient in the grand scheme of things for serving this many users, but unfortunately there will be knocking on doors, rightfully so, and it will become more apparent on busy routes over time.

  • trewq Administrator, Patron Provider

    @lowenduser1 said:

    @AllHost_Rep said:

    @lowenduser1 said:
    That's the equivalent of seeing smoke above the train station even though they deliver you to your destination on time. Even when everyone says it's fine, the smoke is there. De-prioritizing diagnostic protocols is a last resort as per the spec.

    Far from it. If you want to draw analogies then it'd be the equivalent of the train driver not responding to every passenger unnecessarily knocking at his door whilst the train is in motion, because the driver is focused on getting the train to the destination on time :smile:

    If you truly believe there should be no restrictions on ICMP requests to the control-plane then I've nothing further to add to this. I just hope individuals reading your posts decide to fact check for themselves.

    Source & experience: Currently work for a Tier 1 in network security, and yourself?

    Nah, it's an ancient-ass protocol that doesn't scale properly to meet the spec in current use cases. It suffers from limitations made in a different age. Sure, it's efficient in the grand scheme of things for serving this many users, but unfortunately there will be knocking on doors, rightfully so, and it will become more apparent on busy routes over time.

    As long as packets get where they need to go, it's not a big issue. It only causes confusion for people who don't have experience reading MTRs.

    The link below is a very good resource for those with minimal networking experience who want to learn more about correctly interpreting traceroutes/MTRs:

    https://archive.nanog.org/meetings/nanog47/presentations/Sunday/RAS_Traceroute_N47_Sun.pdf

    Thanked by: tmntwitw, Peppery9
  • rm_ IPv6 Advocate, Veteran
    edited March 8

    @lowenduser1 said: That's the equivalent of seeing smoke above the train station even though they deliver you to your destination on time. Even when everyone says it's fine, the smoke is there. De-prioritizing diagnostic protocols is a last resort as per the spec.

    If enough people start raising a fuss about this, then more providers will start using MPLS on their infra, so you only see something like:

    ...
      4.|-- router2.london.provider.net                0.0%   200    1.5   1.3   1.0   2.0   0.2
      5.|-- router100500.tokyo.provider.net            0.0%   200  353.0 330.0 350.3 380.7   0.2
    ...

    There are examples of that in the wild; I just don't have a src/dst on hand to show you a real one.

  • david Member

    I'm still trying to figure out how the IPv6 route between Tokyo and the Philippines takes 150-200ms (IPv4 is 46-59ms).

      Host                                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
      1. (waiting for reply)
      2. vl199-c8-7-b2-1.pnj1.constant.com                     0.0%    10    0.7   0.7   0.5   1.4   0.2
      3. ethernetet-2-0-24-sr2.tyo2.constant.com               0.0%    10    1.1   1.4   1.0   4.2   1.0
      4. ethernetae6-er1.tyo2.constant.com                    44.4%    10   70.4  22.3   1.0  70.4  29.5
      5. (waiting for reply)
      6. (waiting for reply)
      7. (waiting for reply)
      8. ipg-as-ap-as9299.port-channel5.switch1.lax2.he.net   30.0%    10  146.1 146.2 146.1 146.5   0.2
      9. (waiting for reply)
     10. 2001:4450:10:6000::4                                 44.4%    10  140.6 140.6 140.5 140.7   0.1
     11. 2001:4450:10:6000::61                                50.0%    10  144.8 144.5 144.2 144.8   0.2
     12. 2001:4450:10:284::1                                   0.0%    10  156.5 159.3 156.5 168.4   4.2
    

    I think the lax2.he.net rDNS must not really be in Los Angeles. But nonetheless it must be getting routed all around to be that bad. Even if it were getting routed through Singapore, it shouldn't be more than 100ms.
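
    The back-of-the-envelope math backs that up. Under the rough assumptions below (light in fibre at about 2/3 c, approximate great-circle distances), 150-200ms looks like a trans-Pacific detour through the US West Coast, which would fit that lax2.he.net hop, while even a Singapore detour stays under 100ms:

      # Minimum possible RTTs, assuming light in fibre travels at roughly
      # 2/3 c (~200,000 km/s); the great-circle distances (km) are rough
      # back-of-the-envelope assumptions too.
      FIBRE_KM_PER_S = 200_000

      routes_km = {
          "Tokyo -> Manila (direct)":       3_000,
          "Tokyo -> Singapore -> Manila":   5_300 + 2_400,
          "Tokyo -> Los Angeles -> Manila": 8_800 + 11_700,
      }

      for name, km in routes_km.items():
          rtt_ms = 2 * km / FIBRE_KM_PER_S * 1_000  # out and back, in ms
          print(f"{name}: >= {rtt_ms:.0f} ms")

      # Tokyo -> Manila (direct):       >= 30 ms
      # Tokyo -> Singapore -> Manila:   >= 77 ms
      # Tokyo -> Los Angeles -> Manila: >= 205 ms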

  • aeg Member

    Nobody's blocking ICMP requests here. Routers throttle the generation of ICMP Time Exceeded messages... no drops, the replies are just not being sent in the first place. And the outgoing packets from traceroute aren't ICMP, they're UDP.
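
    That mechanism is easy to see in code. Below is a minimal sketch of one classic traceroute probe: UDP goes out with a deliberately small TTL, and what comes back (if anything) is ICMP generated by the router where the TTL expired. The raw receive socket requires root, and the target in the usage comment is a placeholder.

      import socket

      def probe_hop(dest: str, ttl: int, port: int = 33434, timeout: float = 2.0):
          """One classic traceroute probe: send a UDP datagram with a small
          TTL, then wait for the ICMP Time Exceeded message generated by
          the router where the TTL ran out."""
          recv = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
          recv.settimeout(timeout)
          send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
          send.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
          try:
              send.sendto(b"", (dest, port))   # the outgoing probe is UDP
              _, addr = recv.recvfrom(512)     # the reply, if any, is ICMP
              return addr[0]                   # the router that answered
          except socket.timeout:
              return None  # router declined/throttled ICMP: shows as "loss"
          finally:
              send.close()
              recv.close()

      # for ttl in range(1, 20):
      #     print(ttl, probe_hop("example.com", ttl))  # placeholder target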
