HostHatch issues after maintenance in LA

After the scheduled maintenance on 2023-02-01 that affected the Los Angeles nodes, I noticed some issues with their nodes and network.

IPv6 is unreachable. Their router fe80::1 is not responding to neighbor discovery.
MTU on the private network was changed to 1500. It no longer passes 9000-byte packets.
If you have a private network in LA, check the link MTU; the mismatch may cause intermittent connectivity issues later.
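
A quick way to check, as a rough sketch (assuming the private NIC is eth1 and 10.0.0.2 is another of your hosts on the private network; substitute your own interface and address):

    # Show the configured link MTU on the private interface
    ip link show dev eth1
    # Try a full 9000-byte frame without fragmentation: 8972 = 9000 - 20 (IPv4) - 8 (ICMP)
    ping -M do -s 8972 -c 3 10.0.0.2
    # If that fails but a 1472-byte payload passes, the link has dropped back to a 1500 MTU
    ping -M do -s 1472 -c 3 10.0.0.2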

Comments

  • psb777 Member
    edited February 2023

    Mine was rebooted yesterday. IPv6 has been unreachable since. I also found that MTU on the public network is lower than 1500, namely 1476, and that causes connectivity issues for SSH and others.
    Edit: looks like both issues have been fixed.
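
    For anyone who wants to double-check the public path MTU from their VPS, a rough check (1448 = 1476 minus 20 bytes of IPv4 header and 8 bytes of ICMP header; 1.1.1.1 is just an example destination):

    # Should succeed if the path MTU is at least 1476
    ping -M do -s 1448 -c 3 1.1.1.1
    # Should fail if the path MTU is below the usual 1500
    ping -M do -s 1472 -c 3 1.1.1.1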

  • Daniel15 Veteran
    edited February 2023

    It's really a mixed bag at the moment. I've got four VPSes with HostHatch in LA:

    • la01 (185.198.26.x, Intel) is working fine over IPv4 but not IPv6
    • la03 (185.197.30.x, Intel) is completely inaccessible even though it's booted up and I can connect to it via VNC in their control panel
    • la04 (185.197.30.x, storage VPS, Intel) is working over both IPv4 and IPv6
    • la05 (45.67.219.x, AMD EPYC) is working over both IPv4 and IPv6

    I'm not sure why they have IPv6 issues so often. I've got 31 different VPSes for dnstools.ws and the only ones I have IPv6 issues with are HostHatch ones.

    Also:

    • None of the VPSes can ping each other's public IP, neither over IPv4 nor IPv6. I just get "Destination Host Unreachable". In fact, I can't reach any IP in the 185.197.30.0/24 and 45.67.219.0/24 ranges from any of the VPSes.
    • The internal network isn't working at all.
  • Yeah, it's really a mixed bag, but also evolving. From what I can tell, some VPSes still have the MTU<1500 issue, some have unreachable IPv6, and others are OK (at least with ::1).

    @Daniel15 On those with working IPv6, are you using the subnet::1 address? Do other addresses from your subnet work?

    Maybe because the VPSes can't ping each other's public IPs, the IPv6 gateway needs to be fe80::1 rather than 2a04:bdc7:100::1.
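
    For reference, a minimal sketch of a static IPv6 setup that uses the link-local gateway (2001:db8:1::1/64 is a placeholder for your assigned subnet's ::1 address, and ens3 is an assumed interface name):

    # Assign the subnet's ::1 address to the public interface
    ip -6 addr add 2001:db8:1::1/64 dev ens3
    # Point the default route at the link-local gateway; dev is required for an fe80:: next hop
    ip -6 route add default via fe80::1 dev ens3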

  • @Daniel15 said:
    It's really a mixed bag at the moment. I've got four VPSes with HostHatch in LA:

    • la01 (185.198.26.x, Intel) is working fine over IPv4 but not IPv6
    • la03 (185.197.30.x, Intel) is completely inaccessible even though it's booted up and I can connect to it via VNC in their control panel
    • la04 (185.197.30.x, storage VPS, Intel) is working over both IPv4 and IPv6
    • la05 (45.67.219.x, AMD EPYC) is working over both IPv4 and IPv6

    I'm not sure why they have IPv6 issues so often. I've got 31 different VPSes for dnstools.ws and the only ones I have IPv6 issues with are HostHatch ones.

    Also:

    • None of the VPSes can ping each other's public IP, neither over IPv4 nor IPv6. I just get "Destination Host Unreachable". In fact, I can't reach any IP in the 185.197.30.0/24 and 45.67.219.0/24 ranges from any of the VPSes.
    • The internal network isn't working at all.

    They entirely changed upstreams, so network stability should be miles better now. I think it's worth giving them time to fix these issues.

  • Daniel15 Veteran
    edited February 2023

    @psb777 said: On those with working IPv6

    I think I spoke too soon, because IPv6 is broken on all of them now. I'm using the ::1 address on all of them, I think, and fe80::1 as the gateway. Network stability may very well be better with the new upstream, but it's not stable if it's broken :tongue:

    The private network is still broken too.

  • It seems the IPv6 router fe80::1 is still broken.

    The router does not send a neighbor solicitation to subnet::1 when I ping the VM from the internet. It still randomly responds to NS from the VM and learns the MAC address if the NS packet was sent from subnet::1. That might be why IPv6 connectivity drops and comes back randomly, depending on when the kernel on the VM refreshes its neighbor/ARP tables.

    Perhaps filter rules on their router or host nodes are dropping ICMPv6 packets from the router.

    To check whether the router responds to neighbor solicitations, try:
    sudo apt install ndisc6
    sudo ndisc6 -s $yournet::1 fe80::1 ens3
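
    It can also help to watch the VM's own neighbour cache while this is happening; if the gateway entry keeps flipping between REACHABLE and FAILED, that would match the random drop/restore behaviour. A rough check, assuming the public NIC is ens3:

    # Show the state of the fe80::1 gateway entry in the neighbour cache
    ip -6 neigh show dev ens3
    # Flush the cache and force a fresh neighbour solicitation
    ip -6 neigh flush dev ens3
    ping -c 3 fe80::1%ens3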

  • Daniel15 Veteran
    edited February 2023
    • IPv6 seems like it's somewhat working now. It's flaky on two of my VPSes, but working fine on the other two.
    • The private network is working.
    • The VPSes can ping each other's public IPs again.

    Connections between one of my servers in Las Vegas and my HostHatch storage VPS in Los Angeles seem... weird though. iperf just dies after a while? Maybe it's the MTU issue someone mentioned (see the quick path MTU check after the second iperf run below).

    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  14.5 MBytes   122 Mbits/sec  1901    448 KBytes
    [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    2   1.41 KBytes
    [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
    [  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
    [  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
    [  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
    [  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
    [  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
    [  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
    [  5]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec  14.5 MBytes  12.2 Mbits/sec  1906             sender
    [  5]   0.00-10.04  sec  9.78 MBytes  8.17 Mbits/sec                  receiver
    

    It's fine with another VPS I have in San Jose though (both over IPv4 and IPv6)...

    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  82.8 MBytes   694 Mbits/sec    0   2.39 MBytes
    [  5]   1.00-2.00   sec  77.5 MBytes   650 Mbits/sec    0   2.35 MBytes
    [  5]   2.00-3.00   sec  77.5 MBytes   650 Mbits/sec    0   2.33 MBytes
    [  5]   3.00-4.00   sec  77.5 MBytes   650 Mbits/sec    0   2.41 MBytes
    [  5]   4.00-5.00   sec  77.5 MBytes   650 Mbits/sec    0   2.34 MBytes
    [  5]   5.00-6.00   sec  77.5 MBytes   650 Mbits/sec    0   2.33 MBytes
    [  5]   6.00-7.00   sec  78.8 MBytes   661 Mbits/sec    0   2.33 MBytes
    [  5]   7.00-8.00   sec  77.5 MBytes   650 Mbits/sec    0   2.33 MBytes
    [  5]   8.00-9.00   sec  77.5 MBytes   650 Mbits/sec    0   2.39 MBytes
    [  5]   9.00-10.00  sec  78.8 MBytes   661 Mbits/sec    0   2.32 MBytes
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec   783 MBytes   657 Mbits/sec    0             sender
    [  5]   0.00-10.04  sec   772 MBytes   645 Mbits/sec                  receiver
    

    It'll eventually all work properly again. I've learnt to take their maintenance windows and add a few extra days.
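
    If it is the MTU issue, a rough way to check the path MTU between the two boxes (standard Linux tools; <storage-vps-ip> is a placeholder for the real endpoint):

    # Discover the path MTU hop by hop
    tracepath -n <storage-vps-ip>
    # Or probe directly: 1472 = 1500 minus 28 bytes of IPv4/ICMP headers
    ping -M do -s 1472 -c 3 <storage-vps-ip>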

  • stoned Member
    edited February 2023
    #  ping6 google.com
    PING google.com(lax17s49-in-x0e.1e100.net (2607:f8b0:4007:814::200e)) 56 data bytes
    64 bytes from lax17s49-in-x0e.1e100.net (2607:f8b0:4007:814::200e): icmp_seq=1 ttl=116 time=0.927 ms
    64 bytes from lax17s49-in-x0e.1e100.net (2607:f8b0:4007:814::200e): icmp_seq=2 ttl=116 time=0.643 ms
    
    --- google.com ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1001ms
    rtt min/avg/max/mdev = 0.643/0.785/0.927/0.142 ms
    ^Croot@storage - ~ - -
    #  ping4 google.com
    PING  (142.250.68.110) 56(84) bytes of data.
    64 bytes from lax31s12-in-f14.1e100.net (142.250.68.110): icmp_seq=1 ttl=115 time=1.26 ms
    64 bytes from lax31s12-in-f14.1e100.net (142.250.68.110): icmp_seq=2 ttl=115 time=0.867 ms
    64 bytes from lax31s12-in-f14.1e100.net (142.250.68.110): icmp_seq=3 ttl=115 time=0.584 ms
    ^C
    ---  ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2003ms
    rtt min/avg/max/mdev = 0.584/0.903/1.258/0.276 ms
    root@storage - ~ - -
    

    Everything seems good on my LA VPS.

  • The MTU 1500 seems to be a frequent problem. They need to kill it with fire and do 9000 everywhere by default. Learn from previous issues!

  • @TimboJones said:
    The MTU 1500 seems to be a frequent problem. They need to kill it with fire and do 9000 everywhere by default. Learn from previous issues!

    You can't use 9000 on anything public facing. If you want to use jumbo frames (9000 MTU), every router the data is going through needs to be configured to use them. That's doable on a local network, but not possible on the internet. On the internet, you'll just end up with fragmented packets, which will make performance even worse.

  • TimboJones Member
    edited February 2023

    @Daniel15 said:

    @TimboJones said:
    The MTU 1500 seems to be a frequent problem. They need to kill it with fire and do 9000 everywhere by default. Learn from previous issues!

    You can't use 9000 on anything public facing. If you want to use jumbo frames (9000 MTU), every router the data is going through needs to be configured to use them. That's doable on a local network, but not possible on the internet. On the internet, you'll just end up with fragmented packets, which will make performance even worse.

    Switches aren't routers and it only matters to the switch if you needed to communicate with the switch itself with packets over 1500. I know of no such use case where public facing Virtual switches would care (if your switch has a Public assigned IP reachable from Internet, you fucked up). An unmanaged switch operates with higher MTU without doing a thing because no management interface.

  • I bought a VPS 2 months ago and it has been stuck in a pending state ever since. The ticket has been open for 2 months with no response from the support team.

  • Daniel15 Veteran
    edited February 2023

    @TimboJones said: Switches aren't routers

    Sure, but both the switches and the routers along the route from the source to the destination need to support jumbo frames without fragmenting them. This is practically impossible, as routers and switches on internet backbones only support 1500-byte frames.

    @TimboJones said: An unmanaged switch operates with higher MTU without doing a thing

    No, the switch needs to support frames with 9000 MTU. Most unmanaged switches do support it out-of-the-box, but on managed switches (and routers) it's configurable. No core internet routers have jumbo frames enabled so your 9000 byte frames will just be fragmented into 1500 byte frames, adding extra overhead.

    If you disagree then feel free to try and show any successful ping across the internet (not just in the same data center) using 9000 MTU. Make sure you use the 'do not fragment' flag (-M do).
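
    For anyone who wants to try that test, a sketch of the command (8972 = 9000 minus 20 bytes of IPv4 header and 8 bytes of ICMP header; pick any host outside the data center as the target):

    # Expected to fail on a normal internet path (or locally, if your own interface MTU is 1500)
    ping -M do -s 8972 -c 3 example.com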

  • TimboJones Member
    edited February 2023

    @Daniel15 said:

    @TimboJones said: Switches aren't routers

    Sure, but both the switches and the routers along the route from the source to the destination need to support jumbo frames without fragmenting them. This is practically impossible, as routers and switches on internet backbones only support 1500-byte frames.

    Right, the router is the limiter, so having jumbo frames enabled on private and public switches won't make or break the connection. Setting 9000 on a 1500 network does no harm. Setting 1500 on a 9000 network does, as repeatedly experienced at HH.

    @TimboJones said: An unmanaged switch operates with higher MTU without doing a thing

    No, the switch needs to support frames with 9000 MTU. Most unmanaged switches do support it out-of-the-box, but on managed switches (and routers) it's configurable. No core internet routers have jumbo frames enabled so your 9000 byte frames will just be fragmented into 1500 byte frames, adding extra overhead.

    I haven't had an unmanaged switch that didn't support jumbo frames since hubs were a thing.

    If you disagree then feel free to try and show any successful ping across the internet (not just in the same data center) using 9000 MTU. Make sure you use the 'do not fragment' flag (-M do).

    As explained above, you set 9000 on all switches by default with no harm done and set routers to the MTU to be used. You're the one who brought up the router.

    Get it now? Default the switches to 9000 only helps and doesn't harm. It's still the routers that are the issue.

    PM me if you still don't understand. It would help to make my point so HH could be convinced as well. (I've been inconvenienced multiple times for weeks, so I consider this a fixable issue that would make a difference.)
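
    On the guest side, the knob that actually matters is the interface MTU; a rough sketch, assuming eth0 is the public NIC and eth1 the private one:

    # Keep the public interface at the internet-standard 1500
    ip link set dev eth0 mtu 1500
    # Use jumbo frames on the private network, if the host side supports them
    ip link set dev eth1 mtu 9000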

  • Private networking is still broken for me. Is it broken for anyone else?

    @TimboJones said: As explained above, you set 9000 on all switches by default with no harm done and set routers to the MTU to be used. You're the one who brought up the router.

    Get it now? Default the switches to 9000 only helps and doesn't harm. It's still the routers that are the issue.

    OK, I understand what you're saying now. I thought you meant using MTU 9000 on everything for a public-facing system (the server, the router and any switches).
