HostHatch issues after maintenance in LA

After the scheduled maintenance on 2023-02-01 that affected the Los Angeles nodes, I noticed some issues with their nodes and network.

IPv6 is unreachable. Their router fe80::1 is not responding to neighbor discovery.
MTU on the private network was changed to 1500. It no longer passes 9000-byte packets.
If you have a private network in LA, check the link MTU; the mismatch may cause intermittent connectivity issues later.
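
A quick way to check, as a rough sketch (assuming the private NIC is eth1 and 10.0.0.2 is another of your hosts on the private network; substitute your own interface and address):

    # Show the configured link MTU on the private interface
    ip link show dev eth1
    # Try a full 9000-byte frame without fragmentation: 8972 = 9000 - 20 (IPv4) - 8 (ICMP)
    ping -M do -s 8972 -c 3 10.0.0.2
    # If that fails but a 1472-byte payload passes, the link has dropped back to a 1500 MTU
    ping -M do -s 1472 -c 3 10.0.0.2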

Comments

  • psb777 Member
    edited February 2023

    Mine was rebooted yesterday. IPv6 has been unreachable since. I also found that MTU on the public network is lower than 1500, namely 1476, and that causes connectivity issues for SSH and others.
    Edit: looks like both issues have been fixed.
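
    For anyone who wants to double-check the public path MTU from their VPS, a rough check (1448 = 1476 minus 20 bytes of IPv4 header and 8 bytes of ICMP header; 1.1.1.1 is just an example destination):

    # Should succeed if the path MTU is at least 1476
    ping -M do -s 1448 -c 3 1.1.1.1
    # Should fail if the path MTU is below the usual 1500
    ping -M do -s 1472 -c 3 1.1.1.1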

  • Daniel15 Veteran
    edited February 2023

    It's really a mixed bag at the moment. I've got four VPSes with HostHatch in LA:

    • la01 (185.198.26.x, Intel) is working fine over IPv4 but not IPv6
    • la03 (185.197.30.x, Intel) is completely inaccessible even though it's booted up and I can connect to it via VNC in their control panel
    • la04 (185.197.30.x, storage VPS, Intel) is working over both IPv4 and IPv6
    • la05 (45.67.219.x, AMD EPYC) is working over both IPv4 and IPv6

    I'm not sure why they have IPv6 issues so often. I've got 31 different VPSes for dnstools.ws and the only ones I have IPv6 issues with are HostHatch ones.

    Also:

    • None of the VPSes can ping each other's public IP, neither over IPv4 nor IPv6. I just get "Destination Host Unreachable". In fact, I can't reach any IP in the 185.197.30.0/24 and 45.67.219.0/24 ranges from any of the VPSes.
    • The internal network isn't working at all.
  • Yeah, it's really a mixed bag, but also evolving. From what I can tell, some VPSes still have the MTU<1500 issue, some have unreachable IPv6, and others are OK (at least with ::1).

    @Daniel15 On those with working IPv6, are you using the subnet::1 address? Do other addresses from your subnet work?

    Maybe because the VPSes can't ping each other's public IPs, the IPv6 gateway needs to be fe80::1 rather than 2a04:bdc7:100::1.
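
    For reference, a minimal sketch of a static IPv6 setup that uses the link-local gateway (2001:db8:1::1/64 is a placeholder for your assigned subnet's ::1 address, and ens3 is an assumed interface name):

    # Assign the subnet's ::1 address to the public interface
    ip -6 addr add 2001:db8:1::1/64 dev ens3
    # Point the default route at the link-local gateway; dev is required for an fe80:: next hop
    ip -6 route add default via fe80::1 dev ens3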

  • @Daniel15 said:
    It's really a mixed bag at the moment. I've got four VPSes with HostHatch in LA:

    • la01 (185.198.26.x, Intel) is working fine over IPv4 but not IPv6
    • la03 (185.197.30.x, Intel) is completely inaccessible even though it's booted up and I can connect to it via VNC in their control panel
    • la04 (185.197.30.x, storage VPS, Intel) is working over both IPv4 and IPv6
    • la05 (45.67.219.x, AMD EPYC) is working over both IPv4 and IPv6

    I'm not sure why they have IPv6 issues so often. I've got 31 different VPSes for dnstools.ws and the only ones I have IPv6 issues with are HostHatch ones.

    Also:

    • None of the VPSes can ping each other's public IP, neither over IPv4 nor IPv6. I just get "Destination Host Unreachable". In fact, I can't reach any IP in the 185.197.30.0/24 and 45.67.219.0/24 ranges from any of the VPSes.
    • The internal network isn't working at all.

    They entirely changed upstreams, so network stability should be miles better now. I think it's worth giving them time to fix these issues.

  • Daniel15 Veteran
    edited February 2023

    @psb777 said: On those with working IPv6

    I think I spoke too soon, because IPv6 is broken on all of them now. I'm using the ::1 address on all of them, I think, and fe80::1 as the gateway. Network stability may very well be better with the new upstream, but it's not stable if it's broken :tongue:

    The private network is still broken too.

  • It seems the IPv6 router fe80::1 is still broken.

    The router does not send a neighbor solicitation to subnet::1 when I ping the VM from the internet. It still randomly responds to NS from the VM and learns the MAC address if the NS packet was sent from subnet::1. That might be why IPv6 connectivity drops and comes back randomly, depending on when the kernel on the VM refreshes its neighbor/ARP tables.

    Perhaps filter rules on their router or host nodes are dropping ICMPv6 packets from the router.

    To check whether the router responds to neighbor solicitations, try:
    sudo apt install ndisc6
    sudo ndisc6 -s $yournet::1 fe80::1 ens3
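
    It can also help to watch the VM's own neighbour cache while this is happening; if the gateway entry keeps flipping between REACHABLE and FAILED, that would match the random drop/restore behaviour. A rough check, assuming the public NIC is ens3:

    # Show the state of the fe80::1 gateway entry in the neighbour cache
    ip -6 neigh show dev ens3
    # Flush the cache and force a fresh neighbour solicitation
    ip -6 neigh flush dev ens3
    ping -c 3 fe80::1%ens3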

  • Daniel15 Veteran
    edited February 2023
    • IPv6 seems like it's somewhat working now. It's flaky on two of my VPSes, but working fine on the other two.
    • The private network is working.
    • The VPSes can ping each other's public IPs again.

    Connections between one of my servers in Las Vegas and my HostHatch storage VPS in Los Angeles seem... weird though. iperf just dies after a while? Maybe it's the MTU issue someone mentioned (see the quick path MTU check after the second iperf run below).

    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  14.5 MBytes   122 Mbits/sec  1901    448 KBytes
    [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    2   1.41 KBytes
    [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
    [  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
    [  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
    [  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
    [  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
    [  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
    [  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
    [  5]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec  14.5 MBytes  12.2 Mbits/sec  1906             sender
    [  5]   0.00-10.04  sec  9.78 MBytes  8.17 Mbits/sec                  receiver
    

    It's fine with another VPS I have in San Jose though (both over IPv4 and IPv6)...

    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  82.8 MBytes   694 Mbits/sec    0   2.39 MBytes
    [  5]   1.00-2.00   sec  77.5 MBytes   650 Mbits/sec    0   2.35 MBytes
    [  5]   2.00-3.00   sec  77.5 MBytes   650 Mbits/sec    0   2.33 MBytes
    [  5]   3.00-4.00   sec  77.5 MBytes   650 Mbits/sec    0   2.41 MBytes
    [  5]   4.00-5.00   sec  77.5 MBytes   650 Mbits/sec    0   2.34 MBytes
    [  5]   5.00-6.00   sec  77.5 MBytes   650 Mbits/sec    0   2.33 MBytes
    [  5]   6.00-7.00   sec  78.8 MBytes   661 Mbits/sec    0   2.33 MBytes
    [  5]   7.00-8.00   sec  77.5 MBytes   650 Mbits/sec    0   2.33 MBytes
    [  5]   8.00-9.00   sec  77.5 MBytes   650 Mbits/sec    0   2.39 MBytes
    [  5]   9.00-10.00  sec  78.8 MBytes   661 Mbits/sec    0   2.32 MBytes
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec   783 MBytes   657 Mbits/sec    0             sender
    [  5]   0.00-10.04  sec   772 MBytes   645 Mbits/sec                  receiver
    

    It'll eventually all work properly again. I've learnt to take their maintenance windows and add a few extra days.
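
    If it is the MTU issue, a rough way to check the path MTU between the two boxes (standard Linux tools; <storage-vps-ip> is a placeholder for the real endpoint):

    # Discover the path MTU hop by hop
    tracepath -n <storage-vps-ip>
    # Or probe directly: 1472 = 1500 minus 28 bytes of IPv4/ICMP headers
    ping -M do -s 1472 -c 3 <storage-vps-ip>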

  • stoned Member
    edited February 2023
    #  ping6 google.com
    PING google.com(lax17s49-in-x0e.1e100.net (2607:f8b0:4007:814::200e)) 56 data bytes
    64 bytes from lax17s49-in-x0e.1e100.net (2607:f8b0:4007:814::200e): icmp_seq=1 ttl=116 time=0.927 ms
    64 bytes from lax17s49-in-x0e.1e100.net (2607:f8b0:4007:814::200e): icmp_seq=2 ttl=116 time=0.643 ms
    
    --- google.com ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1001ms
    rtt min/avg/max/mdev = 0.643/0.785/0.927/0.142 ms
    ^Croot@storage - ~ - -
    #  ping4 google.com
    PING  (142.250.68.110) 56(84) bytes of data.
    64 bytes from lax31s12-in-f14.1e100.net (142.250.68.110): icmp_seq=1 ttl=115 time=1.26 ms
    64 bytes from lax31s12-in-f14.1e100.net (142.250.68.110): icmp_seq=2 ttl=115 time=0.867 ms
    64 bytes from lax31s12-in-f14.1e100.net (142.250.68.110): icmp_seq=3 ttl=115 time=0.584 ms
    ^C
    ---  ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2003ms
    rtt min/avg/max/mdev = 0.584/0.903/1.258/0.276 ms
    root@storage - ~ - -
    

    Everything seems good on my LA VPS.

  • The MTU 1500 seems to be a frequent problem. They need to kill it with fire and do 9000 everywhere by default. Learn from previous issues!

  • @TimboJones said:
    The MTU 1500 seems to be a frequent problem. They need to kill it with fire and do 9000 everywhere by default. Learn from previous issues!

    You can't use 9000 on anything public facing. If you want to use jumbo frames (9000 MTU), every router the data is going through needs to be configured to use them. That's doable on a local network, but not possible on the internet. On the internet, you'll just end up with fragmented packets, which will make performance even worse.

  • TimboJones Member
    edited February 2023

    @Daniel15 said:

    @TimboJones said:
    The MTU 1500 seems to be a frequent problem. They need to kill it with fire and do 9000 everywhere by default. Learn from previous issues!

    You can't use 9000 on anything public facing. If you want to use jumbo frames (9000 MTU), every router the data is going through needs to be configured to use them. That's doable on a local network, but not possible on the internet. On the internet, you'll just end up with fragmented packets, which will make performance even worse.

    Switches aren't routers and it only matters to the switch if you needed to communicate with the switch itself with packets over 1500. I know of no such use case where public facing Virtual switches would care (if your switch has a Public assigned IP reachable from Internet, you fucked up). An unmanaged switch operates with higher MTU without doing a thing because no management interface.

  • I bought a VPS 2 months ago and it has been stuck in a pending state ever since. The ticket has been open for 2 months with no response from the support team.

  • Daniel15 Veteran
    edited February 2023

    @TimboJones said: Switches aren't routers

    Sure, but both the switches and the routers along the route from the source to the destination need to support jumbo frames without fragmenting them. This is practically impossible, as routers and switches on internet backbones only support 1500-byte frames.

    @TimboJones said: An unmanaged switch operates with higher MTU without doing a thing

    No, the switch needs to support frames with 9000 MTU. Most unmanaged switches do support it out-of-the-box, but on managed switches (and routers) it's configurable. No core internet routers have jumbo frames enabled so your 9000 byte frames will just be fragmented into 1500 byte frames, adding extra overhead.

    If you disagree then feel free to try and show any successful ping across the internet (not just in the same data center) using 9000 MTU. Make sure you use the 'do not fragment' flag (-M do).
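
    For anyone who wants to try that test, a sketch of the command (8972 = 9000 minus 20 bytes of IPv4 header and 8 bytes of ICMP header; pick any host outside the data center as the target):

    # Expected to fail on a normal internet path (or locally, if your own interface MTU is 1500)
    ping -M do -s 8972 -c 3 example.com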

  • TimboJones Member
    edited February 2023

    @Daniel15 said:

    @TimboJones said: Switches aren't routers

    Sure, but both the switches and the routers along the route from the source to the destination need to support jumbo frames without fragmenting them. This is practically impossible, as routers and switches on internet backbones only support 1500-byte frames.

    Right, the router is the limiter, so having jumbo frames enabled on private and public switches won't make or break the connection. Setting 9000 on a 1500 network does no harm. Setting 1500 on a 9000 network does, as repeatedly experienced at HH.

    @TimboJones said: An unmanaged switch operates with higher MTU without doing a thing

    No, the switch needs to support frames with 9000 MTU. Most unmanaged switches do support it out-of-the-box, but on managed switches (and routers) it's configurable. No core internet routers have jumbo frames enabled so your 9000 byte frames will just be fragmented into 1500 byte frames, adding extra overhead.

    I haven't had an unmanaged switch that didn't support jumbo frames since hubs were a thing.

    If you disagree then feel free to try and show any successful ping across the internet (not just in the same data center) using 9000 MTU. Make sure you use the 'do not fragment' flag (-M do).

    As explained above, you set 9000 on all switches by default with no harm done and set routers to the MTU to be used. You're the one who brought up the router.

    Get it now? Default the switches to 9000 only helps and doesn't harm. It's still the routers that are the issue.

    PM me if you still don't understand. It would help to make my point so HH could be convinced as well. (I've been inconvenienced multiple times for weeks, so I consider this a fixable issue that would make a difference.)
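
    On the guest side, the knob that actually matters is the interface MTU; a rough sketch, assuming eth0 is the public NIC and eth1 the private one:

    # Keep the public interface at the internet-standard 1500
    ip link set dev eth0 mtu 1500
    # Use jumbo frames on the private network, if the host side supports them
    ip link set dev eth1 mtu 9000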

  • Private networking is still broken for me. Is it broken for anyone else?

    @TimboJones said: As explained above, you set 9000 on all switches by default with no harm done and set routers to the MTU to be used. You're the one who brought up the router.

    Get it now? Default the switches to 9000 only helps and doesn't harm. It's still the routers that are the issue.

    OK, I understand what you're saying now. I thought you meant using MTU 9000 on everything for a public-facing system (the server, the router and any switches).
