Hostslick - Network Maintenance 10th Oct 2024 - Upgrade to MX960-prem3
This discussion has been closed.

Comments
On the other hand, we take it from the positive side:
Mine will be down for 7 days tomorrow. Given 99.95% Uptime SLA, this means if it gets fixed tomorrow, it must run for the next 13 993 days (>38 years) without a single outage. Cool!
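For anyone who wants to check the math above, here is a minimal sketch. It assumes the 99.95% SLA is evaluated over one single window covering the outage plus all following uptime (which is how the post reads it), and a year of 365.25 days. The function name is made up for illustration.

```python
from fractions import Fraction

def required_uptime_days(downtime_days, sla_percent):
    """Uninterrupted uptime (in days) needed so that downtime_days still fits the SLA.

    Assumption: the SLA is measured over one window = downtime + uptime.
    """
    # Fraction avoids float rounding noise in 1 - 0.9995
    allowed_downtime = 1 - Fraction(sla_percent) / 100
    total_days = Fraction(downtime_days) / allowed_downtime
    return total_days - downtime_days

days = required_uptime_days(7, "99.95")
years = float(days) / 365.25
print(f"{days} days of uptime needed, about {years:.1f} years")
```

With 7 days of downtime this gives the 13 993 days quoted in the post.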
nvm
We are working very hard on this failed node and will update shortly. Thank you
I think your server will stay forever down and the other 1999 servers in the DC will run flawlessly until the end of the world. A communal kind of SLA.
They were 7+ TB of default Ubuntu installs. They will be sorely missed.
I have to say, a terrible upgrade. Even the Dedicated Server has experienced four disconnections in the past week.
All the best to HostSlick, especially now that this is his main hustle.
IMO, affected customers deserve more than the promised 10-euro account credit and month of free service.
Last night we made the last change to the LACP channel mentioned in the announcement. For this we needed to reboot one of the line cards on the Juniper side.
After the mode change on the Arista and the Juniper, all errors are gone. There should be no more disruptions for your dedicated server(s).
My free 20 gigabyte shared hosting is unaffected.
Different node?
One guy?
Mine went down an hour ago
https://hostslick.com/clients/announcements/71/Update-18.-October-2024-Regarding-delays-in-Support-and-issues.html
Dear clients,
We have a small update regarding our last announcement.
Regarding
1) ARP Resolving Issues in a Larger (VPS) VLAN (after upgrade to new Routing Stack and changing network style)
In the night from Thursday to Friday, 18th October, we changed the mode to native 40G for every port-channel member on this channel (2x40G), as announced.
For this we needed to restart some of the line cards to apply the new config to them.
At first there were no errors until the morning. Then we saw errors on the link again.
This is caused by microcode on the optical transceivers/lasers. Since our supplier had already traced this back to the microcode on the transceivers,
we will try to replace the transceivers with ones from another supplier this evening or on Saturday morning. In stock we only had 18 other 40G modules of the same type, all from the same supplier.
However, no FPC restart or anything else that would cause temporary packet loss is needed: thanks to the redundant design of the channel, traffic can continue to flow while
these modules are being replaced.
We have debugged this in depth before, gathered information, reviewed our setup and configurations, and analyzed traffic in the network using NetFlow/sFlow collectors that export the traffic data to our analyzers. A lot of actions have been taken in the meantime. There is no other disturbing factor in our network and all is fine.
In particular, this is not causing any issues or performance loss on most paths in the network that might be connected via this channel.
Other channels using other transceivers, as well as the 100G ones, are working without problems too.
2) Failure and Hardware Investigation/Replacement of node
10GKVMVPS28.hostslick.com (formerly 10GKVMVPS15.hostslick.com):
We have received our shipment with new hardware and SSDs to rebuild this node fully.
As mentioned, we didn't have enough 2TB drives in stock (this node has 24x2TB SSDs).
The node has been reinstated with the name 10GKVMVPS28.hostslick.com.
Currently we are doing some tests and expect to have all VPS that were served by it back within the next 12-24h at the latest.
The process of re-provisioning and processing all tickets will begin within the next hour.
As we are currently not restocking VPS very much (for us it's a secondary product, we focus on the dedicated server market), we unfortunately did not have enough free space to move everyone to other nodes.
This is also because this node was one of the bigger ones. So this node caught us at a very bad time.
However, our compensation offer still stands and everyone affected will receive 1 month of free service plus 10€ compensation credit.
This will be automatically given to the affected customers?
how?
Yes
Always nice when compensation is given automatically and not only to those who create a ticket asking for it.
I just want to get all this fixed, whatever it takes. And I will. I'm very upset about these events.
All the purchased equipment was not cheap. This investment was made to offer 10Gbit as a standard, with planned options to book up to 40Gbit of traffic for the mass/bulk.
The MX960 is not a small or cheap device. It draws 2 kW of power all on its own. And that is not including the other switches in our routing/network rack. But it can offer enough capacity to accomplish this.
Regarding transceivers, we have had hundreds without problems, although those were 10G ones, because we set up new dedicated servers with 10G fiber ports.
We haven't set up any new 1Gbit port servers since the beginning of this year.
But they have always been reliable. No failures, no problems, nothing.
As for the node, I need to check what I do here. Those SSDs have been very reliable until now. We have deployed the same ones in dedicated servers for customers and not a single one has failed. I have slapped the supplier around; I did the math and we spent 20k€+ with them in the last 12 months.
I will get my replacements for sure, but I will not use them in more VPS nodes for now, and I haven't used them much in any other nodes.
I am sorry to everyone if I am not able to reply here much. I am working on everything.
My goal is to get all done and have some weekend still.
Have a good evening.
the NEW UPGRADE was a success, every day now an IPv4/IPv6 outage! gotta love @HostSlick
@HostSlick newer Junos versions have an arp limiter in place which may be affecting you. I had a similar issue in Vegas some years ago. There’s an option to loosen it.
Francisco
Uhm, what? Never heard of that. I run Junos 22.4 though, which is indeed one of the newer ones. And what you describe is not visible in the configuration?
I'm breaking my head here.
The modules have been swapped out. No change in those errors. Dedicated servers are not affected. This is only happening on VPS nodes, which have many IPs, and then there are hosting companies in their own VLANs which don't face any issues either.
I have only changed the aging-timer a bit for now. Continuing to investigate it.
Any tip on how to change/see if there is such limit would be much appreciated!
-
I'm just fixing another Juniper EX right now for a rack colocation customer, which has "gone dead" (NAND failed), and then I'll get back on it and check.
@HostSlick https://www.juniper.net/documentation/us/en/software/junos/routing-policy/topics/example/example-configuring-arp-policer.html
Francisco
https://cloud.hostslick.com:4083/ is not loading.
confirms, they are actively working on things
People should have some patience. The provider is working on improving the services for everybody. Unexpected things are always bound to happen when there are many customers.
Checking and deploying! Thank you very much.
No matter what, I'll let you know.
Be sure to read the part about there being a 'default' ARP policer that all interfaces are joined to by default. You'll want to create a new policer and put your interfaces into that.
Francisco
Well, you were all right. And I was blind.
Before
After I've configured and applied policers

Why it is only a 150kbps limit on such a big device, I cannot understand. Why would Juniper set it that low, or even apply a default one at all instead of just letting the customer do it? The MX is a universal routing platform, so the limit needed might be different depending on the use case of the device.
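To get a feel for how tight 150kbps is, here is a rough back-of-the-envelope sketch. It assumes every ARP packet is counted as a minimum-size 64-byte Ethernet frame and ignores the burst size; how the MX actually accounts bytes is not covered in this thread, so treat the result as an order of magnitude only. The names are made up for illustration.

```python
def max_frames_per_second(bandwidth_bps: int, frame_bytes: int) -> float:
    """Sustained frame rate that fits under a bandwidth limit (burst ignored)."""
    return bandwidth_bps / (frame_bytes * 8)

POLICER_BPS = 150_000   # the 150kbps default limit mentioned in the post
ARP_FRAME_BYTES = 64    # assumption: minimum Ethernet frame size per ARP packet

rate = max_frames_per_second(POLICER_BPS, ARP_FRAME_BYTES)
print(f"roughly {rate:.0f} ARP frames per second fit under the limit")
```

Under these assumptions only a few hundred ARP packets per second get through, which would fit the picture of VPS nodes with many IPs in one VLAN being hit while dedicated servers are not.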
I found this article after you mentioned it: https://lkhill.com/juniper-arp-policer/
THANKS Master @Francisco !
Now I can fully concentrate on all the tickets, especially those that are still left from the one node.
When you first posted the thread and mentioned removing the Cisco, I was going to comment "That won't be the last time you see that unit", but didn't want to curse you.
Glad that fixed your issue. Good luck on the rest of your maintenance.
Francisco
♥️