Hostslick - Network Maintenance 10th Oct 2024 - Upgrade to MX960-prem3

fragilequant · October 2024

On the other hand, we take it from the positive side:

Mine will be down for 7 days tomorrow. Given 99.95% Uptime SLA, this means if it gets fixed tomorrow, it must run for the next 13 993 days (>38 years) without a single outage. Cool!
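
For anyone checking the math, here is a minimal sketch of that calculation. It assumes the SLA is measured over the whole period (outage plus the clean run afterwards), which the thread does not specify; the function name is made up for illustration.

```python
from fractions import Fraction

def required_clean_days(downtime_days, sla_percent):
    """Days of uninterrupted uptime needed after an outage so that
    availability over the whole period still meets the SLA.

    Assumption: the SLA is evaluated over outage + clean run combined.
    """
    sla = Fraction(sla_percent) / 100           # "99.95" -> 1999/2000
    allowed_down_fraction = 1 - sla             # share of time allowed to be down
    total_days = Fraction(downtime_days) / allowed_down_fraction
    return total_days - downtime_days           # subtract the outage itself

days = required_clean_days(7, "99.95")
print(days, "days, roughly", round(float(days) / 365, 1), "years")
```

Passing the SLA as a string keeps the arithmetic exact, so 7 days at 99.95% gives 13993 days, matching the figure above.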

loay · October 2024

nvm

HostSlick · October 2024

@nouman8 said:

@verl20 said:
Any new updates from Hostslick?

There's an update on the site that says that the node will be back online by wednesday

We are working very hard on this failed node and will update shortly. Thank you.

davide · October 2024

@fragilequant said:
On the other hand, we take it from the positive side:

Mine will be down for 7 days tomorrow. Given 99.95% Uptime SLA, this means if it gets fixed tomorrow, it must run for the next 13 993 days (>38 years) without a single outage. Cool!

I think your server will stay forever down and the other 1999 servers in the DC will run flawlessly until the end of the world. A communal kind of SLA.

Moopah · October 2024

@remy said:

@Moopah said:

@remy said:
Still impacted too.
After 2 VPS crashes with data loss impacting the same server. Now it's been offline for days. A bit frustrating but it happens...
However, I wouldn't mind a bit more information.

Same, I lost 10+ VPS from the RAID array failure (7+TB of data). But I am a good idling customer, so I don't file support tickets to bother the provider

7TB... I thought you were the idling master.
I'm disappointed to hear that. I hope we're talking about empty partitions!

They were 7+ TB of default Ubuntu installs. They will be sorely missed.

mmmxnrs · October 2024

I have to say, a terrible upgrade. Even the Dedicated Server has experienced four disconnections in the past week.

cybertech · October 2024

all the best to hostslick especially now this is his main hustle.

ServerBachelor · October 2024

@mmmxnrs said:
I have to say, a terrible upgrade. Even the Dedicated Server has experienced four disconnections in the past week.

IMO, affected customers deserve more than the promised 10-euro account credit and month of free service.

HostSlick · October 2024

@mmmxnrs said:
I have to say, a terrible upgrade. Even the Dedicated Server has experienced four disconnections in the past week.

Last night we made the last change to the LACP channel mentioned in the announcement. For this we needed to reboot one of the line cards on the Juniper side.

After the mode change on the Arista and Juniper, all errors are gone. There should be no more disruptions for your dedicated server(s).

10thHouse · October 2024

My free 20 gigabyte shared hosting is unaffected.
Different node?

10thHouse · October 2024

@cybertech said:
all the best to hostslick especially now this is his main hustle.

One guy?

sander815 · October 2024

@10thHouse said:
My free 20 gigabyte shared hosting is unaffected.
Different node?

Mine went down an hour ago

HostSlick · October 2024

https://hostslick.com/clients/announcements/71/Update-18.-October-2024-Regarding-delays-in-Support-and-issues.html

Dear clients,

we have a small update regarding our last announcement.

Regarding
1) ARP Resolving Issues in a Larger (VPS) VLAN (after upgrade to new Routing Stack and changing network style)

On the night from Thursday to Friday, 18th October, we changed the mode to native 40G for every Port-Channel member on this channel (2x40G), as announced.
For this we needed to restart some of the line cards to apply the new config.
At first there were no errors, until the morning. Then we saw errors on the link again.
This is caused by microcode on the optical transceivers/lasers. Since our supplier had already traced this back to the transceiver microcode,
we will try to replace the transceivers with ones from another supplier this evening or on Saturday morning. The only other 40G modules we had in stock were 18 of the same type from the same supplier.

However, no FPC restart or anything else that would cause temporary packet loss is needed. Thanks to the redundant design of the channel, traffic can continue to flow while
these modules are being replaced.

We have debugged this in depth, gathered information, reviewed our setup and configurations, and analyzed traffic in the network using netflow/sflow collectors, exporting the traffic data to our analyzers. A lot of actions have been taken in the meantime. There is no other disturbing factor in our network and all is fine.
In particular, this is not causing any issues or performance loss on most paths in the network that might be connected via this channel.
Other channels using other transceivers, as well as the 100G ones, are working without problems too.

2) Failure and Hardware Investigation/Replacement of node
10GKVMVPS28.hostslick.com (formerly 10GKVMVPS15.hostslick.com):

We have received our shipment with new hardware and SSDs to rebuild this node fully.
As mentioned, we didn't have enough 2TB drives in stock (this node has 24x2TB SSD).
The node has been reinstated under the name 10GKVMVPS28.hostslick.com.
We are currently running some tests and expect to have all VPS that were served by it back within the next 12-24h at the latest.

Re-provisioning and processing of all tickets will begin within the next hour.

As we are currently not restocking VPS very much (for us it's a secondary product; we focus on the dedicated server market), we unfortunately did not have enough free space to move everyone to another node.
This is also because this node was one of the bigger ones. So this node caught us at a very bad time.
However our compensation offer still stands and everyone affected will receive 1 month free service on top and 10€ compensation credit.

lala_th · October 2024

@HostSlick said: However our compensation offer still stands and everyone affected will receive 1 month free service on top and 10€ compensation credit.

This will be automatically given to the affected customers?

tenpera · October 2024

@HostSlick said: will receive 1 month free service on top and 10€ compensation credit.

how?

HostSlick · October 2024

@lala_th said:

@HostSlick said: However our compensation offer still stands and everyone affected will receive 1 month free service on top and 10€ compensation credit.

This will be automatically given to the affected customers?

Yes

emgh · October 2024

@HostSlick said:

@lala_th said:

@HostSlick said: However our compensation offer still stands and everyone affected will receive 1 month free service on top and 10€ compensation credit.

This will be automatically given to the affected customers?

Yes

Always nice when compensation is given automatically and not only to those who create a ticket asking for it.

HostSlick · October 2024

@emgh said:

@HostSlick said:

@lala_th said:

@HostSlick said: However our compensation offer still stands and everyone affected will receive 1 month free service on top and 10€ compensation credit.

This will be automatically given to the affected customers?

Yes

Always nice when compensation is given automatically and not only to those who create a ticket asking for it.

I just want to get all this fixed, whatever it takes. And I will. I'm very upset about these events.
All the purchased equipment was not cheap. This investment was made to offer 10Gbit as a standard, with planned options to book up to 40Gbit of traffic for the mass/bulk.
The MX960 is not a small or cheap device. It sucks 2 kW of power all on its own. And that is not including the other switches in our routing/network rack. But it can offer enough capacity to accomplish this.

Regarding transceivers, we have had hundreds without problems, although those were 10G ones, because we set up new dedicated servers with 10G fiber ports.
We haven't set up any new 1Gbit port servers since the beginning of this year.
But they have always been reliable. No failures, no problems, nothing.

And with the node, I need to check what I do here. Those SSDs have been very reliable until now. We have deployed the same ones in dedicated servers for customers and not a single one has failed. I have slapped the supplier around; I added it up, and we have spent 20k€+ with them in the last 12 months.
I will get my replacements for sure, but I will not use them in more VPS nodes for now, and I haven't used them much in any other nodes.

I am sorry to everyone if I am not able to reply here much. I am working on everything.
My goal is to get all done and have some weekend still.

Have a good evening.

cold · October 2024

the NEW UPGRADE was a success, every day now an IPv4/IPv6 outage! gotta love @HostSlick

Francisco · October 2024

@HostSlick newer Junos versions have an arp limiter in place which may be affecting you. I had a similar issue in Vegas some years ago. There’s an option to loosen it.

Francisco

HostSlick · October 2024

@Francisco said:
@HostSlick newer Junos versions have an arp limiter in place which may be affecting you. I had a similar issue in Vegas some years ago. There’s an option to loosen it.

Francisco

Uhm, what? Never heard of that. I run Junos 22.4 though, which is indeed one of the newer ones. And what you describe is not visible in the configuration?

I'm racking my brain here.

The modules have been swapped out. No change in those errors. Dedicated servers are not affected. This is only happening on VPS nodes, which have many IPs, and then there are hosting companies in their own VLAN which don't face any issues either.
For now I have only changed the aging-timer a bit. Continuing to investigate.

Any tip on how to change/see if there is such limit would be much appreciated!

-

I'm just fixing another Juniper EX right now for a rack colocation customer, which has "gone dead" (NAND failed), and then I'll get back to this and check.

Francisco · October 2024

@HostSlick https://www.juniper.net/documentation/us/en/software/junos/routing-policy/topics/example/example-configuring-arp-policer.html

Francisco

tenpera · October 2024

https://cloud.hostslick.com:4083/ is not loading.

ShadowLurker · October 2024

@tenpera said:
https://cloud.hostslick.com:4083/ is not loading.

confirms, they are actively working on things

default · October 2024

People should have some patience. The provider is working on improving the services for everybody. Unexpected things are always bound to happen when there are many customers.

HostSlick · October 2024

@Francisco said:
@HostSlick https://www.juniper.net/documentation/us/en/software/junos/routing-policy/topics/example/example-configuring-arp-policer.html

Francisco

Checking and deploying! Thank you very much.
No matter what, I'll let you know.

Francisco · October 2024

@HostSlick said:

@Francisco said:
@HostSlick https://www.juniper.net/documentation/us/en/software/junos/routing-policy/topics/example/example-configuring-arp-policer.html

Francisco

Checking and deploying! Thank you very much.
No matter what, I'll let you know.

Be sure to read the part about there being a 'default' arp policer that all interfaces are joined to by default. You'll want to create a new policer and put your interfaces into that.

Francisco

HostSlick · October 2024

@Francisco said:

@HostSlick said:

@Francisco said:
@HostSlick https://www.juniper.net/documentation/us/en/software/junos/routing-policy/topics/example/example-configuring-arp-policer.html

Francisco

Checking and deploying! Thank you very much.
No matter what, I'll let you know.

Be sure to read the part about there being a 'default' arp policer that all interfaces are joined to by default. You'll want to create a new policer and put your interfaces into that.

Francisco

Well, you were absolutely right. And I was blind.

Before

After I configured and applied the policers

Why there is only a 150 kbps limit on such a big device, and why Juniper set it that low, I cannot understand. Or why they would even apply a default one rather than just let the customer do it. The MX is a universal routing platform, so the limit needed might differ depending on the use case of the device.
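
For a rough sense of scale, a 150 kbps policer only admits a few hundred ARP packets per second, which a VPS VLAN with many IPs could plausibly exceed. A minimal sketch, assuming each ARP packet is counted as a 64-byte minimum-size Ethernet frame and ignoring the policer's burst allowance; how Junos actually counts bytes is not stated in this thread, so treat the frame size as a parameter to adjust.

```python
def arp_packets_per_second(limit_bps, frame_bytes=64):
    """Approximate ARP packets per second admitted at a policer's sustained rate.

    frame_bytes=64 is an assumed minimum Ethernet frame size, not a value
    taken from Junos documentation; burst size is ignored.
    """
    return limit_bps / (frame_bytes * 8)

rate = arp_packets_per_second(150_000)
print(f"~{rate:.0f} ARP packets/s at 150 kbps")
```

Under that assumption the sustained rate works out to a bit under 300 packets per second shared across every interface attached to the same policer.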

I found this article after you mentioned it: https://lkhill.com/juniper-arp-policer/

THANKS Master @Francisco !

Now I can fully concentrate on all the tickets, especially those still left from the one node.

Francisco · October 2024

@HostSlick said: THANKS Master @Francisco !

When you first posted the thread and mentioned removing the Cisco, I was going to comment "That won't be the last time you see that unit", but didn't want to curse you.

Glad that fixed your issue. Good luck on the rest of your maintenance.

Francisco

HOSTCAY · October 2024

@Francisco said:

@HostSlick said: THANKS Master @Francisco !

When you first posted the thread and mentioned removing the Cisco, I was going to comment "That won't be the last time you see that unit", but didn't want to curse you.

Glad that fixed your issue. Good luck on the rest of your maintenance.

Francisco

♥️
