crunchbits outage?

eezcloud · November 2024

On their discord:
The network engineering team has identified the issue as a line card failure, which is affecting the main upstream connection. They are actively working on resolving it. We’ll share an estimated time for repair (ETR) as soon as possible. Thank you for your patience.

haste0675 · November 2024

so much for HA

sasslik · November 2024

@haste0675 said:
so much for HA

lol you really thought Crunchbits was offering HA service? born yesterday?

danblaze · November 2024

@sasslik said:

@haste0675 said:
so much for HA

lol you really thought Crunchbits was offering HA service? born yesterday?

Not to mention, I don't think this is a situation where HA would be effective. A more effective approach might be to run K3s across Seattle and their WA location and implement failover at the business level.

wolfypro · November 2024

Could anyone send an invite link to their discord server? It will be really helpful!

emgh · November 2024

@haste0675 said:
so much for HA

Can you show where they mention HA?

Petey_Long · November 2024

@wolfypro said:
Could anyone send an invite link to their discord server? It will be really helpful!

https://discord.gg/crunchbits

CoolGeek · November 2024

Discord says ETA for fix is approximately one hour from 9:30AM Pacific Time.

edrebe · November 2024

servers up

CoolGeek · November 2024

It's not all back. I have some VPSs working and some still not.

zox · November 2024

Finally, it is back.

suyadi92 · November 2024

@edrebe said:
servers up

Mine too (at least for now)

yoursunny · November 2024

Blame @FatGrizzly .
https://crunchbits.rip/

zox · November 2024

Start Time: 6 Nov 19:52
End Time: 6 Nov 23:30
Down: 3hr 38min
Time is shown in UTC+05:30
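
For anyone comparing these times with the Pacific-time updates from the Discord, here is a minimal sketch that re-derives the duration and converts the window to UTC and PST. The year 2024 is taken from the thread date, and the fixed UTC-8 offset for PST is an assumption; `format_duration` is just a helper made up for this example.

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # offset stated in the post
PST = timezone(timedelta(hours=-8))             # assumption: Pacific Standard Time

# Year 2024 is assumed from the thread date.
start = datetime(2024, 11, 6, 19, 52, tzinfo=IST)
end = datetime(2024, 11, 6, 23, 30, tzinfo=IST)


def format_duration(delta: timedelta) -> str:
    """Render a timedelta as whole hours and minutes."""
    total_minutes = int(delta.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}hr {minutes}min"


print("Down:", format_duration(end - start))
for label, tz in (("UTC", timezone.utc), ("PST", PST)):
    local_start = start.astimezone(tz).strftime("%H:%M")
    local_end = end.astimezone(tz).strftime("%H:%M")
    print(f"{label}: {local_start} to {local_end}")
```

Under those assumptions the window works out to roughly 06:22 to 10:00 PST.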

suyadi92 · November 2024

@yoursunny said:
Blame @FatGrizzly .
https://crunchbits.rip/

😂😂😂

suyadi92 · November 2024

@CoolGeek said:
It's not all back. I have some VPSs working and some still not.

From their discord:

The network is back online with limited capacity after the initial backup line didn't fully resolve the issue. We've switched to a different line card and we are working on restoring the full capacity. Thank you for your patience. More updates to follow.

melp57 · November 2024

I'm back up

suyadi92 · November 2024

@melp57 said:
I'm back up

Seems like Spokane was fully restored
https://uptime.crunchbits.net/status/public

servers_guru · November 2024

up!

CoolGeek · November 2024

I seem to be all working again!

yoursunny · November 2024

@suyadi92 said:
Seems like Spokane was fully restored

Are you sure?
BGP session is dead.

sunny@vps6:~$ sudo birdc s p a CRUNCHBITS_AS400304_v6
BIRD 2.0.12 ready.
Name       Proto      Table      State  Since         Info
CRUNCHBITS_AS400304_v6 BGP        ---        start  2024-11-06 21:05:18  Connect
  BGP state:          Connect
    Neighbor address: 2606:a8c0:0:2::19
    Neighbor AS:      400304
    Local AS:         200690
  Channel ipv6
    State:          DOWN
    Table:          master6
    Preference:     100
    Input filter:   (unnamed)
    Output filter:  (unnamed)
    Import limit:   300000
      Action:       disable
    Export limit:   16
      Action:       disable
    IGP IPv6 table: master6
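
If you want to check this from a script rather than by eye, a rough sketch is to pull the "BGP state" and channel "State" fields out of the `birdc` text. The sample below is trimmed from the output above; `parse_bgp_status` and `session_is_up` are hypothetical helper names, and the parsing assumes the layout shown here rather than any guaranteed BIRD output format.

```python
import re

# Trimmed copy of the birdc output quoted above.
SAMPLE = """\
Name       Proto      Table      State  Since         Info
CRUNCHBITS_AS400304_v6 BGP        ---        start  2024-11-06 21:05:18  Connect
  BGP state:          Connect
    Neighbor address: 2606:a8c0:0:2::19
    Neighbor AS:      400304
    Local AS:         200690
  Channel ipv6
    State:          DOWN
"""

FIELD = re.compile(
    r"\s*(BGP state|Neighbor address|Neighbor AS|Local AS|State):\s+(\S+)"
)


def parse_bgp_status(text: str) -> dict:
    """Collect the 'key: value' fields of interest from birdc output."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def session_is_up(fields: dict) -> bool:
    """A session counts as up only when BGP is Established and the channel is UP."""
    return fields.get("BGP state") == "Established" and fields.get("State") == "UP"


status = parse_bgp_status(SAMPLE)
print(status["BGP state"], status["State"], session_is_up(status))
```

A status page that only pings the provider's own addresses will not notice this kind of failure, which is why the session state is worth checking separately.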

jsg · November 2024

I'm glad to see that @crunchbits seems to be fully (or almost fully, see @yoursunny 's post) recovered and operational again.

And I hope the kind and generous help crunchbits provided to another provider in trouble is not forgotten - certainly not by me. Being there, also a friendly shoutout to @jar.

default · November 2024

My service is back - Thank you @crunchbits - Respect.

suyadi92 · November 2024

@yoursunny said:

@suyadi92 said:
Seems like Spokane was fully restored

Are you sure?
BGP session is dead.

Didn't check the BGP session, only from uptime status page.

spitball · November 2024

@yoursunny said:
Blame @FatGrizzly .
https://crunchbits.rip/

L O L

gg crunchbits, Slept through the outage and now I'm on discord ready for the BF @everyone.
@yoursunny made me log in and everything, posting that domain

I'm only 2 for 3 now, that the outage on my idler was a 'datacenter fire' thanks to this cursed website (OVH '21, EWR '23), so I GUESS I'll keep renewing at crunchbits (for the simple custom image setup)
Thanks again for the excellent communication, and no t-shirt! The new pipes feel smoooth

ElChile · November 2024

Is IPv6 inbound working for others?
nvm

CheepCluck · November 2024

@haste0675 said:
so much for HA

Real men setup HA on nested vms on the same vm.

yoursunny · November 2024

BGP session is up again.
IPv6 inbound is working on provider-assigned IP range but not working on BGP-announced IP range.

crunchbits · November 2024

@FAT32 said:
My brain: Don't say it, don't say it
Me: The end is...

Too soon, junior

@Allay said:
Reason I think they may be shutting down is that everything's been out of stock for a while, their sales hasn't replied to me in over a month either

Stuff is out of stock because there are big changes (and I hope: well received specials) coming and I don't want people buying something only to be (rightfully) upset 2-4 weeks later that X or Y is available and they just bought Z. I also don't want my team doing a bunch of extra work to service change requests on week old deployments. Toss in a surge of existing customer custom orders/growth and my personal preference to service and take care of the ones who helped us grow before onboarding new customers. I wish I could do both simultaneously, but I just can't reasonably meet those needs yet. Additionally, after getting sick pulling ~1hr per night sleep for too many days straight I realized we (myself and my entire team) have been pushing too hard for too long and need to have a plan to reasonably continue operations being mindful of mental and physical health. Luckily this is pretty easily achievable and we already have some internal task items to deal with it, but just to shed some light on why new sales/onboarding have been taking a backseat.

Plus, I don't like not being able to interact with friends and customers (discord, LET, etc) more often in an 'unofficial capacity'. It gets too cold and corporate, and I do value a lot of the input we receive from regular chats about what everyone is doing with their hardware. You wouldn't believe how often someone says they're doing XYZ and then it clicks for us that we have a certain type of hardware that isn't a standard product but we could deploy for their use-case and save them money every month. In my opinion, that kind of attention to detail is one of the biggest reasons to go with a small provider like you'll find on LET.

Also: I think we quoted you like 2 or 3 build requests already, unless I'm mixing you up with someone else (very possible, there are email threads that look to be the same person but might not be).

@suyadi92 said:

Crunchbits website is also down

Unfortunately, we thought we had removed every case where third-party stuff failing to load would break the website a few months ago. Either a commit was mistakenly reverted or we missed something, but our blog going down caused the whole website (which is hosted offsite and redundant in an effort to keep website/billing/discord/ip phones all separately available avenues to reach us) to 500 out. Embarrassing, honestly. Completely my fault as well, as we have these little admin things to sniff out, sort out, test properly, and fix up but I have not been giving everyone enough time to really follow through on those tasks and do follow-ups.

@haste0675 said:
so much for HA

HA as far as services you buy from us? None are, none were ever sold to you as HA. That is something we've been investigating offering with new product stack, but frankly I just don't think it would be viable here because pricing to do it properly is a multiple of LE-preferred ranges. You're honestly better off just buying a much less expensive VM from us and any of the other solid providers here to make your own HA/replication stack for significantly less money. The bonus is you also get geographical diversity in case of nukes.
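
As a rough illustration of the do-it-yourself approach (not anything Crunchbits ships), the simplest version is a health check that picks the first reachable VM from a priority list spread across providers. The hostnames, port, and function names below are all hypothetical, and a real setup would also need data replication and a way to repoint DNS or traffic.

```python
import socket
from typing import Callable, Iterable, Optional


def tcp_probe(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_active(
    hosts: Iterable[str], probe: Callable[[str], bool] = tcp_probe
) -> Optional[str]:
    """Return the first host (in priority order) that passes the probe, else None."""
    for host in hosts:
        if probe(host):
            return host
    return None


if __name__ == "__main__":
    # Hypothetical VMs at two different providers and locations.
    candidates = ["vm-wa.example.net", "vm-other-provider.example.net"]
    print("Serving from:", pick_active(candidates))
```

The probe is passed in as a parameter so the selection logic can be tested without touching the network.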

@servers_guru said:
up!

@suyadi92 said:

@CoolGeek said:
It's not all back. I have some VPSs working and some still not.

From their discord:

The network is back online with limited capacity after the initial backup line didn't fully resolve the issue. We've switched to a different line card and we are working on restoring the full capacity. Thank you for your patience. More updates to follow.

The issue ended up being somewhat complex. We've been relying on a Juniper MX chassis with redundant REs, line cards, PSUs, etc. for service windows and redundancy in our edge routing. Don't quote me exactly on this as networking is not my specialty, but we had one of our MPCs fail (so the early quick/easy fixes didn't help: it wasn't optics, carrier, fiber, power-related, etc.). Unfortunately, how this failed ended up causing multiple issues within the entire chassis, and the difficult decision was made to just move to the newer edge routers immediately, as the uncertainty around time/parts/confidence of repair on the MX was enough to bite the bullet and put in the new edge units ahead of schedule. Luckily they had been pre-configured a week prior as we were already prepping to quietly roll services over to a new stack seamlessly. Unluckily this happened at around 6AM PST after most of the team was up until 2-5AM due to big elections, and we've been slowly moving things around behind the scenes to prep for the upgrade and additional rack space. Frankly: caught with our pants down, textbook Murphy's law.

There has been a plan that was already in motion to slowly upgrade the entire WA network to match the capacity/equipment/design that we recently switched PA over to and are very happy with. It will allow us the ability to add a ton of beneficial features for new deployments and also add them for existing customers at no charge if they want to incur a maintenance window.

I'm truly sorry to our customers for the downtime, hassle, and failures on my part and over this weekend we'll be implementing some large changes (starting internally) to alleviate this to the best of our ability from lessons learned the hard way this morning. Eligible services (meaning: not you with a yearly) are also getting a bump in SLA credits for this event.

melp57 · November 2024

Crunchbits you guys rock! Give us some more 3yr vps's at a fantastic price like you have in the past. Not asking you to go broke. 😁
