
Comments
On their discord:
The network engineering team has identified the issue as a line card failure, which is affecting the main upstream connection. They are actively working on resolving it. We’ll share an estimated time for repair (ETR) as soon as possible. Thank you for your patience.
so much for HA
lol you really thought Crunchbits was offering HA service? Born yesterday?
Not to mention, I don't think this is a situation where HA would be effective. A more effective approach might be to run K3s across Seattle and their WA location to implement failover at the business level.
Could anyone send an invite link to their discord server? It will be really helpful!
Can you show where they mention HA?
https://discord.gg/crunchbits
Discord says ETA for fix is approximately one hour from 9:30AM Pacific Time.
servers up

It's not all back. Some of my VPSs are working and some still aren't.
Finally, it is back.
Mine too (at least for now)
Blame @FatGrizzly .
https://crunchbits.rip/
Start Time: 6 Nov 19:52
End Time: 6 Nov 23:30
Down: 3hr 38min
Time is shown in UTC+05:30
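For anyone checking the numbers above, here is a minimal sketch that recomputes the downtime and converts the two timestamps to UTC. It assumes both times fall on the same day and that the page offset really is UTC+05:30 as stated; the post gives no year, so `strptime` silently defaults to 1900, which does not affect the difference. The helper name `format_downtime` is made up for this illustration.

```python
from datetime import datetime, timedelta

FMT = "%d %b %H:%M"  # the post has no year, so strptime defaults to 1900
PAGE_OFFSET = timedelta(hours=5, minutes=30)  # page says times are UTC+05:30

start = datetime.strptime("6 Nov 19:52", FMT)
end = datetime.strptime("6 Nov 23:30", FMT)


def format_downtime(delta):
    """Render a timedelta in the same 'XhrYmin' style the status page uses."""
    total_minutes = int(delta.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}hr {minutes}min"


downtime_text = format_downtime(end - start)

# Subtract the page offset to get the same instants in UTC.
start_utc = (start - PAGE_OFFSET).strftime("%H:%M")
end_utc = (end - PAGE_OFFSET).strftime("%H:%M")

print(downtime_text, start_utc, end_utc)
```

The reported 3hr 38min matches, and the window works out to 14:22 to 18:00 UTC.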
😂😂😂
From their discord:
The network is back online with limited capacity after the initial backup line didn't fully resolve the issue. We've switched to a different line card and we are working on restoring the full capacity. Thank you for your patience. More updates to follow.
I'm back up
Seems like Spokane was fully restored
https://uptime.crunchbits.net/status/public
up!
I seem to be all working again!
Are you sure?
BGP session is dead.
I'm glad to see that @crunchbits seems to be fully (or almost fully, see @yoursunny 's post) recovered and operational again.
And I hope the kind and generous help crunchbits provided to another provider in trouble is not forgotten - certainly not by me. Being there, also a friendly shoutout to @jar.
My service is back - Thank you @crunchbits - Respect.
I didn't check the BGP session, only the uptime status page.
L O L
gg crunchbits, Slept through the outage and now I'm on discord ready for the BF @everyone.
@yoursunny made me log in and everything, posting that domain
I'm only 2 for 3 now, that the outage on my idler was a 'datacenter fire' thanks to this cursed website (OVH '21, EWR '23), so I GUESS I'll keep renewing at crunchbits (for the simple custom image setup)
Thanks again for the excellent communication, and no t-shirt! The new pipes feel smoooth
Is IPv6 inbound working for others?
nvm
Real men set up HA on nested VMs on the same VM.
BGP session is up again.
IPv6 inbound is working on provider-assigned IP range but not working on BGP-announced IP range.
Too soon, junior
Stuff is out of stock because there are big changes (and, I hope, well-received specials) coming, and I don't want people buying something only to be (rightfully) upset 2-4 weeks later that X or Y is available when they just bought Z. I also don't want my team doing a bunch of extra work to service change requests on week-old deployments. Toss in a surge of custom orders/growth from existing customers, plus my personal preference to take care of the ones who helped us grow before onboarding new customers. I wish I could do both simultaneously, but I just can't reasonably meet those needs yet. Additionally, after getting sick from pulling ~1hr of sleep per night for too many days straight, I realized we (myself and my entire team) have been pushing too hard for too long and need a plan to reasonably continue operations while being mindful of mental and physical health. Luckily this is pretty easily achievable and we already have some internal task items to deal with it, but I wanted to shed some light on why new sales/onboarding have been taking a backseat.
Plus, I don't like not being able to interact with friends and customers (discord, LET, etc) more often in an 'unofficial capacity'. It gets too cold and corporate, and I do value a lot of the input we receive from regular chats about what everyone is doing with their hardware. You wouldn't believe how often someone says they're doing XYZ and then it clicks for us that we have a certain type of hardware that isn't a standard product but we could deploy for their use-case and save them money every month. In my opinion, that kind of attention to detail is one of the biggest reasons to go with a small provider like you'll find on LET.
Also: I think we quoted you like 2 or 3 build requests already, unless I'm mixing you up with someone else (very possible, there are email threads that look to be the same person but might not be).
Unfortunately, we thought that a few months ago we had removed every case where third-party stuff failing to load would break the website. Either a commit was mistakenly reverted or we missed something, but our blog going down caused the whole website (which is hosted offsite and redundant, in an effort to keep website/billing/discord/IP phones all separately available avenues to reach us) to 500 out. Embarrassing, honestly. Completely my fault as well: we have these little admin things to sniff out, sort out, test properly, and fix up, but I have not been giving everyone enough time to really follow through on those tasks and do follow-ups.
HA as far as services you buy from us? None are, and none were ever sold to you as HA. That is something we've been investigating offering with the new product stack, but frankly I just don't think it would be viable here because the pricing to do it properly is a multiple of LE-preferred ranges. You're honestly better off just buying a much less expensive VM from us and from any of the other solid providers here to make your own HA/replication stack for significantly less money. The bonus is you also get geographical diversity in case of nukes.
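As a rough illustration of the "make your own HA" idea above, here is a minimal sketch of the failover decision only: given VMs at different providers in priority order, pick the first one that answers a TCP health check. Everything named here is an assumption for the example, not anything Crunchbits ships: the hostnames are placeholders on the reserved `example.net` domain, and `tcp_alive` and `pick_active` are hypothetical helpers. Data replication and actually redirecting traffic (DNS update, floating IP, etc.) are out of scope.

```python
import socket


def tcp_alive(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_active(endpoints, is_healthy=tcp_alive):
    """Return the first (host, port) that passes the health check, else None.

    endpoints is ordered by preference: primary first, then fallbacks.
    is_healthy is injectable so the logic can be tested without a network.
    """
    for host, port in endpoints:
        if is_healthy(host, port):
            return (host, port)
    return None


# Hypothetical VMs at two different providers (placeholder hostnames).
ENDPOINTS = [
    ("vm-a.example.net", 443),
    ("vm-b.example.net", 443),
]
```

Calling `pick_active(ENDPOINTS)` from a third location on a timer would give you the endpoint to point traffic at. A single TCP check is a blunt instrument, so a real setup would likely want several consecutive failures before switching.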
The issue ended up being somewhat complex. We've been relying on a Juniper MX chassis with redundant REs, line cards, PSUs, etc. for service windows and redundancy in our edge routing. Don't quote me exactly on this as networking is not my specialty, but we had one of our MPCs fail (so the early quick/easy fixes didn't help: it wasn't optics, carrier, fiber, power-related, etc). Unfortunately, how this failed ended up causing multiple issues within the entire chassis, and the difficult decision was made to just move to the newer edge routers immediately, as the uncertainty around time/parts/confidence of repair on the MX was enough to bite the bullet and put in the new edge units ahead of schedule. Luckily they had been pre-configured a week prior, as we were already prepping to quietly roll services over to a new stack seamlessly. Unluckily this happened at around 6AM PST, after most of the team was up until 2-5AM due to big elections, and while we've been slowly moving things around behind the scenes to prep for the upgrade and additional rack space. Frankly: caught with our pants down, textbook Murphy's law.
There was already a plan in motion to slowly upgrade the entire WA network to match the capacity/equipment/design that we recently switched PA over to and are very happy with. It will allow us to add a ton of beneficial features for new deployments, and also to add them for existing customers at no charge if they are willing to incur a maintenance window.
I'm truly sorry to our customers for the downtime, hassle, and failures on my part. Over this weekend we'll be implementing some large changes (starting internally) to alleviate this to the best of our ability, based on lessons learned the hard way this morning. Eligible services (meaning: not you with a yearly) are also getting a bump in SLA credits for this event.
Crunchbits you guys rock! Give us some more 3yr vps's at a fantastic price like you have in the past. Not asking you to go broke. 😁