16+ Hours Downtime with Racknerd?
First post here. So I bought the $35/year deal with Racknerd and I mostly host open-source tools we use internally for project management, team communication...etc.
On Feb 5, 6:48 PM EET, betteruptime.com logged a timeout error, and the server went on to return timeouts and 502 errors. This continued until Feb 6, 10:58 AM EET.
From what I gathered from support, this was some sort of hardware failure at the Dallas datacenter which they needed to replace.
I opened ticket #655164 with support. At first they denied that anything was going on and told me to "check again now", but nothing had changed.
After the incident was resolved I emailed Dustin to report what happened (3 days ago) and still no reply from him to even acknowledge the issue. Concerning.
The most frustrating issues:
1) It didn't seem like Racknerd is even monitoring the uptime of its nodes. After over 45 minutes of connectivity issues, I reported the downtime and went back and forth with support, who were still trying to convince me that the server was up by sharing "ping" results even though everything was timing out and I couldn't even SSH in. If no one had reported the issue in the first place, would the server have stayed down forever?
2) The issue took an insane - pretty much never seen before with any of my other hosting providers - 16 hours to get sorted. I mean, one could argue that you shouldn't expect much from a $35/year deal but come on, 16 hours?
3) The statuses on the status page do not help one bit. The most important aspect to share in a time like this is ETA for resolution. That was non-existent. When I tried to get it from support, they just told me to follow the status page. Great.
4) Even when the node was back up less than 30 minutes ago as of writing this email, the status page doesn't even show an update that the node is back. So even posting an update about the resolution of the incident is not a straightforward thing to do apparently.
5) Dustin did not respond after 3 days of sending in the email.
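For perspective on point 2, here is a rough back-of-the-envelope sketch in Python of what a 16-hour outage does to an uptime figure. The 365-day year and 30-day month are simplifying assumptions, and the function names are made up for illustration; this is not how Racknerd or any provider calculates an SLA.

```python
# Back-of-the-envelope uptime arithmetic (illustrative only).
HOURS_PER_YEAR = 365 * 24   # 8760, ignoring leap years (assumption)
HOURS_PER_MONTH = 30 * 24   # 720, assuming a 30-day month (assumption)


def uptime_percent(downtime_hours: float, period_hours: float) -> float:
    """Return the uptime percentage for a period with the given downtime."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime_hours must be between 0 and period_hours")
    return 100.0 * (1 - downtime_hours / period_hours)


def allowed_downtime_hours(target_percent: float, period_hours: float) -> float:
    """Return how many hours of downtime fit inside a target uptime percentage."""
    return period_hours * (1 - target_percent / 100.0)


if __name__ == "__main__":
    print(f"Yearly uptime:  {uptime_percent(16, HOURS_PER_YEAR):.2f}%")   # 99.82%
    print(f"Monthly uptime: {uptime_percent(16, HOURS_PER_MONTH):.2f}%")  # 97.78%
    print(f"99.9% per year allows {allowed_downtime_hours(99.9, HOURS_PER_YEAR):.2f} h")
```

In other words, a single 16-hour incident is already roughly twice the yearly downtime that a 99.9% target would allow, assuming nothing else goes wrong for the rest of the year.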
Now I'm new to the community and maybe my expectations are too high? I mean if this is to be expected from a deal that cheap then that's obviously on me. I emailed Dustin the above and also to genuinely ask whether this is something I should expect to happen moving forward.
If there's an argument to be made on the premise of "what do you expect from a $3/month deal bro" then I guess this is all on me. But if this isn't what you'd usually expect, then I find this whole experience pretty frustrating to be honest.
Interested in hearing everyone's thoughts. Did you experience similar downtimes with them? Do you use their deals for anything mission-critical (or remotely close to that)? Are my expectations unreasonable?
Thanks a lot for taking the time to read this and share your thoughts.
what do you expect from a $3/month deal bro
Fair enough if that's the general consensus.
Let me ask you this though if you don't mind answering:
1) What would be your use case for this deal?
2) For similar specs, what would be a "good deal" for you for something that would be more reliable?
Maybe he's in court!!?? He's really busy these days and his busyness will be doubled soon
Seems like there's some drama going on that I'm not aware of. Care to share a link to enlighten me?
Never had any downtime with RN for the past at least 2 years
I've got several servers, some even years old, and never had any downtime. And what is this $35 deal you are talking about? Can’t find it anywhere on their deals page.
1.5 GB Ryzen VPS
1x AMD Ryzen CPU Core
22 GB NVMe SSD Storage
1.5 GB DDR4 RAM
3000GB Monthly Premium Bandwidth
1Gbps Public Network Port
Full Root Admin Access
1 Dedicated IPv4 Address
KVM / SolusVM Control Panel - Reboot, Reinstall, Manage rDNS, & much more
LOCATION Los Angeles
@akkari where exactly ?
I have a VPS in LA and in 2 years never down
I have a 4G Ryzen in San Jose. It does have downtime from time to time.
The normal KVM seems fine.
No downtime with one Racknerd VPS in San Jose, since July 2022.
Support has been very fast to respond. My experience is limited to the initial setup tickets (mount .iso, request change of second IP address) in July 2022 and one minor question in November. Response time has been always within a few minutes, which is very good.
They don't always read your ticket carefully enough. My feeling is that English is a second language for many of the customer support people at Racknerd. In fairness, customer support language issues are common for many providers. Write plainly and clearly.
There was one unusual incident in a ticket. In the middle of the ticket "conversation" with support last November, one of the support people suddenly "suspended" my VPS and accused me of trying to run an illegal version of Windows OS on a "Linux VPS only". The accusation was untrue - I ran only Linux. Within a minute, another note appeared from the same person to "Please dis agree the previous reply." (They meant "disregard."). Nothing actually happened and the VPS continued running. My question was answered.
I am still enjoying my Racknerd VPS. It is a great value at a very competitive price. Like many, I am interested in the court case and how it may affect Racknerd's business. I will continue to enjoy the Racknerd VPS while waiting for it to resolve.
Pay peanuts get monkeys.
They guarantee 100% power uptime only: https://blog.racknerd.com/how-does-racknerd-guarantee-100-power-uptime/
Go with AWS Lightsail, Vultr or Hetzner if production.
Yes that comes close.
Have had 100% uptime across all of my Hetzner VMs for 2 years but I know what you mean. That's why it's in third place for me.
It was this one, the 3.5 GB KVM from the July 2022 deals: https://lowendbox.com/blog/racknerd-1-gb-ram-kvm-vps-14-98-year-and-more-available-in-multiple-locations/
Honestly, this was the first downtime I've recorded but it was just insanely long and support was just not helpful. Support was pretty fast to respond, to be honest, they just weren't helpful.
Thank you for the recommendations. I've been with Vultr for quite a while and almost never had issues. Was just trying to save a few bucks but that wasn't very smart apparently.
Seems most of you have had few issues. Anyone in the Dallas data center? That's where I am.
Vultr is prem.
Hi @akkari -- Thank You so much for being our valued customer -- I see that you've been our customer for nearly a year now, and have deployed multiple services with us throughout this time.
First - I sincerely apologize for not seeing your direct email till now. I was able to find your email now after looking up your ticket ID then searching your email -- it looks like certain keywords within your email got caught in one of my Google filters (we utilize Google workspace for our company emails). I'm looking into fixing/optimizing that filter rule now.
Onto the actual matter itself - I can confirm that your VPS is hosted on DAL124KVM, which over the weekend did face an extended outage on this one node in Dallas due to hardware issues. We are fully transparent when it comes to any and all outages. We update our status page very actively in that regard, in that the minute (or several minutes at the most) after we're aware of an outage or the slightest issue, we update our status page located at https://status.racknerd.com/
In this case, a status incident was created over the weekend for DAL124KVM, which you can find a direct link for here: https://status.racknerd.com/incident/896 -- the hardware issue took a bit longer than usual to resolve as we had to run some offline hardware tests, and when we replaced certain hardware components, the issue still persisted. Eventually, we learned it was a bad motherboard causing such behavior, and replaced the motherboard entirely in order to resolve the issue.
Although I won't sit here and say we are perfect (we're all human after all) - we do our best to provide an honest and reliable service, and overall our reputation as a company does reflect excellent reliability, uptime, and support. I'd like to reiterate this particular issue affected a single physical KVM VPS node of ours out of Dallas only. Your recent experience was unfortunate, and to be fully transparent, I can count only on one hand incidents that have lasted this long over the many years we've been in business. Widespread or extended outages simply don't happen much here, so once again our sincerest apologies. I'm confident things will be stable here on out with regards to DAL124KVM now that we did a complete motherboard replacement
If you ever have any other concerns, feel free to reach out. I know that downtime is never fun, and that is something that definitely isn't the norm here. I'll be responding to your email shortly with my contact info as well and we can go from there. Thank You again for being our customer.
UPDATE need a timestamp...
also there are TWO issues for DAL124KVM on same day https://status.racknerd.com/incident/894 and https://status.racknerd.com/incident/896 - if someone bookmarked the first page then surely he will claim "no updates" - incident should continue in the same (or update and link to new one?) incident-ticket, shouldn't it?
@JabJab -- we are looking into how we can issue updates while having the software automatically input a timestamp next to that update. When updating the status page, our techs generally tend to update it as quickly and concisely as possible (while at the same time working on the issue at hand, as well as responding to tickets that arise from any such incident), so internally we try to make it very streamlined/simple for our techs to update our status page, so that customers are kept in the loop while at the same time not prolonging/delaying the actual resolution itself. It hasn't been much of an issue as most of our incidents don't require any additional updates beyond "fixed" (extended/widespread outages don't happen much), but overall I do agree with you on this point, I'm going to look into that myself as well.
Regarding the two incidents for DAL124KVM, that is correct, the initial outage for DAL124KVM was a quick reboot after we noticed the node became unresponsive, and shortly after we issued another status incident for this node after we identified there was a potential hardware problem that needed to be looked into.
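As a rough illustration of the two suggestions above (automatic timestamps on every update, and linking a first incident to its follow-up), here is a minimal Python sketch. It is not Cachet's data model or any status page's real API; the `Incident` and `Update` classes, their fields, and the example messages are all hypothetical. Only the incident numbers 894 and 896 and the node name come from the thread.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Update:
    message: str
    posted_at: datetime  # always stored in UTC


@dataclass
class Incident:
    incident_id: int
    node: str
    updates: List[Update] = field(default_factory=list)
    continued_in: Optional[int] = None  # id of a follow-up incident, if any

    def post_update(self, message: str, now: Optional[datetime] = None) -> Update:
        """Append an update; the timestamp is filled in automatically."""
        # A naive `now` is interpreted as local time by astimezone().
        stamp = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
        update = Update(message, stamp)
        self.updates.append(update)
        return update

    def link_follow_up(self, other: "Incident", now: Optional[datetime] = None) -> None:
        """Point readers of this incident at the incident that continues it."""
        self.continued_in = other.incident_id
        self.post_update(f"Continued in incident {other.incident_id}", now=now)

    def render(self) -> str:
        lines = [f"Incident {self.incident_id} ({self.node})"]
        for u in self.updates:
            lines.append(f"[{u.posted_at:%Y-%m-%d %H:%M} UTC] {u.message}")
        return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical messages; the tech only types the text, never the time.
    first = Incident(894, "DAL124KVM")
    second = Incident(896, "DAL124KVM")
    first.post_update("Node unresponsive, rebooting")
    first.link_follow_up(second)
    print(first.render())
```

The point of the design is that a tech still only types one short message, so posting stays quick, while anyone who bookmarked the first incident sees both the time of each update and a pointer to the second incident.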
Every web service is going to have downtime at one time or another. Shit happens; remember, machines do fail. But the provider needs to be upfront and let the customers know what the hell is going on!
... your downtime has been doubled?
Some reassurance was all I needed to hear especially because this was the first incident I had with you guys. Your reply and transparency hit the nail on the head in my opinion. You've most certainly won me over again.
I do hope you address a few points moving forward:
1) When I got in touch with support we were at least 45 minutes into the downtime and they were still unaware. Perhaps that's because your monitoring systems only test nodes via pings and not an HTTP request or any other method that would be more appropriate to test actual responsiveness. We had to go back and forth before they even realized there was an underlying issue, so it was troubling that they seemed completely unaware until I reported it and insisted there was an issue.
I hope your monitoring systems are improved moving forward so that when a whole node becomes unresponsive your team can be on it before any customer needs to file any report. This would have shaved an hour at the very least off the downtime and would give us as customers the peace of mind that if a node is down you're probably already aware and aren't dependent on a customer reporting it.
2) After your reply I was thinking about moving more stuff over to you guys, but a few members mentioned lawsuits in the replies and that got me worried. Not sure if you can address that directly at this point, but I hope that once it is settled, you'd announce how it went, whether here or via email to all customers, etc. Surely an ongoing lawsuit would deter many people from bringing you new business.
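As a rough illustration of point 1, here is a minimal Python sketch of checking a node at the service level (a TCP connect to the SSH port and an HTTP request) instead of trusting ping alone. It is only a sketch under assumptions: the function names and the classification labels are made up, it says nothing about how Racknerd's monitoring actually works, and the ping result is passed in rather than measured because sending ICMP from Python normally needs raw sockets or root.

```python
import http.client
import socket
import urllib.error
import urllib.request


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status; 502s and timeouts count as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False


def diagnose(ping_ok: bool, ssh_ok: bool, web_ok: bool) -> str:
    """Combine the three signals into a human-readable state (labels are hypothetical)."""
    if ssh_ok and web_ok:
        return "healthy"
    if ping_ok and not (ssh_ok or web_ok):
        return "responds to ping but services are down"
    if not (ping_ok or ssh_ok or web_ok):
        return "unreachable"
    return "degraded"


def check_node(host: str, url: str, ping_ok: bool, ssh_port: int = 22) -> str:
    """Run the service-level probes and classify the node."""
    return diagnose(ping_ok, tcp_port_open(host, ssh_port), http_ok(url))
```

With something like this, a node that still answers ping while SSH and HTTP both time out would be flagged as "responds to ping but services are down" rather than being reported as up.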
Appreciate the email you've sent, very helpful. Will reply in a few minutes.
I agree, not just because of the downtime but because communication wasn't the best at all. However, Dustin's latest reply puts the vast majority of my concerns to rest.
Hi @akkari -- thank you for the reply and I am happy to hear that you appreciate the transparency. With regards to the monitoring systems, our top priority is to ensure that any issues are detected and resolved quickly. To that end, we have a comprehensive monitoring setup that combines various tools such as ping, server agent services (such as Hetrixtools), and custom in-house development such as Grafana dashboards. I understand that the recent incident with DAL124KVM was an extreme exception, and I will work with our development team to refine our Grafana dashboards to better account for such scenarios in the future.
Our operations are truly 24x7, and as a global infrastructure provider, we sometimes need to rely on third-party physical datacenter hands to perform physical work, which is a common practice among providers who have presence across multiple datacenter locations. Despite this, we continuously work towards enhancing our operations and processes to offer top-notch service to our customers. We understand the importance of quick resolution, and as we continue to add more support staff to our team, in the past six to nine months we have also made extra effort internally to further bolster our internal documentation/processes/training, so that we are always able to deliver the end result our customers have come to know and expect of us. And with that being said, we embrace the need for constant refinement in our processes and operations - I am a firm believer that you can never stop growing, there is always room for improvement and new ideas
With regards to #2, we value your business and are fully committed to providing the best possible service to our customers. I would like to assure you that this matter is not related to our services, nor does it affect RackNerd's operations. While the information about this matter is public and available for you to follow, please understand that I cannot make any further comments on it at this time. However, I would like to assure you that we are fully dedicated to not only earning, but also retaining the business of our valued customers, and will continue to work hard towards this goal.
I am personally committed to helping our customers and interacting with our community. Despite having thousands of physical servers under our footprint globally, and 24x7 support staff coverage, I still personally enjoy getting to know our customers on a first-name basis, and I am always available to assist in any way I can. Our company has an actual physical office space, and although there's nothing wrong with providers who don't have an office space, I'd say that we are one of the few within this community who actually do and show up everyday, which is a testament to our commitment to our business and our customers. Thank You for being part of our journey, and for trusting us with your business. I know your one year anniversary mark with us is coming right around the corner, and so we look forward to many more together
I think Cachet does support timestamps, but you have to post an update (typically known as an incident on the dashboard) instead of editing an incident (at least that's how it worked for me in the past). Or just migrate to Uptime Kuma like I did and get the updated timestamp with a really lightweight interface for free. My Uptime Kuma instance has been running stable at 140 MB average usage on fly.io for over a month, no issues detected so far.
Thank You for that @TrK -- looking into and considering both of these options...
There was also Statping, but that kind of died an untimely death, and with so many alternatives around no one really gave it a try. It did have one awesome thing that is missing from all the self-hosted alternatives: mobile app support. There is also a forked version named statping-ng, but I'm not sure it is any better than Uptime Kuma. I guess I will try it on Docker first before trying to push it to fly.io.
There may be lawsuits, but I do not know about them. The real interest is in the criminal cases. Here are the facts as I understand them, copied from two of my posts in the long thread (see below).
Dustin Cisneros, the founder and CEO of Racknerd, is currently charged with several criminal felonies: embezzlement and larceny (grand theft). He is also accused (by members of LowEndTalk) of involvement in a similar hosting business that suddenly vanished without notice, leaving customers cut off with no access to their prepaid servers. It happened without refunds, without apology, and without any notice or communications.
There were hearings/meetings in August and November 2022 and a few days ago on 2 February 2023 to discuss the case in the courtroom with the judge and prosecutor. I assume that Dustin and his attorney were also present at those meetings. The next courtroom hearing/meeting will be on 27 April 2023.
Here is a very very long thread. Scattered throughout that thread are many facts and details along with a lot of personal comments. Good luck finding the facts and details among the comments:
Not really surprising to hear. I've never had extended downtime with my Racknerd VPSes, but I do get constant packet loss and connections dropping for 5-30 seconds several times an hour, 24/7. Granted, I've seen similarly trashy networks from some of the other cheaper providers like Virmach.