Help understanding Racknerd reducing my cores from 4 to 1.

Kmaid · August 2023

I'm relatively new to server administration, and I've been managing a couple of VMs with Racknerd. However, this morning, I received a notification about downtime. Upon checking my emails, I learned that my CPU core count had been reduced to 1 due to overutilization.

When I accessed my Grafana account and reviewed the host information, I came across this screenshot: https://imgur.com/a/rZyCaq5. It's evident that around midnight, there was a sudden spike in CPU steal, reaching 40%. I believe this might have been caused by Watchtower updating containers, after which usage quickly returned to normal levels. Over the past two weeks, my load average stayed around 15%, significantly below the maximum 30% utilization mentioned in the email.
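For reference, steal and guest usage can be told apart from inside the VM by diffing two readings of the aggregate `cpu` line in `/proc/stat`. Steal counts time the guest was ready to run but the hypervisor was busy with something else, so a steal spike usually points to contention or capping on the host rather than work done by this VM. Below is a minimal Python sketch, assuming a Linux guest; the function names are illustrative and are not part of node exporter or any RackNerd tooling.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    counters = [int(x) for x in fields[1:]]
    if len(counters) < 8:
        raise ValueError("kernel does not report a steal counter")
    return counters


def cpu_percentages(before, after):
    """Return busy/idle/steal percentages between two 'cpu' line snapshots."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    # Counter order: user nice system idle iowait irq softirq steal.
    # guest and guest_nice are already counted inside user and nice, so skip them.
    delta = [y - x for x, y in zip(b[:8], a[:8])]
    total = sum(delta)
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    idle = delta[3] + delta[4]   # idle + iowait
    steal = delta[7]             # time the hypervisor gave to someone else
    busy = total - idle - steal  # time this guest actually spent running
    return {
        "busy": 100 * busy / total,
        "idle": 100 * idle / total,
        "steal": 100 * steal / total,
    }


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (assumes a Linux guest)."""
    with open(path) as handle:
        return handle.readline()
```

Calling `read_cpu_line()` twice a few seconds apart and passing both results to `cpu_percentages` gives figures comparable to the busy and steal series on a dashboard.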

I'm unsure whether I'm unintentionally straining their system or if this was a momentary excessive usage. Could this situation have arisen because other users sharing the same machine also experienced usage spikes simultaneously, making it challenging to pinpoint which VM was primarily responsible? Any advice on how to avoid this in the future? I asked support, who just enabled my cores again without giving me much help on how to avoid this, other than to monitor my system, which I am already doing.

Thanks

Kmaid · August 2023

It's just been suspended again, with the Grafana node exporter reporting 80% usage with 7.5% steal. #HG06098 and #XM36353. Could you help me, @dustinc?

AlexBarakov · August 2023

Has your bandwidth been doubled?

Kmaid · August 2023

it has

hyperblast · August 2023

Has your cores been quattroed?

Kmaid · August 2023

@hyperblast said:
Has your cores been quattroed?

They re-enabled the cores again, but they don't seem to want to say anything more than "use htop". I am trying that to see whether node exporter is not doing a good enough job of reporting for some reason.

babywhale · August 2023

Use the top command to see what is using all that CPU. htop will also help with that.
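A scriptable variant of the same check, as a rough sketch: it parses `ps -eo pid,pcpu,comm` output and keeps the heaviest entries. The helper names are made up for illustration. Note that on procps-based systems `ps` reports CPU time averaged over each process's lifetime, so a short burst is easier to catch live in top or htop.

```python
import subprocess


def parse_ps(ps_output):
    """Turn 'ps -eo pid,pcpu,comm' text into (pid, cpu_percent, command) tuples."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # ignore malformed or truncated lines
        pid, pcpu, comm = parts
        rows.append((int(pid), float(pcpu), comm.strip()))
    return rows


def top_cpu(ps_output, n=5):
    """Return the n entries with the highest reported CPU percentage."""
    return sorted(parse_ps(ps_output), key=lambda row: row[1], reverse=True)[:n]


def snapshot():
    """Run ps once and return its raw text output."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

`top_cpu(snapshot())` gives a quick list to log alongside the dashboard data. Processes running inside containers show up here too, since they are ordinary processes on the same kernel.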

JabJab · August 2023

@babywhale said:
Use the top command to see what is using all that CPU. htop will also help with that.

Hi RackNerd support agent.

Don_Keedic · August 2023

I'd like to see that email. Never heard of a company physically reducing something vs throttling.

dev_vps · August 2023

@Kmaid said:
it has

You may want to consider VPS with dedicated vCores

Kmaid · August 2023

@Don_Keedic said:
I'd like to see that email. Never heard of a company physically reducing something vs throttling.

The email:

Hello,

It has come to our attention that your VPS is consuming a consistent amount of CPU resources, in a manner that far exceeds any definition of fair share. In order to prevent this from impacting neighboring customers that reside upon the same host node/hypervisor as your VPS, we need you to investigate this as soon as possible, or to upgrade to a Dedicated Server where you are not sharing Disk I/O or CPU resources with neighbors on the same node (a dedicated server would provide you with a physical server to yourself, where you can maximize all resources including CPU at 100% at all times if desired). With a VPS environment, you are sharing CPU and Disk I/O resources with others, so we do ask VPS customers to respect that and utilize our service in a fair usage manner.

Please note that in order to maintain a stable platform, we have temporarily reduced your VPS's CPU core allocation. Once a plan is in place to resolve the CPU spike, we will be happy to restore your VPS's original number of cores. Can you please tell us more details about the services you're running within this VM so we can provide some optimization tips?

Thank You,

Kmaid · August 2023

@dev_vps said:

@Kmaid said:
it has

You may want to consider VPS with dedicated vCores

Yeah, I have been looking for hybrid solutions on ServerHunter to try and find inexpensive VPSes with dedicated CPUs, but I'm not having much luck. Any recommendations? After being restricted twice in a few hours and only getting canned responses that don't directly answer my questions, I am getting itchy feet. I have otherwise had a very good experience with Racknerd, and I'm 10 months in.

DeadlyChemist · August 2023

annoying but fair, imo
im based anyways so yeah dont listen to me

dev_vps · August 2023

@Kmaid
What is your budget?

babywhale · August 2023

I have had that happen before: they will limit your CPU usage to 30% if you're using more than the fair-share CPU usage policy allows. My advice is to see what you can do to reduce the CPU usage. Maybe a runaway program is eating up too many resources, etc.
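To tell a brief spike from sustained load, one option is to look for the longest unbroken run of samples above a threshold. The sketch below is plain Python under stated assumptions: the 30% threshold is the figure quoted in this thread, while the 10-sample minimum and the function names are arbitrary choices for illustration, not RackNerd's actual policy.

```python
def longest_run_above(samples, threshold):
    """Length of the longest consecutive run of samples strictly above threshold."""
    longest = 0
    current = 0
    for value in samples:
        if value > threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # the run is broken, start counting again
    return longest


def is_sustained(samples, threshold=30.0, min_samples=10):
    """True if usage stayed above threshold for at least min_samples in a row.

    threshold: fair-share figure mentioned in the thread (percent).
    min_samples: assumed window length; tune it to the sampling interval.
    """
    return longest_run_above(samples, threshold) >= min_samples
```

With one sample per minute, a two-minute burst from a container update would not trip the check, while twelve minutes at 80% would.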

Don_Keedic · August 2023

@Kmaid said:

That's crazy. I understand them needing to throttle it but to physically change something (which some configurations may depend on) seems extremely heavy handed.

I don't use Racknerd for squat, won't ever use Racknerd for squat.

Recommendation wise, Crunchbits has a great lineup of VDS. What you see is what you get, no questions asked. I recently switched my dedicated from another provider over to a Crunchbits VDS and it's been perfect.

https://crunchbits.com/vds#Plans

crunchbits · August 2023

@Don_Keedic said:

@Kmaid said:

Appreciate the mention. The VDS lineup is meant for exactly the issue you're having, @Kmaid, although the usage stats you are showing would also be fine on our SSD or NVMe VPSes. There is still a live coupon for -30% on the entire Xeon VDS lineup.

dustinc · August 2023

Hi @Kmaid -- First off, thank you so much for choosing RackNerd as your provider (for almost a year by now). We sincerely appreciate your business and trust.

In the spirit of complete transparency and to provide additional background here, I've got to tell you that incidents like these where we have to act on a VPS’s CPU utilization are quite rare in general. At the same time, it's important for us to ensure that one client's activities, whether on a single or multiple VMs, do not adversely affect the experience of their neighbors on the same physical server. While such instances are rare, we take immense pride in the level of service we provide, and our goal is to offer an optimal experience for every customer. We swing into action only when our monitoring systems trigger a high-load alert on a specific node. From there, we dig in to identify which VPS is drawing excessive, sustained CPU utilization on that particular node over a certain period. I also want to emphasize that given how rare this is to begin with - we don't rely on automation for any of this. Each case is manually reviewed by one of our team members at the KVM VPS host node level. We’ve found that helps eliminate false positives, which is a benefit given the rare nature of these events.

About the htop recommendation from our support team: it's a constraint of the unmanaged nature of the service more than anything else. We can't look inside your VPS to see what's gobbling up CPU without your root password. That's why our advice is often high-level, focusing on what we can see from the host node level. In other words, from our perspective, we can’t see the individual processes that you’re running within your VPS; we can only see a total CPU % utilized based on the KVM VM ID. Your situation here, based on what I’ve read, sounds like it could be due to some unintended factors, such as an errant process or a misbehaving application, and it's tough for us to pinpoint without that inside look. Also, as an example, we've seen instances in the past where end-users were unknowingly running such rogue processes (which indicates a possible compromised VM).

Since you've been with us for more than 10 months without any hiccups, my thinking so far is that this is likely one of those isolated incidents mentioned above, perhaps a runaway process or something along those lines. With that being said, as a proposed resolution path (despite the unmanaged nature of your service), I'm willing to go the extra mile. Shoot me an email at [email protected], and we can arrange for temporary access to your VPS. I will have one of our senior systems administrators take a look. From there, we will have a better idea of what's going on and can provide more tailored advice.

I greatly appreciate your time and look forward to sorting this out with you, thanks!

fluffernutter · August 2023

Why change the configuration of the VM instead of just capping the CPU to whatever your FUP states? Seems extremely heavy handed.

ivlad · August 2023

What Dustin sent above is the exact reason why I will stick with Racknerd. Good job; you have repeatedly gone above and beyond for your customers. I hope the OP takes advantage of their free help.

dustinc · August 2023

@fluffernutter said:

Why change the configuration of the VM instead of just capping the CPU to whatever your FUP states? Seems extremely heavy handed.

Hi @fluffernutter -- just to clarify, this isn't a daily routine for us (as mentioned, it is quite rare for a situation like this to surface to begin with). Therefore, we acknowledge that our current process may not be 100% perfect and are open to suggestions for future improvement. At the moment, everything is done on a case-by-case basis, and we usually notify clients before taking any action, though that may not always be the case, especially if the activity is particularly disruptive. In either case, we always communicate with our end-user (as we did here with the OP).

SolusVM with KVM virtualization doesn't have an efficient way to only limit frequency. We opt for a temporary core reduction (with communication) as a last resort -- we’ve found that this is still better than a suspension or stopping the VPS entirely. Then, when the end user is in communication with us and expresses intent to resolve the matter, we restore the original core count right away.

dustinc · August 2023

@ivlad said:
What Dustin sent above is the exact reason why I will stick with Racknerd. Good job; you have repeatedly gone above and beyond for your customers. I hope the OP takes advantage of their free help.

Hi @ivlad -- Thank you so much for the kind words and your continued business. We’re not perfect, but what we do aim for is solid service, competitive pricing, and human customer support that's always available and reachable 24x7. We'll keep refining our processes and doing our best for our customers.

We look forward to continuing to work together for many years to come!
