CPU Abuse Notices from VirMach

MasonR · July 2018

Hey all,

I'll preface this post by saying that VirMach is a terrific provider and this thread is not meant to throw shade at them. It's just intended to spark a healthy discussion about a possibly faulty system.

tl;dr: got a CPU abuse message, but the VPS is mostly idle. Support insisted there is no chance their abuse detection system is wrong. Anyone else had the same experience?

Long version:

I posted over on HostBalls about receiving a CPU abuse notice from VirMach and believing it to be a mistake on their part. To my surprise, I got quite a few replies from others saying they've gotten similar notices erroneously.

Here's a little background - this morning I received an automated email saying that one of my VPSes had been using 240% CPU for many hours and that I should reduce CPU usage or my service would be temporarily turned off. I monitor my array of VMs regularly (via live server stats) and was surprised to get this message. I immediately logged into the VPS that I was notified about and found the usage/load -

Low CPU usage and a 0.00 load avg as expected. I checked the access logs and there weren't any suspicious logins. Root login is disabled and fail2ban is also installed. Really the only thing this VPS runs is a small TeamSpeak server; otherwise it is idle, as seen in the screencap above.

I replied to the ticket that there must be a mistake and that my VPS is barely using any resources. Level 3 support replied that "this is a system generated message so it won't be wrong," then instructed me how to use the task manager to monitor CPU usage. I replied that I'm using Linux and attached the screenshot above, then got instructions on how to use top/htop to monitor CPU usage (even though the screenshot was an htop cap).

Side note - how does one reach 240% CPU usage with only 2 vCPUs?

How many of you guys have encountered similar issues with VirMach or other providers? Anyone have their VPS temporarily suspended even though they weren't using lots of resources?

qtwrk · July 2018

customer’s services cannot burst to 95-100% usage for more than 5 minutes so if your service may use more than that or needs longer periods of burst then you will need this add-on.
According to our terms and conditions, customer’s services cannot average higher than 50% usage within a 2 hour period

Well, maybe you hit 100% for over 5 minutes. They have a VERY strict CPU policy.

I don't have VirMach, but I hit CPU limits on another provider, where I was compiling something that took hours. After that my VPS CPU was limited to 400 MHz or something. Luckily I didn't get suspended.
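The two rules quoted above (no 95-100% burst for more than 5 minutes, no average above 50% within a 2 hour period) can be checked against your own usage samples. What follows is only a rough sketch, not how VirMach evaluates anything: the function name, the one-sample-per-minute interval, and the reading of "95-100%" as "at or above 95%" are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def violates_policy(
    samples: Sequence[float],
    interval_s: int = 60,        # assumed: one usage sample per minute
    burst_pct: float = 95.0,     # assumed reading of "95-100% usage"
    burst_limit_s: int = 300,    # "more than 5 minutes"
    avg_pct: float = 50.0,       # "average higher than 50%"
    avg_window_s: int = 7200,    # "within a 2 hour period"
) -> Dict[str, bool]:
    """Check CPU usage samples (0-100 scale) against the two quoted rules."""
    # Rule 1: longest run of consecutive samples at or above the burst level.
    longest = run = 0
    for s in samples:
        run = run + 1 if s >= burst_pct else 0
        longest = max(longest, run)
    burst_violation = longest * interval_s > burst_limit_s

    # Rule 2: rolling average over every full 2 hour window.
    window = avg_window_s // interval_s
    avg_violation = False
    if len(samples) >= window:
        total = sum(samples[:window])
        avg_violation = total / window > avg_pct
        for i in range(window, len(samples)):
            if avg_violation:
                break
            total += samples[i] - samples[i - window]
            avg_violation = total / window > avg_pct

    return {"burst": burst_violation, "average": avg_violation}
```

With one sample per minute, six consecutive samples at 100% trip the burst rule, while a box idling at 1% for three hours trips neither.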

saibal · July 2018

I have received a notice for 114.8% CPU usage for multiple hours on a single vCPU VPS.

MasonR · July 2018

@qtwrk said:
Well, maybe you hit 100% for over 5 minutes. They have a VERY strict CPU policy.

The message states that I was using "239.6% CPU for multiple hours." There's a 0% chance that a small TeamSpeak server was using that kind of CPU at 6am.

doghouch · July 2018

@saibal said:
I have received a notice for 114.8% CPU usage for multiple hours on a single vCPU VPS.

“Yes sir you were using 2 cores on a one core VPS”

Harambe · July 2018

Had 1 VPS suspended for the same 'CPU abuse' reason - completely idle box. I have a feeling their monitoring is using system load, not CPU usage.

So if someone else hammers the I/O or whatever and it causes the load to spike in your VM - they'll auto suspend/send a notice.
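Load and CPU usage really are different numbers on Linux: the load average also counts tasks stuck in uninterruptible sleep (typically waiting on I/O), while CPU usage only counts time the processor was busy. Whether VirMach's monitoring actually uses load is a guess. The sketch below just shows how each figure is derived inside the guest from `/proc/stat` and `/proc/loadavg`; the function names are made up for illustration.

```python
from typing import Tuple


def cpu_busy_percent(prev_line: str, cur_line: str) -> float:
    """Busy percentage between two aggregate 'cpu' lines from /proc/stat.

    Only the first eight counters are summed (user, nice, system, idle,
    iowait, irq, softirq, steal); guest time is already included in user.
    """
    def parse(line: str) -> Tuple[int, int]:
        fields = [int(x) for x in line.split()[1:9]]
        fields += [0] * (8 - len(fields))      # pad for very old kernels
        idle = fields[3] + fields[4]           # idle + iowait
        return sum(fields), idle

    total0, idle0 = parse(prev_line)
    total1, idle1 = parse(cur_line)
    delta_total = total1 - total0
    if delta_total <= 0:
        return 0.0
    return 100.0 * (delta_total - (idle1 - idle0)) / delta_total


def parse_loadavg(text: str) -> Tuple[float, float, float]:
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


if __name__ == "__main__":
    import time

    def read_cpu_line() -> str:
        with open("/proc/stat") as f:
            return f.readline()

    first = read_cpu_line()
    time.sleep(1)
    second = read_cpu_line()
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    print(f"cpu busy: {cpu_busy_percent(first, second):.1f}%  load: {load}")
```

A box can show a high load average with a near-zero busy percentage if its tasks are waiting on slow disk I/O, which is the scenario described above.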

AlyssaD · July 2018

Have you asked for logs of this high cpu usage?

Also, why not spin up an instance of LibreNMS, Observium, or something else? That way you have logs you can counter with. This is a huge reason why I monitor all my VMs with LibreNMS. I know if something is amiss, not working, or running weirdly.
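If a full LibreNMS or Observium install is more than you want, even a tiny local log gives you something to attach to a ticket. This is a minimal sketch, not a replacement for those tools: the CSV layout, file name, and function names are invented here, and the CPU and load readings are assumed to come from whatever you already use to sample them.

```python
import csv
import time
from pathlib import Path


def append_sample(path: str, timestamp: float, cpu_pct: float, load1: float) -> None:
    """Append one timestamped usage sample to a CSV file, adding a header once."""
    new_file = not Path(path).exists()
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["utc_time", "cpu_pct", "load1"])
        stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(timestamp))
        writer.writerow([stamp, f"{cpu_pct:.1f}", f"{load1:.2f}"])


def hours_above(path: str, threshold_pct: float, interval_s: int) -> float:
    """Total hours of logged samples whose CPU usage exceeded threshold_pct."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    count = sum(1 for row in rows if float(row["cpu_pct"]) > threshold_pct)
    return count * interval_s / 3600
```

Run `append_sample("cpu_log.csv", time.time(), cpu, load)` from cron or a loop once a minute, and `hours_above("cpu_log.csv", 95.0, 60)` tells you how much time your own records show above 95%.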

MasonR · July 2018

@AlyssaD said:
Have you asked for logs of this high cpu usage?

I have not. I doubt they have them since their system is always right, though :P

Control panel only has graphs for network traffic and i/o usage.

greattomeetyou · July 2018

MasonR said: "this is a system generated message so it won't be wrong,"

We are never wrong!

AlyssaD · July 2018

@MasonR said:

@AlyssaD said:
Have you asked for logs of this high cpu usage?

I have not. I doubt they have them since their system is always right, though :P

Control panel only has graphs for network traffic and i/o usage.

Ask them for logs, to prove it to you. Tell them you found zero things that would cause that high load for hours. Say you need extra info to help figure out what is going on.

defkev · July 2018

Same here with a SSD1G

First a "warning" on last Sunday

119.8% CPU for multiple hours

Today a shutdown, again (coincidence?)

119.8% CPU for multiple hours

I do monitor all my boxes, with a high load trigger at 5% over 1/5/15 mins, and I have absolutely no idea what they are on about, especially not with the "multiple hours" part.

Either their anti-abuse system is flawed or they have oversold their stuff and try to get rid of people.

Whatever it is, it's getting quite annoying, especially since the box has been up for half a year (set up and forget) and never gave me any problems.

AlyssaD · July 2018

More than likely it is a misconfigured server that is stuck on something.

AlyssaD · July 2018

Wait Wait Wait! How long is your /var/log/auth.log?

fail2ban searches through it every now and then to verify, add, and remove entries.

https://github.com/fail2ban/fail2ban/issues/1339

MasonR · July 2018

@AlyssaD said:
Wait Wait Wait! How long is your /var/log/auth.log?

fail2ban searches through it every now and then to verify, add, and remove entries.

https://github.com/fail2ban/fail2ban/issues/1339

Couple Megs. Hmm. Auth and fail2ban logs look normal around the time I got the message, though. And I've never experienced this issue (CPU abuse notice) on any other VPS I have, including LES machines, all running fail2ban. I am running Debian 8 (jessie) if that matters.

Chuck · July 2018

The computer is never wrong. That's why I would use a self-driving car. I would believe my domestic robot if it said that I was wrong.

AlyssaD · July 2018

@MasonR said:

@AlyssaD said:
Wait Wait Wait! How long is your /var/log/auth.log?

fail2ban searches through it every now and then to verify, add, and remove entries.

https://github.com/fail2ban/fail2ban/issues/1339

Couple Megs. Hmm. Auth and fail2ban logs look normal around the time I got the message, though. And I've never experienced this issue (CPU abuse notice) on any other VPS I have, including LES machines, all running fail2ban. I am running Debian 8 (jessie) if that matters.

The thing is, if your auth.log is long, it has to search through every entry at certain times. It has pegged my CPU in the past.

MikeA · July 2018

Flawed system. Exact reason I manually check things before suspending KVM.

AnthonySmith · July 2018

My guess.

They are getting the info from top themselves, which is completely unreliable for monitoring actual guest usage, and one of the following is true:

1) You don't have a virtio disk

2) You don't have a virtio net adapter

3) Your CPU is QEMU-CPU

This results in them seeing the emulation overhead for your VPS in top. That is not your VPS using the CPU; it is their host node using it on behalf of your qemu process.

How do you get 240% from a 2 vCPU system, you ask? You have a broken monitoring system with bad logic, that is how.

Not sure if they have a rep here, but if they do, they should seriously look into forcing CPU passthrough and exact match along with virtio drivers when possible, and monitor through something like atopsar or virt-top with scripted aggregation for actual use.

Yura · July 2018

And nobody tagged him still.

@virmach

Harambe · July 2018

@AlyssaD said:

The thing is, if your auth.log is long, it has to search through every entry at certain times. It has pegged my CPU in the past.

I've only had it for a short period at boot or after restarting fail2ban. Even with a gigantic auth.log you're still only talking minutes, not hours.

defkev · July 2018

@AlyssaD said:
Wait Wait Wait! How long is your /var/log/auth.log?

No PasswordAuthentication, no fail2ban, no exposed services.

defkev · July 2018

Please update us with the root password for your VPS, we will have a look

Just got this from a "Tier 3 Technical Support Agent" after responding to the Shutdown ticket with another monitoring graph showing 99% avg idle over three days...

greattomeetyou · July 2018

When using VPS, how do you aggregate stats for your analysis purposes?

MasonR · July 2018

@greattomeetyou said:
When using VPS, how do you aggregate stats for your analysis purposes?

From the provider side or the client side?

I use https://github.com/BotoX/ServerStatus to get live server stats and plop it all on a status page. But you can also use HetrixTools or LibreNMS or something similar if you want to collect and store the usage history.

sundaymouse · July 2018

Don't buy cheap and use cheap.

florianb · July 2018

@MasonR CPU usage is usually calculated from historical CPU time data, which is divided and factored based on the number of virtual machine CPUs, host node CPUs, and their respective times. If your VM were to use massive amounts of CPU (e.g. a stress test), you can easily make those calculations return values of around 115% for a single core VM. A core in virtualisation doesn't literally mean it can only use 100%; that's simply roughly what's to be expected from the CPU time split based on the VM's cores.
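One plausible reading of this, combined with AnthonySmith's point about emulation overhead, is that the host divides the CPU time charged to the VM's process by the wall-clock interval. The sketch below is an assumption about that arithmetic, not VirMach's actual formula; the function name and the sample numbers are hypothetical.

```python
def usage_percent(cpu_seconds_start: float, cpu_seconds_end: float,
                  wall_seconds: float, vcpus: int = 1,
                  per_vcpu: bool = False) -> float:
    """CPU time consumed over an interval, as a percentage of one core.

    cpu_seconds_* are cumulative CPU seconds charged to the VM's host
    process (assumed to include emulation and I/O threads, not just the
    vCPU threads). With per_vcpu=True the result is normalised by vcpus.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    used = cpu_seconds_end - cpu_seconds_start
    pct = 100.0 * used / wall_seconds
    return pct / vcpus if per_vcpu else pct


# Hypothetical numbers: one vCPU thread busy for the whole minute (60 s)
# plus 9 s of host-side emulation work charged to the same process.
print(usage_percent(0.0, 69.0, 60.0))
```

Under those assumed numbers a single-vCPU VM reports 115%, which is in the range mentioned above even though the guest itself can never see more than 100%.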

JohnMiller92 · July 2018

Noob question, but how does CPU usage go above 100%? For example, I'm on a i7 2600 and if I use more cores+threads, I will never get past 100%

edit: I just saw Anthony's answer btw, does help a bit thanks.

MasonR · July 2018

@JohnMiller92 said:
Noob question, but how does CPU usage go above 100%? For example, I'm on a i7 2600 and if I use more cores+threads, I will never get past 100%

I just read Anthony's answer btw, does help a bit thanks.

Typically it'd be 100% per fully utilized core on your VPS. So running one core at max - 100%, two cores at max - 200%, etc. Which would also correspond to your load average, one core at max 1.00 load, two cores at max 2.00 load, etc (assuming little to no disk usage).

@florianb's answer might be closer to the truth, but what I said above is probably a simplification of that.
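Under that simplified "100% per fully utilized core" model, the ceiling a guest can reach is just 100 times its vCPU count, so a reported figure can be sanity-checked in a couple of lines. The helper names are made up for this sketch; the figures are the ones quoted in this thread.

```python
def guest_ceiling(vcpus: int) -> float:
    """Maximum CPU percentage under the 100%-per-core convention."""
    return 100.0 * vcpus


def exceeds_ceiling(reported_pct: float, vcpus: int) -> bool:
    """True if a reported figure is higher than the guest could produce."""
    return reported_pct > guest_ceiling(vcpus)


# Figures reported earlier in the thread.
print(exceeds_ceiling(239.6, 2))  # 2 vCPU VPS, ceiling 200%
print(exceeds_ceiling(114.8, 1))  # single vCPU VPS, ceiling 100%
```

Both notices are above the ceiling, which suggests the number is being measured on the host side rather than inside the guest.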

JohnMiller92 · July 2018

@MasonR said:

@JohnMiller92 said:
Noob question, but how does CPU usage go above 100%? For example, I'm on a i7 2600 and if I use more cores+threads, I will never get past 100%

I just read Anthony's answer btw, does help a bit thanks.

Typically it'd be 100% per fully utilized core on your VPS. So running one core at max - 100%, two cores at max - 200%, etc. Which would also correspond to your load average, one core at max 1.00 load, two cores at max 2.00 load, etc (assuming little to no disk usage).

@florianb's answer might be closer to the truth, but what I said above is probably a simplification of that.

I see, so it adds them up per core (all having each up to 100%), then accumulates their usages? For example, if you have a 3 vCore box, your "max" CPU usage is essentially 300%, not 100%? If I got that right

mrclown · July 2018

There might be some flaws in their system somewhere. In my case it was a bandwidth spike notice, even though I hadn't used the VPS after running a benchmark. They are still a good provider, but not one that suits my taste after a long conversation across a series of tickets.

greattomeetyou · July 2018

MasonR said: using 240% CPU for many hours

They claimed many hours. You just have to prove otherwise? Do you happen to have logs or stats to prove otherwise?
