CPU Abuse Notices from VirMach
Hey all,
I'll preface this post by saying that VirMach is a terrific provider and this thread is not meant to throw shade at them; it is just intended to spark a healthy discussion on a possibly faulty system.
tl;dr: got a CPU abuse message, but VPS is mostly idle. Support insisted there is no chance their abuse detection system is wrong. Anyone else have the same experience?
Long version:
I posted over on HostBalls about receiving a CPU abuse notice from VirMach and believing it to be a mistake on their part. To my surprise, I got quite a few replies from others saying they've gotten similar notices erroneously.
Here's a little background: this morning I received an automated email saying that one of my VPSes had been using 240% CPU for many hours, and that I should reduce CPU usage or my service would be temporarily turned off. I monitor my array of VMs regularly (via live server stats) and was surprised to get this message. I immediately logged into the VPS that I was notified about and found the usage/load -
Low CPU usage and a 0.00 load avg as expected. I checked the access logs and there weren't any suspicious logins. Root login is disabled and fail2ban is also installed. Really the only thing this VPS runs is a small TeamSpeak server; otherwise it is idle, as seen in the screencap above.
I replied to the ticket that there must be a mistake and that my VPS is barely using any resources. Level 3 support replied that, "this is a system generated message so it won't be wrong," then instructed me how to use the task manager to monitor CPU usage. I replied that I'm using Linux and attached the screenshot above, then got instructions on how to use top/htop to monitor CPU usage (even though the screenshot was an htop cap).
Side note - how does one reach 240% CPU usage with only 2 vCPUs?
How many of you guys have encountered similar issues with VirMach or other providers? Anyone have their VPS temporarily suspended even though they weren't using lots of resources?
Comments
Well, maybe you hit 100% for over 5 minutes. They have a VERY strict CPU policy.
I don't have VirMach, but I hit CPU limits on another provider while compiling something that took hours. In the end my VPS CPU was limited to 400 MHz or something; luckily I didn't get suspended.
I have received a notice for 114.8% CPU usage for multiple hours on a single vCPU VPS.
The message states that I was using "239.6% CPU for multiple hours." There's a 0% chance that a small TeamSpeak server was using that kind of CPU at 6am.
“Yes sir you were using 2 cores on a one core VPS”
Had 1 VPS suspended for the same 'CPU abuse' reason - completely idle box. I have a feeling their monitoring is using system load, not CPU usage.
So if someone else hammers the I/O or whatever and it causes the load to spike in your VM - they'll auto suspend/send a notice.
Have you asked for logs of this high cpu usage?
Also, why not spin up an instance of LibreNMS, Observium, or something else? That way you have logs you can counter with. This is a huge reason why I monitor all my VMs with LibreNMS. I know if something is amiss, not working, or running weirdly.
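For anyone who wants evidence without a full monitoring stack, here is a minimal sketch of client-side logging in Python. It assumes a Linux guest where the first line of /proc/stat is the aggregate "cpu" line; the function names are made up for illustration, and this is not how LibreNMS or Observium collect their data.

```python
def parse_cpu_line(line):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    # guest and guest_nice are already counted in user and nice,
    # so only the first eight fields are summed.
    total = sum(values[:8])
    return total - idle, total


def busy_percent(before, after):
    """Busy share of all CPU time between two samples, from 0 to 100."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (busy1 - busy0) / elapsed
```

Sample the first line of /proc/stat on a schedule (say once a minute from cron), store each sample with a timestamp, and feed consecutive samples to busy_percent. Keep in mind this records what the guest sees, which may differ from what the host charges to the VM.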
I have not. I doubt they have them since their system is always right, though :P
Control panel only has graphs for network traffic and i/o usage.
We are never wrong!
Ask them for logs, to prove it to you. Tell them you found zero things that would cause that high load for hours. Say you need extra info to help figure out what is going on.
Same here with a SSD1G
First a "warning" last Sunday
Today a shutdown, again (coincidence?)
I do monitor all my boxes, high load trigger at 5% over 1/5/15mins, and I have absolutely no idea what they are on about, especially not with the "multiple hours" part.
Either their anti-abuse system is flawed or they have oversold their stuff and try to get rid of people.
Whatever it is, it's getting quite annoying, especially since the box has been up for half a year (setup and forget) and never gave me any problems.
More than likely it is a misconfigured server that is stuck on something.
Wait Wait Wait! How long is your /var/log/auth.log?
fail2ban searches through it every now and then to verify, add, and remove entries.
https://github.com/fail2ban/fail2ban/issues/1339
Couple Megs. Hmm. Auth and fail2ban logs look normal around the time I got the message, though. And I've never experienced this issue (CPU abuse notice) on any other VPS I have, including LES machines, all running fail2ban. I am running Debian 8 (jessie) if that matters.
The computer is never wrong. That's why I would use a self-driving car. I would believe my domestic robot when it said that I was wrong.
The thing is, if your auth.log is long, it has to search through every entry at certain times. It has pegged my CPU in the past.
Flawed system. Exact reason I manually check things before suspending KVM.
My guess.
They are getting the info from top themselves, which is completely unreliable for monitoring actual guest usage, and one of the following is true:
1) You don't have a virtio disk
2) You don't have a virtio net adapter
3) Your CPU is QEMU-CPU
Any of these will result in them seeing the emulation overhead for your VPS in top. That is not your VPS using the CPU; it is their host node using it, charged against your qemu process.
How do you get 240% from a 2 vCPU system, you ask? You have a broken monitoring system with bad logic, that is how.
Not sure if they have a rep here, but if they do, they should seriously look into forcing cpu-passthrough and exact match along with virtio drivers when possible, and monitor through something like atopsar or virt-top with scripted aggregation for actual use.
And nobody tagged him still.
@virmach
I've only had it for a short period at boot/after restarting fail2ban. With a gigantic auth.log you're still only talking minutes, not hours.
No PasswordAuthentication, no fail2ban, no exposed services.
Just got this from a "Tier 3 Technical Support Agent" after responding to the Shutdown ticket with another monitoring graph showing 99% avg idle over three days...
When using VPS, how do you aggregate stats for your analysis purposes?
From the provider side or the client side?
I use https://github.com/BotoX/ServerStatus to get live server stats and plop it all on a status page. But you can also use HetrixTools or LibreNMS or something similar if you want to collect and store the usage history.
Don't buy cheap and use cheap.
@MasonR CPU usage is usually calculated based on historical data from CPU time. This is divided and factored based on the number of virtual machine CPUs, hostnode CPUs and their respective times. If your VM were to use massive amounts of CPU (i.e. a stress test), you can easily make those calculations return values of around 115% for a single core VM. A core in virtualisation doesn't literally mean it can only use 100%; that's simply sort of what's to be expected from the CPU time split based on the VM's cores.
Noob question, but how does CPU usage go above 100%? For example, I'm on an i7 2600 and if I use more cores+threads, I will never get past 100%.
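To make the arithmetic concrete, here is a rough Python sketch of a CPU-time-based calculation. It is only an assumption about how such a figure could be derived on the host, not VirMach's actual formula; the function names and the 60-second sample window are hypothetical.

```python
def cpu_percent(cpu_seconds_used, wall_seconds):
    """CPU time consumed during a window, as a percentage of one core."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return 100.0 * cpu_seconds_used / wall_seconds


def per_vcpu_percent(cpu_seconds_used, wall_seconds, vcpus):
    """The same figure divided by the number of vCPUs assigned to the VM."""
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    return cpu_percent(cpu_seconds_used, wall_seconds) / vcpus


# Hypothetical sample: the host charged 144 CPU-seconds to the VM's
# process during a 60-second window.
print(cpu_percent(144, 60))          # 240.0
print(per_vcpu_percent(144, 60, 2))  # 120.0
```

If the host counts everything its process for the VM does (emulation and I/O threads included, not just the vCPU threads), the total could plausibly exceed 100% per vCPU even when the guest itself looks idle.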
edit: I just saw Anthony's answer btw, does help a bit thanks.
Typically it'd be 100% per fully utilized core on your VPS. So running one core at max - 100%, two cores at max - 200%, etc. Which would also correspond to your load average, one core at max 1.00 load, two cores at max 2.00 load, etc (assuming little to no disk usage).
@florianb's answer might be closer to the truth, but what I said above is probably a simplification of that.
I see, so it adds them up per core (all having each up to 100%), then accumulates their usages? For example, if you have a 3 vCore box, your "max" CPU usage is essentially 300%, not 100%? If I got that right
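As a quick sanity check of that reading, here is a small Python sketch. The helper names are hypothetical, and it assumes the simple convention of 100% per fully utilized core; the sample figures are the ones quoted in this thread.

```python
def max_cpu_percent(vcpus):
    """Upper bound on guest CPU usage under the 100%-per-core convention."""
    return 100.0 * vcpus


def exceeds_capacity(reported_percent, vcpus):
    """True if a reported figure is higher than the guest could reach."""
    return reported_percent > max_cpu_percent(vcpus)


print(max_cpu_percent(3))          # 300.0
print(exceeds_capacity(239.6, 2))  # True
print(exceeds_capacity(114.8, 1))  # True
```

Under that convention, both notices quoted in this thread report more than the guest could use on its own, which suggests the figure was measured some other way.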
There might be some flaws in their system somewhere. My case was a bandwidth spike when I wasn't even using the VPS after a benchmark. They are still a good provider, but they are not going to fit my taste after some good conversation in a series of tickets.
They claimed many hours. You just have to prove otherwise? Do you happen to have logs or stats to prove otherwise?