Nagios server high load
I've got a weird issue. I have a Nagios installation with Centreon on a VPS, and it monitors some 330 servers and nearly 2800 services.
The VPS has 4 cores and 1GB RAM.
Lately the load on the VPS has been hitting 5, 7 and higher. Does anyone know any tricks for optimizing Nagios?
I already tried modifying these settings in the tuning section, but they haven't made any difference:
Maximum Service Check Spread: 10 minutes
Maximum Concurrent Service Checks: 40
Use large installation tweaks: Yes
Some services are checked every minute, others every 15 minutes, and others every 4 hours.
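To confirm that the tuning values set in Centreon actually reached the running Nagios configuration, it can help to read them back from the main config file. This is a minimal sketch, assuming the three Centreon fields map to the `max_service_check_spread`, `max_concurrent_checks` and `use_large_installation_tweaks` directives in nagios.cfg. The function name `show_tuning` and the default path are hypothetical, and the real path depends on how Centreon exported the configuration.

```shell
# Print the three tuning directives from a Nagios main config file.
# The default path is an assumption; pass the real nagios.cfg path as $1.
show_tuning() {
    cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    # Match only active (uncommented) lines for the three directives.
    grep -E '^(max_service_check_spread|max_concurrent_checks|use_large_installation_tweaks)=' "$cfg"
}
```

Calling `show_tuning /path/to/nagios.cfg` should print one line per directive. If a line is missing or shows the old value, the change was probably never exported, or Nagios was not restarted afterwards.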
Comments
For the number of hosts and services you have, this is a fairly low amount of RAM. I would recommend an additional 3 GB. One optimization that may help in your case is a ramdisk, but with that said, you'll need more RAM.
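The ramdisk idea usually means moving the files Nagios rewrites constantly onto a tmpfs mount. This is a rough sketch, not a tested recipe for this VPS: the mount point, tmpfs size and the helper name `ramdisk_overrides` are all assumptions, while `status_file`, `temp_file`, `temp_path` and `check_result_path` are standard nagios.cfg directives. Note that tmpfs is backed by RAM, which is why this only makes sense after the memory upgrade.

```shell
# Emit nagios.cfg overrides that point frequently rewritten files at a
# RAM-backed directory. The default directory is a hypothetical mount point.
ramdisk_overrides() {
    dir="${1:-/var/nagios/ramdisk}"
    printf 'status_file=%s/status.dat\n' "$dir"
    printf 'temp_file=%s/nagios.tmp\n' "$dir"
    printf 'temp_path=%s\n' "$dir"
    printf 'check_result_path=%s/checkresults\n' "$dir"
}

# Creating the tmpfs itself (run as root; size and user name are assumptions):
#   mkdir -p /var/nagios/ramdisk
#   mount -t tmpfs -o size=64m tmpfs /var/nagios/ramdisk
#   mkdir -p /var/nagios/ramdisk/checkresults
#   chown -R nagios:nagios /var/nagios/ramdisk
```

The printed lines would replace the matching entries in nagios.cfg. With Centreon in front, the same values would have to be set through its interface so they survive the next configuration export.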
Your amount of cores should be fine for now. What is eating at your resources? It may be one specific check that's causing the issue.
I'm not really sure. How can I find out whether a specific check is causing the issue? I'll get the RAM upgraded ASAP, though.
Try getting 2 GB of RAM. Is this an OpenVZ or KVM VPS?
It's OpenVZ.
Could be someone else abusing the node. Did you try contacting your provider?
If the RAM upgrade is going to cause downtime anyway, I suggest shutting down Nagios first and checking the load. Like @Layer03 said, it may be a noisy neighbour.
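One way to make the "stop Nagios and check the load" comparison concrete is to express the load average per core, once with Nagios running and once a few minutes after stopping it. This is a small sketch, assuming a Linux guest with /proc/loadavg and `nproc` available. Both function names are made up for illustration.

```shell
# Ratio of a load average to the number of CPU cores.
# Values well above 1.0 mean runnable work is queueing for CPU or I/O.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Same ratio for the current 1-minute load on this machine (Linux-specific).
current_load_per_core() {
    load_per_core "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
}
```

For the figures in this thread, `load_per_core 7 4` prints 1.75. If `current_load_per_core` stays high after Nagios has been stopped for a while, the load is probably not coming from the checks themselves.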
@Layer03 I strongly disagree - there are many other factors when it comes to Nagios. I don't know why that would be your first assumption.
@zafouhar - Run top -b -n 1 | head -25 and post the output here. Run ps -ef while you're at it as well. This will give me an idea of what kind of checks you're running.
Actually, the provider contacted me first. The provider is RamNode.
I'll run those @Riz and report back
Here they are @Riz. It's still weird, though, as I still fail to understand where the load is coming from.
Nothing stands out at a quick glance. Can you also run ps -eo pcpu,args --sort=-%cpu | head -n 20? This will show the processes using the most CPU.
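Because Nagios plugins are short-lived, a single heavy check can be hard to spot in a per-process listing. Summing CPU by command name sometimes makes a pattern visible. This is a sketch along the same lines as the ps command above, assuming procps `ps` (for `--no-headers`). The function name `sum_cpu_by_command` is hypothetical.

```shell
# Read "pcpu command" pairs on stdin and print total CPU per command name,
# highest first.
sum_cpu_by_command() {
    awk '{ cpu[$2] += $1 }
         END { for (c in cpu) printf "%.1f %s\n", cpu[c], c }' | sort -rn
}

# Example invocation on a live system (procps ps assumed):
#   ps -eo pcpu,comm --no-headers | sum_cpu_by_command | head -n 10
```

Since ps only sees processes alive at that instant, running it several times a few seconds apart gives a fairer picture than a single sample.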
This may be a bit different on Centreon as I'm more familiar with Nagios, but do you have a page under System called 'Performance Info'? Can you post a screenshot of this page?