Nagios server high load

zafouhar · March 2016

I've got a weird issue. I have a Nagios installation with Centreon on a VPS; it monitors some 330 servers and nearly 2800 services.

The VPS has 4 cores and 1GB RAM.

Lately the load on the VPS has been hitting 5s, 7s and higher. Does anyone know any tricks to optimize Nagios?

I already tried modifying these settings in the tuning section, but they haven't made any difference.

Maximum Service Check Spread: 10 minutes

Maximum Concurrent Service Checks: 40

Use large installation tweaks: Yes

Some services are checked every minute, others every 15 minutes, and others every 4 hours.
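
As a rough sense of scale for the figures above, the scheduled check rate can be bounded from the service count and the check intervals alone. This is only a sketch: `checks_per_minute` is a hypothetical helper, and because the post doesn't say how the services are split across the three intervals, it treats all 2800 services as sharing a single interval to get upper and lower bounds.

```shell
#!/bin/sh
# Hypothetical helper: average checks per minute for N services
# that are all checked once every INTERVAL minutes.
checks_per_minute() {
    awk -v n="$1" -v interval="$2" 'BEGIN { printf "%.1f\n", n / interval }'
}

# Bounds for the 2800 services mentioned above (the real mix is unknown).
echo "every 1 min:   $(checks_per_minute 2800 1) checks/min"
echo "every 15 min:  $(checks_per_minute 2800 15) checks/min"
echo "every 4 hours: $(checks_per_minute 2800 240) checks/min"
```

The gap between the bounds is large, so the actual number of one-minute services probably matters more for load than the total service count.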

Riz · March 2016

For the number of hosts and services you have, this is a fairly low amount of RAM. I would recommend an additional 3 GB. One optimization that may help in your case is a ramdisk, but with that said, you'll need more RAM.

Your amount of cores should be fine for now. What is eating at your resources? It may be one specific check that's causing the issue.

zafouhar · March 2016

@Riz said:

Your amount of cores should be fine for now. What is eating at your resources? It may be one specific check that's causing the issue.

I'm not really sure. How can I find out if it's a specific check causing the issue? I'll get the RAM upgraded ASAP, though.
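
One possible way to look for a single slow check, assuming a stock Nagios status file: Nagios records a `check_execution_time` for each service in the file named by the `status_file` setting in nagios.cfg. The sketch below lists the slowest services from that file. `slowest_checks` is a hypothetical helper name, and the status file path varies by install, so it is passed as an argument rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical helper: print "seconds<TAB>host<TAB>service" for the
# slowest service checks recorded in a Nagios status file.
#   $1 = path to the status file
#   $2 = number of rows to print (default 10)
slowest_checks() {
    awk '
        # Start of a service block; reset the fields we collect.
        /^servicestatus[ \t]*[{]/ { in_svc = 1; host = ""; svc = ""; t = ""; next }
        in_svc && /^[ \t]*host_name=/            { host = substr($0, index($0, "=") + 1) }
        in_svc && /^[ \t]*service_description=/  { svc  = substr($0, index($0, "=") + 1) }
        in_svc && /^[ \t]*check_execution_time=/ { t    = substr($0, index($0, "=") + 1) }
        # End of the block; emit one line and wait for the next service.
        in_svc && /^[ \t]*[}]/ { printf "%s\t%s\t%s\n", t, host, svc; in_svc = 0 }
    ' "$1" | sort -rn | head -n "${2:-10}"
}
```

To use it, find the path with `grep '^status_file' /etc/nagios/nagios.cfg` and pass it in, for example `slowest_checks /path/to/status.dat 20`. Checks that run for many seconds, or that sit at their timeout, would be the first candidates to look at.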

Layer03 · March 2016

Try getting 2 GB of RAM. Is this an OpenVZ or KVM VPS?

zafouhar · March 2016

@Layer03 said:
Try getting 2 GB of RAM. Is this an OpenVZ or KVM VPS?

It's OpenVZ.

Layer03 · March 2016

Could be someone else abusing the node. Did you try contacting your provider?

ATHK · March 2016

If you're going to get downtime from the RAM upgrade, I suggest shutting down Nagios first and checking the load. Like @Layer03 said, it may be a noisy neighbour.

Riz · March 2016

@Layer03 I strongly disagree - there are many other factors when it comes to Nagios. I don't know why that would be your first assumption.

@zafouhar - Run top | head -25 and post it back here. Run ps -ef while you're at it as well. This will give me an idea of what kind of checks you're running.

zafouhar · March 2016

@Layer03 said:
Could be someone else abusing the node. Did you try contacting your provider?

Actually, the provider contacted me first. The provider is Ramnode.

I'll run those, @Riz, and report back.

zafouhar · March 2016

Here they are, @Riz. It's still weird, though, as I still can't work out where the load is coming from.

[root@vps2 ~]# top | head -25
top - 04:14:44 up 17:42,  1 user,  load average: 3.41, 2.59, 2.03
Tasks:  42 total,   1 running,  41 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.5%us,  1.6%sy,  0.0%ni, 95.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2097152k total,   580560k used,  1516592k free,        0k buffers
Swap:   524288k total,        0k used,   524288k free,   226676k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  901 nagios    20   0 49796  22m  628 S  2.0  1.1  14:23.14 nagios
    1 root      20   0 10372  168   40 S  0.0  0.0   0:00.29 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd/47786
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.00 khelper/47786
  114 root      16  -4 12640  328    0 S  0.0  0.0   0:00.00 udevd
  559 root      20   0  5932  276  152 S  0.0  0.0   0:01.10 syslogd
  579 root      20   0 55992 9040  144 S  0.0  0.4   0:00.59 snmptt
  580 root      20   0 55992 9184  280 S  0.0  0.4   0:02.27 snmptt
  589 root      20   0  101m 5228 1036 S  0.0  0.2   0:21.21 snmpd
  601 root      20   0 64816  588    0 S  0.0  0.0   0:00.00 sshd
  609 root      20   0 21664  220    4 S  0.0  0.0   0:00.00 xinetd
  656 root      20   0 10788  232    4 S  0.0  0.0   0:00.00 mysqld_safe
  738 mysql     20   0  267m  53m 2624 S  0.0  2.6  15:59.61 mysqld
  771 nagios    20   0 66552 5536  864 S  0.0  0.3   0:02.74 centcore
  806 root      20   0 62856 1628  236 S  0.0  0.1   0:00.76 sendmail
  814 smmsp     20   0 57736 1160    4 S  0.0  0.1   0:00.00 sendmail
  824 root      20   0  258m 5540  136 S  0.0  0.3   0:01.17 httpd
  832 root      20   0 19724  704  136 S  0.0  0.0   0:00.40 crond

[root@vps2 ~]# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Mar06 ?        00:00:00 init [3]
root         2     1  0 Mar06 ?        00:00:00 [kthreadd/47786]
root         3     2  0 Mar06 ?        00:00:00 [khelper/47786]
root       114     1  0 Mar06 ?        00:00:00 /sbin/udevd -d
root       389  6011  0 04:14 pts/0    00:00:00 ps -ef
root       559     1  0 Mar06 ?        00:00:01 syslogd -m 0
root       579     1  0 Mar06 ?        00:00:00 /usr/bin/perl /usr/share/centreon/bin/snmptt --daemon --ini=/etc/snmp/centreon_traps/snmptt.ini
root       580   579  0 Mar06 ?        00:00:02 /usr/bin/perl /usr/share/centreon/bin/snmptt --daemon --ini=/etc/snmp/centreon_traps/snmptt.ini
root       589     1  0 Mar06 ?        00:00:21 /usr/sbin/snmpd -Lsd -Lf /dev/null -p /var/run/snmpd.pid -a
root       601     1  0 Mar06 ?        00:00:00 /usr/sbin/sshd
root       609     1  0 Mar06 ?        00:00:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
root       656     1  0 Mar06 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/
mysql      738   656  1 Mar06 ?        00:15:59 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pi
nagios     771     1  0 Mar06 ?        00:00:02 /usr/bin/perl -w /usr/share/centreon/bin/centcore
root       806     1  0 Mar06 ?        00:00:00 sendmail: accepting connections
smmsp      814     1  0 Mar06 ?        00:00:00 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
root       824     1  0 Mar06 ?        00:00:01 /usr/sbin/httpd
root       832     1  0 Mar06 ?        00:00:00 crond
xfs        852     1  0 Mar06 ?        00:00:00 xfs -droppriv -daemon
apache     854   824  0 Mar06 ?        00:00:07 /usr/sbin/httpd
apache     855   824  0 Mar06 ?        00:00:05 /usr/sbin/httpd
apache     856   824  0 Mar06 ?        00:00:07 /usr/sbin/httpd
apache     857   824  0 Mar06 ?        00:00:07 /usr/sbin/httpd
apache     858   824  0 Mar06 ?        00:00:07 /usr/sbin/httpd
apache     859   824  0 Mar06 ?        00:00:08 /usr/sbin/httpd
apache     861   824  0 Mar06 ?        00:00:07 /usr/sbin/httpd
apache     862   824  0 Mar06 ?        00:00:08 /usr/sbin/httpd
root       868     1  0 Mar06 ?        00:00:00 /usr/sbin/saslauthd -m /var/run/saslauthd -a pam -n 2
root       870   868  0 Mar06 ?        00:00:00 /usr/sbin/saslauthd -m /var/run/saslauthd -a pam -n 2
nagios     901     1  1 Mar06 ?        00:14:23 /usr/bin/nagios -d /etc/nagios/nagios.cfg
root       905     1  0 Mar06 tty1     00:00:00 /sbin/mingetty console
root       906     1  0 Mar06 tty2     00:00:00 /sbin/mingetty tty2
apache    1068   824  0 Mar06 ?        00:00:06 /usr/sbin/httpd
apache    1075   824  0 Mar06 ?        00:00:10 /usr/sbin/httpd
apache    1076   824  0 Mar06 ?        00:00:08 /usr/sbin/httpd
apache    1163   824  0 Mar06 ?        00:00:07 /usr/sbin/httpd
apache    1189   824  0 Mar06 ?        00:00:07 /usr/sbin/httpd
apache    1190   824  0 Mar06 ?        00:00:06 /usr/sbin/httpd
root      5443   601  0 Mar06 ?        00:00:04 sshd: root@pts/0
root      6011  5443  0 Mar06 pts/0    00:00:00 -bash
nagios   16135     1  0 02:00 ?        00:00:00 /usr/bin/perl -w /usr/share/centreon/bin/centstorage
[root@vps2 ~]#
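
The snapshot above shows a load average of 3.41 next to 95.9% idle CPU, 0.0% I/O wait and a single running task. On Linux the load average counts tasks that are runnable (state R) or in uninterruptible sleep (state D), so brief bursts of processes can raise it without appearing in one `top` sample. Whether that is what happens here is a guess, not something the thread confirms. A minimal sketch for sampling those states over time follows; `count_rd` and `sample_load` are hypothetical helper names.

```shell
#!/bin/sh
# Count tasks in state R (runnable) or D (uninterruptible sleep) from
# `ps -eo state=` output on stdin. The ps process itself is usually
# in state R, so expect a count of at least 1 on a live system.
count_rd() {
    awk '$1 ~ /^[RD]/ { n++ } END { print n + 0 }'
}

# Print a timestamp, the three load averages, and the R/D task count
# once per second for $1 seconds (default 30).
sample_load() {
    i=0
    while [ "$i" -lt "${1:-30}" ]; do
        printf '%s load=%s rd_tasks=%s\n' \
            "$(date +%T)" \
            "$(cut -d' ' -f1-3 /proc/loadavg)" \
            "$(ps -eo state= | count_rd)"
        i=$((i + 1))
        sleep 1
    done
}
```

Running `sample_load 60` while the load is climbing would show whether the R/D count spikes at regular intervals, which could then be compared against the check schedule.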

Riz · March 2016

Nothing stands out at a quick glance. Can you also run ps -eo pcpu,args --sort=-%cpu | head -n 20? This will show us the top processes by CPU usage.

This may be a bit different on Centreon as I'm more familiar with Nagios, but do you have a page under System called 'Performance Info'? Can you post a screenshot of this page?
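
As a variation on the `ps` command suggested above, the per-process figures can be summed by command name, which may make it easier to spot many small processes of the same kind adding up. This is a sketch only: `cpu_by_command` is a hypothetical helper, and it keeps just the first word of each command name. Note also that the `pcpu` value from `ps` is CPU time divided by the process's elapsed run time, not an instantaneous reading.

```shell
#!/bin/sh
# Sum the %CPU column per command name from `ps -eo pcpu=,comm=`
# output on stdin and print the totals, highest first.
cpu_by_command() {
    awk '{ cpu[$2] += $1 }
         END { for (c in cpu) printf "%.1f\t%s\n", cpu[c], c }' |
        sort -rn
}
```

A typical invocation would be `ps -eo pcpu=,comm= | cpu_by_command | head -n 20`.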
