Nagios server high load

zafouhar Veteran
edited March 2016 in General

I've got a weird issue. I have a Nagios installation with Centreon on a VPS that monitors some 330 servers and nearly 2,800 services.

The VPS has 4 cores and 1GB RAM.

Lately the load on the VPS has been hitting 5, 7 and higher. Does anyone know any tricks for optimizing Nagios?

I have already tried changing these settings in the tuning section, but they haven't made any difference:

Maximum Service Check Spread: 10 mins

Maximum Concurrent Service Checks: 40

Use large installation tweaks: Yes
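The three Centreon fields above most likely end up as the standard Nagios Core directives max_service_check_spread, max_concurrent_checks and use_large_installation_tweaks in nagios.cfg; that mapping is an assumption about how Centreon generates the file. A quick way to confirm that the generated config actually picked up the new values is a small Python sketch like the one below. read_tuning is a hypothetical helper, and the config path is the one passed to nagios -d in the ps -ef output later in the thread.

```python
import os
from typing import Dict

# Standard Nagios Core directives that the Centreon "Tuning" fields are
# assumed to write into nagios.cfg (the mapping itself is an assumption).
TUNING_KEYS = {
    "max_service_check_spread",       # minutes over which initial service checks are spread
    "max_concurrent_checks",          # 0 means no limit
    "use_large_installation_tweaks",  # 1 enables the tweaks
}


def read_tuning(cfg_text: str) -> Dict[str, str]:
    """Return the tuning directives found in nagios.cfg-style key=value text."""
    found: Dict[str, str] = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in TUNING_KEYS:
            found[key] = value.strip()  # a later line overrides an earlier one
    return found


if __name__ == "__main__":
    # Path taken from the "nagios -d" command line in the thread's ps -ef output.
    cfg_path = "/etc/nagios/nagios.cfg"
    if os.path.exists(cfg_path):
        with open(cfg_path) as fh:
            for key, value in sorted(read_tuning(fh.read()).items()):
                print(f"{key}={value}")
```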

Some services are checked every minute, others every 15 minutes, and others every 4 hours.
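With mixed intervals, the work Nagios generates is driven mostly by the 1-minute services. As a rough back-of-the-envelope sketch: the average check rate is the sum of count divided by interval, and by Little's law the average number of checks in flight is that rate times the average plugin run time. The split between intervals and the 2-second run time below are hypothetical, since the post does not give them.

```python
from typing import Dict


def checks_per_second(services_by_interval_min: Dict[int, int]) -> float:
    """Average active check rate, given {check_interval_minutes: service_count}."""
    return sum(count / (interval * 60.0)
               for interval, count in services_by_interval_min.items())


def expected_concurrency(rate_per_s: float, avg_exec_s: float) -> float:
    """Little's law: checks in flight = arrival rate x average execution time."""
    return rate_per_s * avg_exec_s


if __name__ == "__main__":
    # Hypothetical split of the ~2,800 services; the real split is not stated.
    split = {1: 800, 15: 1500, 240: 500}
    rate = checks_per_second(split)
    print(f"{rate:.2f} checks/s")
    # Hypothetical 2 s average plugin execution time.
    print(f"{expected_concurrency(rate, 2.0):.1f} checks in flight on average")
```

With that hypothetical split the rate is about 15 checks per second, so a 2 s average run time keeps roughly 30 checks running at once, already close to a concurrency cap of 40. The 800 one-minute services account for about 13 of those 15 checks per second, so lengthening their interval shrinks the total fastest.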

Comments

  • Riz Member
    edited March 2016

    For the number of hosts and services you have, this is a fairly small amount of RAM. I would recommend an additional 3 GB. One optimization that may help in your case is a ramdisk, but even so, you'll need more RAM.

    The number of cores should be fine for now. What is eating your resources? It may be one specific check that's causing the issue.
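A ramdisk here usually means putting the files Nagios rewrites constantly, such as the status_file and the check_result_path spool named in nagios.cfg, on a tmpfs mount. Since tmpfs lives in RAM, that goes hand in hand with the RAM upgrade. Below is a minimal Python sketch for confirming which filesystem a given path is actually on, assuming the standard /proc/mounts layout. fs_type_for is a hypothetical helper, the spool path is a hypothetical example, and whether a tmpfs can be mounted inside an OpenVZ container may depend on the provider.

```python
import os
from typing import Optional


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the longest mount point containing path.

    mounts_text uses the /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    path = os.path.normpath(path)
    best_mp, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mp, fstype = fields[1], fields[2]
        # The mount point must be a whole-component prefix of path.
        inside = mp == "/" or path == mp or path.startswith(mp.rstrip("/") + "/")
        # ">=" lets a later mount on the same mount point win, as the kernel does.
        if inside and len(mp) >= len(best_mp):
            best_mp, best_type = mp, fstype
    return best_type


if __name__ == "__main__":
    # Hypothetical location; check what check_result_path says in nagios.cfg.
    target = "/var/spool/nagios/checkresults"
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as fh:
            print(target, "->", fs_type_for(target, fh.read()))
```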

  • @Riz said:

    Your amount of cores should be fine for now. What is eating at your resources? It may be one specific check that's causing the issue.

    I'm not really sure. How can I find out whether a specific check is causing the issue? I'll get the RAM upgraded ASAP, though.
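Because each active check runs as a short-lived plugin process, a single top or ps snapshot rarely catches the expensive one. One hedged way to answer this question is to sample the process list repeatedly and count which plugins keep showing up. A plugin that runs often, or runs slowly, gets caught more often. This is a minimal Python sketch. PLUGIN_DIR is a hypothetical path (substitute the directory that $USER1$ points to in resource.cfg), and tally_plugins and sample are illustrative names.

```python
import os
import subprocess
import time
from collections import Counter
from typing import Iterable

# Hypothetical plugin directory; use whatever $USER1$ points to in resource.cfg.
PLUGIN_DIR = "/usr/lib/nagios/plugins/"


def tally_plugins(args_lines: Iterable[str], plugin_dir: str = PLUGIN_DIR) -> Counter:
    """Count plugin executables (by basename) appearing in ps 'args' lines.

    Counts are relative: a check run through a /bin/sh -c wrapper may be
    seen twice (once for the shell, once for the plugin itself).
    """
    counts: Counter = Counter()
    for line in args_lines:
        for token in line.split():
            if token.startswith(plugin_dir):
                counts[os.path.basename(token)] += 1
                break  # count each process line once
    return counts


def sample(n: int = 60, interval_s: float = 1.0, plugin_dir: str = PLUGIN_DIR) -> Counter:
    """Snapshot `ps -eo args` n times and add up the plugin counts."""
    total: Counter = Counter()
    for _ in range(n):
        out = subprocess.run(["ps", "-eo", "args"], capture_output=True,
                             text=True, check=True).stdout
        total += tally_plugins(out.splitlines()[1:], plugin_dir)  # skip header row
        time.sleep(interval_s)
    return total
```

On the VPS, something like print(sample(60, 1.0).most_common(10)) gives a rough ranking after a minute of sampling.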

  • Layer03 Member, Host Rep
    edited March 2016

    Try getting 2 GB of RAM. Is this an OpenVZ or KVM VPS?

  • @Layer03 said:
    Try getting 2gb ram, is this a openvz or kvm vps?

    It's OpenVZ.

  • Layer03 Member, Host Rep

    It could be someone else abusing the node. Did you try contacting your provider?

  • ATHK Member

    If the RAM upgrade will mean downtime anyway, I suggest shutting down Nagios first and checking the load. As @Layer03 said, it may be a noisy neighbour.

  • Riz Member

    @Layer03 I strongly disagree: there are many other factors when it comes to Nagios. I don't know why that would be your first assumption.

    @zafouhar Run top | head -25 and post the output here. Run ps -ef while you're at it as well; this will give me an idea of what kind of checks you're running.

  • @Layer03 said:
    Could be someone else abusing the node, did you try contacting your provider?

    Actually, the provider contacted me first :p The provider is RamNode.

    I'll run those, @Riz, and report back.

  • zafouhar Veteran
    edited March 2016

    Here they are, @Riz. It's still weird, though, as I still can't see where the load is coming from.

    [root@vps2 ~]# top | head -25
    top - 04:14:44 up 17:42,  1 user,  load average: 3.41, 2.59, 2.03
    Tasks:  42 total,   1 running,  41 sleeping,   0 stopped,   0 zombie
    Cpu(s):  2.5%us,  1.6%sy,  0.0%ni, 95.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:   2097152k total,   580560k used,  1516592k free,        0k buffers
    Swap:   524288k total,        0k used,   524288k free,   226676k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
      901 nagios    20   0 49796  22m  628 S  2.0  1.1  14:23.14 nagios
        1 root      20   0 10372  168   40 S  0.0  0.0   0:00.29 init
        2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd/47786
        3 root      20   0     0    0    0 S  0.0  0.0   0:00.00 khelper/47786
      114 root      16  -4 12640  328    0 S  0.0  0.0   0:00.00 udevd
      559 root      20   0  5932  276  152 S  0.0  0.0   0:01.10 syslogd
      579 root      20   0 55992 9040  144 S  0.0  0.4   0:00.59 snmptt
      580 root      20   0 55992 9184  280 S  0.0  0.4   0:02.27 snmptt
      589 root      20   0  101m 5228 1036 S  0.0  0.2   0:21.21 snmpd
      601 root      20   0 64816  588    0 S  0.0  0.0   0:00.00 sshd
      609 root      20   0 21664  220    4 S  0.0  0.0   0:00.00 xinetd
      656 root      20   0 10788  232    4 S  0.0  0.0   0:00.00 mysqld_safe
      738 mysql     20   0  267m  53m 2624 S  0.0  2.6  15:59.61 mysqld
      771 nagios    20   0 66552 5536  864 S  0.0  0.3   0:02.74 centcore
      806 root      20   0 62856 1628  236 S  0.0  0.1   0:00.76 sendmail
      814 smmsp     20   0 57736 1160    4 S  0.0  0.1   0:00.00 sendmail
      824 root      20   0  258m 5540  136 S  0.0  0.3   0:01.17 httpd
      832 root      20   0 19724  704  136 S  0.0  0.0   0:00.40 crond
    
    [root@vps2 ~]# ps -ef
    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 Mar06 ?        00:00:00 init [3]
    root         2     1  0 Mar06 ?        00:00:00 [kthreadd/47786]
    root         3     2  0 Mar06 ?        00:00:00 [khelper/47786]
    root       114     1  0 Mar06 ?        00:00:00 /sbin/udevd -d
    root       389  6011  0 04:14 pts/0    00:00:00 ps -ef
    root       559     1  0 Mar06 ?        00:00:01 syslogd -m 0
    root       579     1  0 Mar06 ?        00:00:00 /usr/bin/perl /usr/share/centreon/bin/snmptt --daemon --ini=/etc/snmp/centreon_traps/snmptt.ini
    root       580   579  0 Mar06 ?        00:00:02 /usr/bin/perl /usr/share/centreon/bin/snmptt --daemon --ini=/etc/snmp/centreon_traps/snmptt.ini
    root       589     1  0 Mar06 ?        00:00:21 /usr/sbin/snmpd -Lsd -Lf /dev/null -p /var/run/snmpd.pid -a
    root       601     1  0 Mar06 ?        00:00:00 /usr/sbin/sshd
    root       609     1  0 Mar06 ?        00:00:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
    root       656     1  0 Mar06 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/
    mysql      738   656  1 Mar06 ?        00:15:59 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pi
    nagios     771     1  0 Mar06 ?        00:00:02 /usr/bin/perl -w /usr/share/centreon/bin/centcore
    root       806     1  0 Mar06 ?        00:00:00 sendmail: accepting connections
    smmsp      814     1  0 Mar06 ?        00:00:00 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
    root       824     1  0 Mar06 ?        00:00:01 /usr/sbin/httpd
    root       832     1  0 Mar06 ?        00:00:00 crond
    xfs        852     1  0 Mar06 ?        00:00:00 xfs -droppriv -daemon
    apache     854   824  0 Mar06 ?        00:00:07 /usr/sbin/httpd
    apache     855   824  0 Mar06 ?        00:00:05 /usr/sbin/httpd
    apache     856   824  0 Mar06 ?        00:00:07 /usr/sbin/httpd
    apache     857   824  0 Mar06 ?        00:00:07 /usr/sbin/httpd
    apache     858   824  0 Mar06 ?        00:00:07 /usr/sbin/httpd
    apache     859   824  0 Mar06 ?        00:00:08 /usr/sbin/httpd
    apache     861   824  0 Mar06 ?        00:00:07 /usr/sbin/httpd
    apache     862   824  0 Mar06 ?        00:00:08 /usr/sbin/httpd
    root       868     1  0 Mar06 ?        00:00:00 /usr/sbin/saslauthd -m /var/run/saslauthd -a pam -n 2
    root       870   868  0 Mar06 ?        00:00:00 /usr/sbin/saslauthd -m /var/run/saslauthd -a pam -n 2
    nagios     901     1  1 Mar06 ?        00:14:23 /usr/bin/nagios -d /etc/nagios/nagios.cfg
    root       905     1  0 Mar06 tty1     00:00:00 /sbin/mingetty console
    root       906     1  0 Mar06 tty2     00:00:00 /sbin/mingetty tty2
    apache    1068   824  0 Mar06 ?        00:00:06 /usr/sbin/httpd
    apache    1075   824  0 Mar06 ?        00:00:10 /usr/sbin/httpd
    apache    1076   824  0 Mar06 ?        00:00:08 /usr/sbin/httpd
    apache    1163   824  0 Mar06 ?        00:00:07 /usr/sbin/httpd
    apache    1189   824  0 Mar06 ?        00:00:07 /usr/sbin/httpd
    apache    1190   824  0 Mar06 ?        00:00:06 /usr/sbin/httpd
    root      5443   601  0 Mar06 ?        00:00:04 sshd: root@pts/0
    root      6011  5443  0 Mar06 pts/0    00:00:00 -bash
    nagios   16135     1  0 02:00 ?        00:00:00 /usr/bin/perl -w /usr/share/centreon/bin/centstorage
    [root@vps2 ~]#
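One thing the output above does show: top reports 95.9% idle CPU while the one-minute load average is 3.41. On Linux the load average counts tasks that are runnable (state R) or in uninterruptible sleep (state D), so a high load with mostly idle CPU suggests tasks that are waiting rather than computing, or many short-lived processes that a single snapshot misses. This is a hedged reading. The minimal Python sketch below counts R and D tasks over time next to /proc/loadavg so the two can be compared. count_load_states and watch are hypothetical helper names.

```python
import subprocess
import time
from typing import Iterable, Tuple


def count_load_states(stat_values: Iterable[str]) -> Tuple[int, int]:
    """Count (runnable, uninterruptible) tasks from ps STAT values like 'R+', 'Ss', 'D'."""
    running = blocked = 0
    for stat in stat_values:
        stat = stat.strip()
        if stat.startswith("R"):
            running += 1
        elif stat.startswith("D"):
            blocked += 1
    return running, blocked


def watch(samples: int = 30, interval_s: float = 1.0) -> None:
    """Print R/D counts next to the 1-minute load average once per interval."""
    for _ in range(samples):
        # "stat=" gives the column an empty header, so ps prints no header row.
        stats = subprocess.run(["ps", "-eo", "stat="], capture_output=True,
                               text=True, check=True).stdout.splitlines()
        with open("/proc/loadavg") as fh:
            load1 = fh.read().split()[0]
        r, d = count_load_states(stats)  # the ps process itself counts as one R
        print(f"load1={load1} running={r} uninterruptible={d}")
        time.sleep(interval_s)
```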
    
  • Riz Member

    Nothing stands out at a quick glance. Can you also run ps -eo pcpu,args --sort=-%cpu | head -n 20? This lists the processes using the most CPU.

    This may be a bit different on Centreon as I'm more familiar with Nagios, but do you have a page under System called 'Performance Info'? Can you post a screenshot of this page?
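The sorted ps command above ranks individual processes, but the ps -ef output earlier in the thread shows more than a dozen httpd processes and several perl daemons. Summing %CPU per executable can make a spread-out consumer visible. Note that procps computes %CPU as CPU time divided by the process's lifetime, so long-running daemons show an average rather than their current usage, and checks that finish between snapshots barely appear. This is a minimal Python sketch. cpu_by_command is a hypothetical helper that expects the output of ps -eo pcpu,args.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def cpu_by_command(ps_lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Sum %CPU per executable from `ps -eo pcpu,args` lines, highest first.

    Interpreted scripts (e.g. "/usr/bin/perl -w .../centcore") are grouped
    under the interpreter name; kernel threads like "[kthreadd/...]" are kept whole.
    """
    totals: Dict[str, float] = defaultdict(float)
    for line in ps_lines:
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue
        try:
            pcpu = float(parts[0])
        except ValueError:
            continue  # e.g. the "%CPU COMMAND" header row
        cmd = parts[1].split()[0]
        exe = cmd if cmd.startswith("[") else os.path.basename(cmd)
        totals[exe] += pcpu
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```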
