what do you guys use to monitor servers?

cloudvps · February 2016

Hi,

We have 200+ servers across several locations. We use New Relic/Nagios to monitor them.

However, we are finding it difficult to get certain business-centric data, such as the following:

Which servers are not optimally performing?
Which servers require an upgrade?

Is there a solution that will show us all server stats on a single page, with average data for the past 15 days?

pbgben · February 2016

It seems many LowEndProviders are using the well-tested CBID (Customer Based Issue Detection). It's really simple and requires minimal setup: all you have to do is wait for a ticket to be created because the "Server is down".

Some providers also require a post on the LET forums as a confirmation.

alexnjh · February 2016

I use 4.

PRTG , StatusCake, Uptime Robot, Nixstats

thagoat · February 2016

@pbgben said:
It seems many LowEndProviders are using the well-tested CBID (Customer Based Issue Detection). It's really simple and requires minimal setup: all you have to do is wait for a ticket to be created because the "Server is down".

Some providers also require a post on the LET forums as a confirmation.

Ah, the old "The Customer Must Do All The Work" method. I can dig it.

Awmusic12635 · February 2016

nodeping + observium

Ole_Juul · February 2016

pbgben said: Some providers also require a post on the LET forums as a confirmation.

This actually adds a level of sophistication, because the urgency of the problem can easily be judged by the number of pages the thread runs. One page is yellow, two pages is orange, and three pages is code red.

TheZealous · February 2016

We use Pingdom and nixstats to monitor our servers.

Jonchun · February 2016

@pbgben said:
It seems many LowEndProviders are using the well-tested CBID (Customer Based Issue Detection). It's really simple and requires minimal setup: all you have to do is wait for a ticket to be created because the "Server is down".

Some providers also require a post on the LET forums as a confirmation.

@Nexhost

Rolter · February 2016

Uptime robot for hosts and Newrelic for servers

doughmanes · February 2016

LibreNMS for internal use, NixStats for public/customers

sin · February 2016

NixStats, NewRelic Synthetics, Uptime Doctor - all of them have free 1-minute monitoring from different locations.

patrick7 · February 2016

smokeping, icinga, librenms

afterSt0rm · February 2016

Zabbix, Observium or Nagios. For non critical services, NixStats.

Sady · February 2016

Have been using Nixstats but got a HostUS box last month, playing with Nagios a lot & soon will move everything to Nagios.

Kodis · February 2016

Uptime Robot

shivoham · February 2016

cloudstats.me

vimalware · February 2016

I use @onepound 's external monitoring service (free 10 checks for clients).
I'm very impressed.

Traffic · February 2016

MikePT · February 2016

I use @vfuse's Nixstats and LibreNMS. Nixstats SMS notifications aren't working for me, but otherwise it does a pretty amazing job.

raindog308 · February 2016

OP is really asking for two different things.

Which servers are not optimally performing?

This is arguably more capacity planning or APM than "monitoring". There's a whole subindustry devoted to this - what does "not optimally" mean? Is it something as crude as CPU load, or something more sophisticated like "number of milliseconds for the query to return"? And if so, are you instrumenting at every level of the stack - server, network, app, database, etc.?
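For the crude end of that spectrum, here is a minimal Python sketch of the "15-day average on one page" idea. It assumes you can already export samples as (server, timestamp, load) tuples from whatever collects them; the function names and the threshold are made up for illustration, not taken from any particular tool.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def average_load(samples, now, window_days=15):
    """Return {server: mean load} over the trailing window.

    samples is an iterable of (server_name, timestamp, load) tuples.
    """
    cutoff = now - timedelta(days=window_days)
    by_server = defaultdict(list)
    for server, ts, load in samples:
        # Keep only samples that fall inside the trailing window.
        if cutoff <= ts <= now:
            by_server[server].append(load)
    return {server: mean(loads) for server, loads in by_server.items()}


def flag_overloaded(averages, threshold):
    """Return the sorted names of servers whose average exceeds threshold."""
    return sorted(name for name, avg in averages.items() if avg > threshold)


if __name__ == "__main__":
    now = datetime(2016, 2, 20)
    samples = [
        ("web1", now - timedelta(days=1), 0.5),
        ("web1", now - timedelta(days=10), 1.5),
        ("db1", now - timedelta(days=2), 6.0),
        ("db1", now - timedelta(days=30), 0.1),  # outside the window, ignored
    ]
    averages = average_load(samples, now)
    print(averages)
    print(flag_overloaded(averages, threshold=4.0))
```

A plain mean hides spikes, so for anything beyond a first pass you would probably want percentiles or per-hour buckets instead.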

Which servers require an upgrade?

This is perhaps more configuration management than "monitoring". Depends what you mean by "upgrade". If you mean "has not run apt-get upgrade in six months", that's one thing; if you mean an upgrade because the server is not performing well, is out of warranty, is CPU model X and that's too old, etc., that's different.
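The "has not run apt-get upgrade in six months" case is the easy one to script. A rough Python sketch, assuming a Debian/Ubuntu box where apt writes Start-Date: and Commandline: lines to its history log (typically /var/log/apt/history.log); the function names are hypothetical, and rotated or compressed logs are not handled:

```python
from datetime import datetime, timedelta


def last_upgrade(history_text):
    """Return the datetime of the most recent upgrade entry, or None."""
    latest = None
    start = None
    for line in history_text.splitlines():
        if line.startswith("Start-Date:"):
            # Collapse the repeated spaces between the date and the time.
            stamp = " ".join(line.split()[1:3])
            start = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        elif line.startswith("Commandline:") and "upgrade" in line and start:
            # Substring match, so dist-upgrade and full-upgrade count too.
            if latest is None or start > latest:
                latest = start
    return latest


def is_stale(history_text, now, max_age_days=180):
    """True if no upgrade is recorded within the last max_age_days."""
    last = last_upgrade(history_text)
    return last is None or now - last > timedelta(days=max_age_days)
```

Feed it the contents of the log file from each server and you get a yes/no per box; the other kinds of "upgrade" (warranty, CPU age) need an inventory, not a log parser.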

Lots of people in this thread mentioned external monitoring services. For example, @NodePing is great but they're on the outside...other solutions which have an agent are needed if you want things like "is this process down".
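To make the "is this process down" point concrete: this is the kind of check that has to run on the box itself. A minimal POSIX-only Python sketch, assuming the service writes its PID to a pidfile (the path you pass in is your own; nothing here is the API of any agent mentioned in this thread):

```python
import os


def pid_is_running(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only performs error checking
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def service_is_up(pidfile):
    """Return True if the PID recorded in pidfile belongs to a live process."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or garbage pidfile: treat the service as down.
        return False
    return pid_is_running(pid)
```

An external ping can't see any of this, which is why people end up pairing an outside service with an agent or something like Nagios on the inside.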