What interval do you use for monitoring?

jbarr · November 2014

I'm using NodeQuery and Uptime Robot to keep an eye on the health and uptime of 4 VPSes and one hosted website.

If you use these kinds of services, what interval do you use to check health and uptime? For commercial and client sites, I can see that you would want the site up as close to 100% of the time as possible, so the check interval would be quite short (every few minutes, say). In my case, the sites are hobby and test instances, so downtime isn't as critical, and I have the check interval set rather long at 30 minutes.

Do you recommend short intervals for all types of sites?

On one hand, uptime is uptime, regardless of the specific use, so a short interval seems in order. And if the host is reliable (which many Low-End hosts often are not), then I should never see alerts and can expect very close to 100% uptime.

On the other hand, for better or for worse, I have bought into the "you get what you pay for" expectation of Low-End hosted sites. Since I am admittedly using some VERY Low-End hosts, when a site has an uptime of, say, 97%, I don't see that as a really big deal.

Thoughts?
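To put a figure like 97% in perspective, here is a rough sketch that converts an uptime percentage into hours of downtime. It assumes a 30-day month, and the function name `allowed_downtime_hours` is made up for the example.

```python
def allowed_downtime_hours(uptime_percent, days=30):
    """Hours of downtime implied by an uptime percentage over `days` days.

    Assumption: the period is a flat `days` * 24 hours (30 days by default).
    """
    total_hours = days * 24
    return (100 - uptime_percent) / 100 * total_hours


if __name__ == "__main__":
    for pct in (97, 99, 99.9):
        print(f"{pct}% uptime -> {allowed_downtime_hours(pct):.2f} hours down per 30 days")
```

Under that assumption, 97% uptime works out to about 21.6 hours of downtime in a month.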

0xdragon · November 2014

1 second

ksubedi · November 2014

For ping-based checks, 1 minute. I used to use 5 minutes, but the downtime numbers came out wrong since the checks were only done every 5 minutes (so a 2-minute downtime counted as 5).
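The rounding effect described here can be sketched in a few lines of Python. This is a minimal illustration, not any monitoring service's actual accounting: it assumes checks fire at fixed multiples of the interval and that every failed check is billed as one full interval of downtime. The function name `reported_downtime` is made up for the example.

```python
import math


def reported_downtime(outage_start, outage_end, interval):
    """Downtime a fixed-interval poller would report.

    Assumptions (not from any specific service):
    - checks fire at t = 0, interval, 2 * interval, ...
    - the outage covers outage_start <= t < outage_end
    - each failed check counts as one full interval of downtime
    All values use the same time unit (e.g. minutes).
    """
    # Index of the first check at or after the outage begins.
    first_check = math.ceil(outage_start / interval)
    # Index of the last check strictly before the outage ends.
    last_check = math.ceil(outage_end / interval) - 1
    failed_checks = max(0, last_check - first_check + 1)
    return failed_checks * interval


if __name__ == "__main__":
    # A 2-minute outage from minute 4 to minute 6.
    print(reported_downtime(4, 6, interval=5))
    print(reported_downtime(4, 6, interval=1))
```

With a 2-minute outage from minute 4 to minute 6, `reported_downtime(4, 6, 5)` gives 5 while `reported_downtime(4, 6, 1)` gives 2. An outage from minute 1 to minute 3 falls between two 5-minute checks and is not seen at all.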

Maounique · November 2014

1 minute for pings/ssh connection, 5 minutes for load/disk/temperatures, etc.

Lm85H4gFkh3wk3 · November 2014

60 minutes for mail servers.
5 minutes for web servers.

I figure legitimate mail servers will retry for at least 60 minutes, so I can stand some blips here and there. I can't get to a PC within 5 minutes, so anyone visiting my low traffic sites can sustain a few blips < 5 minutes anyway.

SNetworks1 · November 2014

I think I use 120 seconds on everything.

Silvenga · November 2014

30 ± 5 seconds for email (Amazon's Route53 health checks), 120 ± 30 seconds for everything else.

Neoon · November 2014

30 seconds for ICMP/Port Checking.

sleddog · November 2014

5 minutes, with alert notification after 3 consecutive failures. I'm not hosting the Bank of Canada.
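Alerting only after several consecutive failures can be sketched as below. This is an illustration of the idea rather than how any particular tool implements it; the function name `should_alert` and the representation of check results as a list of booleans are assumptions for the example.

```python
def should_alert(results, threshold=3):
    """Return True once `threshold` failed checks in a row have been seen.

    `results` is an iterable of booleans in check order:
    True means the check passed, False means it failed.
    """
    streak = 0
    for ok in results:
        # A passing check resets the streak; a failure extends it.
        streak = 0 if ok else streak + 1
        if streak >= threshold:
            return True
    return False


if __name__ == "__main__":
    # Two isolated blips: no alert.
    print(should_alert([True, False, False, True, False]))
    # Three failures in a row: alert.
    print(should_alert([True, False, False, False]))
```

At a 5-minute interval and a threshold of 3, the third consecutive failed check arrives 10 minutes after the first, so short blips never page anyone.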

TheLonely · November 2014

30 days, 100% uptime each month

Seriously now: 5 minutes for private/my VMs and 10-30 minutes for other sites I care a bit about.

wojons · November 2014

@0xdragon said:
1 second

I am a fan of 1 minute on everything and on demand 1-5 seconds.

NodePing · November 2014

We recommend:

1 minute ping/ssh/smtp/websocket/DNS/SIP

5 minute imap/pop/FTP

15 minute RBL/SSL

ksubedi · November 2014

Sorry this is a little off topic, but what do you guys use for monitoring?

Maounique · November 2014

NodePing, Munin, Zabbix, nfsen, observium, custom scripts, etc.

utama · November 2014

1 minute if available, otherwise 5 minutes. I use Pingdom and StatusCake.

wojons · November 2014

@ksubedi said:
Sorry this is a little off topic, but what do you guys use for monitoring?

elastictrace & uptime robot

jbiloh · November 2014

We use 1 minute for our internal PRTG monitors and 5 minutes for our external monitors (Pingdom). That way we get early alerts of potential issues internally and avoid the false positives we would see on Pingdom with shorter thresholds.

ksubedi · November 2014

@wojons said:

i have been using uptime robot and newrelic but elastictrace looks interesting

wojons · November 2014

@ksubedi said:
i have been using uptime robot and newrelic but elastictrace looks interesting

Good to know. I've been developing it for a bit; let me know what you think if you decide to try it out.

geekalot · November 2014

I have been using 2 minutes internally (custom scripts) and 5 minutes externally (via 3rd party tools; that's as low as they will go) with automated failover for production sites. I use keyword-based checking, so each check validates the entire LAMP stack.

But, I think that 90 - 180 seconds externally would be optimal; implementing that via custom scripts.
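A keyword-based check like the one mentioned above might look roughly like this in a custom script. It is a sketch using only the Python standard library; the function names, the example URL, and the keyword are all hypothetical. The idea is that the keyword should come from content the application renders from the database, so a match implies the web server, the application code, and the database all responded.

```python
import urllib.request


def page_is_healthy(status, body, keyword):
    """A page counts as healthy only if it returned 200 and contains the keyword."""
    return status == 200 and keyword in body


def check_url(url, keyword, timeout=10):
    """Fetch `url` and apply the keyword test; any network error counts as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return page_is_healthy(resp.status, body, keyword)
    except OSError:
        # Covers connection failures, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    # Hypothetical target and keyword; replace with a real page and marker text.
    print(check_url("http://example.com/", "Example Domain"))
```

A plain ping or port check would still pass if the database were down and the page showed an error message, which is the case the keyword test is meant to catch.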

Edouard · November 2014

We use 1 minute intervals at SentinelTower

comXyz · November 2014

5 minutes if there is no problem. After the first timeout, the interval is automatically changed to 5 seconds.
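The adaptive scheme described here can be sketched as a small scheduling function. The post does not say when the interval goes back to 5 minutes, so this example assumes it returns to normal after the first successful check; the function name `check_times` is made up for the illustration.

```python
def check_times(results, normal=300, fast=5):
    """Return the time (in seconds) at which each check runs.

    `results` lists the outcome of each check in order (True = up).
    After a passing check the next one runs `normal` seconds later;
    after a failed check the next one runs `fast` seconds later.
    Assumption: one success is enough to return to the normal interval.
    """
    t = 0
    times = []
    for ok in results:
        times.append(t)
        t += normal if ok else fast
    return times


if __name__ == "__main__":
    # Up, then two failures, then recovery.
    print(check_times([True, False, False, True]))
```

For the sequence up, down, down, up, the checks run at 0, 300, 305, and 310 seconds, so recovery is noticed within seconds while the quiet-time load stays at one check every 5 minutes.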
