What interval do you use for monitoring?

I'm using NodeQuery and Uptime Robot to keep an eye on the health and uptime of four VPSes and one hosted website.

If you use these kinds of services, what interval do you use to check health and uptime? For commercial and client sites, I can see that you would want uptime as close to 100% as possible, so the check interval would be quite short (every few minutes). In my case, the sites are hobby and test instances, so downtime isn't as critical, and I have the check interval set rather long, at 30 minutes.

Do you recommend short intervals for all types of sites?

On one hand, uptime is uptime, regardless of the specific use, so a short interval seems in order. And if the host is reliable (something many Low-End hosts often fail at), then I should never see alerts and can expect very close to 100% uptime.

On the other hand, for better or for worse, I have bought into the "you get what you pay for" expectation of Low-End hosted sites. Since I am admittedly using some VERY Low-End hosts, when a site has an uptime of, say, 97%, I don't see that as a really big deal.
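To put the 97% figure in perspective, here is a minimal sketch of the arithmetic. It assumes a 30-day month, and the function name is only illustrative.

```python
def downtime_budget_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime implied by an uptime percentage over the period.

    Assumption: a 30-day period by default; adjust period_days as needed.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    for pct in (97.0, 99.0, 99.9):
        minutes = downtime_budget_minutes(pct)
        print(f"{pct}% uptime allows about {minutes / 60:.1f} hours of downtime per 30 days")
```

At 97%, that works out to roughly 21.6 hours of downtime in a 30-day month, so a 30-minute check interval is still fine-grained enough to notice outages on that scale.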

Thoughts?

Comments

  • 1 second

  • For ping-based checks, 1 minute. I used to use 5 minutes, but the downtime numbers came out wrong because the checks only ran every 5 minutes (so a 2-minute outage counted as 5).

  • Maounique Host Rep, Veteran

    1 minute for ping/SSH connection checks; 5 minutes for load, disk, temperatures, etc.

  • edited November 2014

    60 minutes for mail servers.
    5 minutes for web servers.

    I figure legitimate mail servers will retry for at least 60 minutes, so I can stand some blips here and there. I can't get to a PC within 5 minutes, so anyone visiting my low-traffic sites can put up with a few blips shorter than 5 minutes anyway.

  • I think I use 120 seconds on everything.

  • 30 ± 5 seconds for email (Amazon's Route53 health checks), 120 ± 30 seconds for everything else.

  • Neoon Community Contributor, Veteran

    30 seconds for ICMP/Port Checking.

  • 5 minutes, with alert notification after 3 consecutive failures. I'm not hosting the Bank of Canada.

  • 30 days, 100% uptime each month :p

    Seriously, though: 5 minutes for my private VMs and 10-30 minutes for other sites I care a bit about.

  • @0xdragon said:
    1 second

    I am a fan of 1 minute on everything, and 1-5 seconds on demand.

  • We recommend:

    1 minute ping/ssh/smtp/websocket/DNS/SIP

    5 minute imap/pop/FTP

    15 minute RBL/SSL

  • Sorry this is a little off topic, but what do you guys use for monitoring?

  • Maounique Host Rep, Veteran

    NodePing, Munin, Zabbix, nfsen, observium, custom scripts, etc.

  • 1 minute if available, otherwise 5 minutes. I use Pingdom and StatusCake.

  • @ksubedi said:
    Sorry this is a little off topic, but what do you guys use for monitoring?

    elastictrace & uptime robot

  • jbiloh Administrator, Veteran

    We use 1 minute for our internal PRTG monitors and 5 minutes for our external monitors (Pingdom). That way we get early alerts of potential issues internally and avoid the false positives Pingdom would give us with shorter thresholds.

  • @wojons said:

    i have been using uptime robot and newrelic but elastictrace looks interesting

  • @ksubedi said:
    i have been using uptime robot and newrelic but elastictrace looks interesting

    Good to know. I've been developing it for a bit; let me know what you think if you decide to try it out.

  • geekalot Member
    edited November 2014

    I have been using 2 minutes internally (custom scripts) and 5 minutes externally (via third-party tools; that's as low as they will go), with automated failover for production sites. I use keyword-based checking, which validates the entire LAMP stack.

    But I think 90-180 seconds externally would be optimal; I'm implementing that via custom scripts.

  • We use 1-minute intervals at SentinelTower.

  • 5 minutes if there is no problem. After the first timeout, the interval automatically changes to 5 seconds.
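
Two ideas from the comments combine naturally: checking every 5 minutes while healthy but dropping to 5 seconds after the first failure, and alerting only after 3 consecutive failures. The sketch below is one hypothetical way to express that as a small state machine. `MonitorState` and `next_step` are illustrative names, not part of any tool mentioned in the thread, and the actual check (ping, HTTP, keyword match) is left to the caller.

```python
from dataclasses import dataclass
from typing import Tuple

NORMAL_INTERVAL_S = 300   # 5 minutes while healthy (value from the thread)
FAST_INTERVAL_S = 5       # 5 seconds after a failed check (value from the thread)
ALERT_AFTER_FAILURES = 3  # alert on the 3rd consecutive failure (value from the thread)


@dataclass
class MonitorState:
    consecutive_failures: int = 0
    alerted: bool = False  # True once an alert was sent for the current outage


def next_step(state: MonitorState, check_ok: bool) -> Tuple[MonitorState, int, bool]:
    """Return (new state, seconds until the next check, send an alert now?)."""
    if check_ok:
        # Recovery: reset counters and go back to the slow interval.
        return MonitorState(), NORMAL_INTERVAL_S, False
    failures = state.consecutive_failures + 1
    # Alert once per outage, when the failure threshold is first reached.
    send_alert = failures >= ALERT_AFTER_FAILURES and not state.alerted
    new_state = MonitorState(failures, state.alerted or send_alert)
    return new_state, FAST_INTERVAL_S, send_alert


if __name__ == "__main__":
    state = MonitorState()
    for ok in (True, False, False, False, False, True):
        state, wait_s, alert = next_step(state, ok)
        print(f"ok={ok} next check in {wait_s}s alert={alert}")
```

With these values, and ignoring the time each check itself takes, an alert fires about 10 seconds after the first failed check rather than after two more 5-minute cycles, while a healthy host is still only probed every 5 minutes.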
