What interval do you use for monitoring?
I'm using NodeQuery and Uptime Robot to keep an eye on the health and uptime of 4 VPSes and one hosted website.
If you use these kinds of services, what interval do you use to check the health and uptime? For commercial sites and client sites, I can see that you would certainly want your site up as close to 100% of the time as possible, so the check interval would be quite short (like every few minutes). In my case, my sites are hobby and test instances, so downtime isn't as critical, and I have the check interval set rather long at 30 minutes.
Do you recommend short intervals for all types of sites?
On one hand, uptime is uptime, regardless of the specific use, so a short interval seems in order. And if the host is reliable (something many Low-End hosts often fail at), then I should never see alerts, and can expect very close to 100% uptime.
On the other hand, for better or for worse, I have bought into the "you get what you pay for" expectation of Low-End hosted sites. Since I am admittedly using some VERY Low-End hosts, when a site has an uptime of, say, 97%, I don't see that as a really big deal.
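To put a figure like 97% in perspective, here is a minimal sketch that converts an uptime percentage into hours of downtime. The function name and the 30-day month are assumptions made for illustration only.

```python
def monthly_downtime_hours(uptime_percent, days=30):
    """Hours of downtime implied by an uptime percentage over `days` days.

    Assumption: a 30-day month by default; adjust `days` as needed.
    """
    return (100 - uptime_percent) / 100 * days * 24


# 97% uptime over 30 days works out to roughly 21.6 hours of downtime.
print(round(monthly_downtime_hours(97), 1))
# 99.9% uptime over the same period is roughly 0.7 hours (about 43 minutes).
print(round(monthly_downtime_hours(99.9), 2))
```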
Thoughts?
Comments
1 second
For ping-based checks, 1 minute. I used to use 5 minutes, but the downtime numbers came out wrong since the checks were only done every 5 minutes (so a 2-minute downtime counted as 5).
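The rounding effect described above can be sketched in a few lines. This is only an illustration, assuming the monitor checks at fixed multiples of the interval and bills every failed check as one full interval of downtime; real services may count differently, and `reported_downtime` is a made-up name.

```python
import math


def reported_downtime(outage_start, outage_end, interval):
    """Downtime in seconds that a poller checking at t = 0, interval, 2*interval, ... would report.

    Assumption: each failed check counts as one full interval of downtime.
    """
    # First scheduled check at or after the outage begins.
    t = math.ceil(outage_start / interval) * interval
    failed_checks = 0
    while t < outage_end:
        failed_checks += 1
        t += interval
    return failed_checks * interval


# A 2-minute outage from t=240s to t=360s.
print(reported_downtime(240, 360, 300))  # 5-minute checks report 300 s
print(reported_downtime(240, 360, 60))   # 1-minute checks report 120 s
# An outage that falls entirely between two checks is never seen.
print(reported_downtime(10, 130, 300))   # reports 0 s
```

So a long interval can overstate a short outage or miss it completely, depending on where the outage falls relative to the check times.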
1 minute for pings/ssh connection, 5 minutes load/disk/temperatures, etc.
60 minutes for mail servers.
5 minutes for web servers.
I figure legitimate mail servers will retry for at least 60 minutes, so I can stand some blips here and there. I can't get to a PC within 5 minutes, so anyone visiting my low traffic sites can sustain a few blips < 5 minutes anyway.
I think I use 120 seconds on everything.
30 +- 5 seconds for email (Amazon's Route53 health checks), 120 +- 30 seconds for everything else.
30 seconds for ICMP/Port Checking.
5 minutes, with alert notification after 3 consecutive failures. I'm not hosting the Bank of Canada.
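The "alert after 3 consecutive failures" rule could look something like this. It is a rough sketch, not how any particular monitoring service implements it; the class and method names are hypothetical.

```python
class ConsecutiveFailureAlert:
    """Fire an alert only after `threshold` failed checks in a row."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def record(self, check_ok):
        """Record one check result; return True when an alert should fire."""
        if check_ok:
            # Any success resets the streak.
            self.failures = 0
            return False
        self.failures += 1
        # Fire exactly once, at the moment the streak reaches the threshold.
        return self.failures == self.threshold


alerter = ConsecutiveFailureAlert(threshold=3)
results = [True, False, False, True, False, False, False, False]
alerts = [alerter.record(ok) for ok in results]
print(alerts)  # only the seventh check (third failure in a row) triggers an alert
```

With a 5-minute interval this means an alert arrives roughly 10 to 15 minutes after an outage starts, which trades speed for fewer false alarms.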
30 days, 100% uptime each month
Serious now; 5 minutes for private/my VMs and 10-30 minutes for other sites I care a bit about.
I am a fan of 1 minute on everything and on demand 1-5 seconds.
We recommend:
1 minute ping/ssh/smtp/websocket/DNS/SIP
5 minute imap/pop/FTP
15 minute RBL/SSL
Sorry this is a little off topic, but what do you guys use for monitoring?
NodePing, Munin, Zabbix, nfsen, observium, custom scripts, etc.
1 minute if available, otherwise 5 minutes. I use Pingdom and StatusCake.
elastictrace & uptime robot
We use 1 minute for our internal PRTG monitors and 5 minutes for our external monitors (pingdom). That way we get early alerts of potential issues and avoid false positives on pingdom if using shorter thresholds.
I have been using Uptime Robot and New Relic, but elastictrace looks interesting.
Good to know. I've been developing it for a bit; let me know what you think if you decide to try it out.
I have been using 2 minutes internally (custom scripts) and 5 minutes externally (via 3rd party tools; that's as low as they will go) with automated failover for production sites. I use keyword-based checking, so this validates the entire LAMP stack.
But, I think that 90 - 180 seconds externally would be optimal; implementing that via custom scripts.
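For anyone wondering what a keyword-based check might look like in a custom script, here is a minimal sketch using only the Python standard library. The function names, the `fetch` parameter, and the example URL and keyword are all assumptions for illustration; the idea is that the page prints the keyword only after PHP has run and the database has answered.

```python
import urllib.request


def _http_get(url, timeout):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def keyword_check(url, keyword, timeout=10, fetch=None):
    """Return True only if the page loads and its body contains `keyword`.

    `fetch` is a hypothetical hook so the check can be exercised without a network.
    """
    fetch = fetch or _http_get
    try:
        body = fetch(url, timeout)
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses from urllib.
        return False
    return keyword in body


# Offline demonstration with a stand-in fetcher (hypothetical page content).
def fake_page(url, timeout):
    return "<html>status: db-ok</html>"


print(keyword_check("http://example.invalid/health.php", "db-ok", fetch=fake_page))
```

A plain ping or port check would still pass if the web server is up but the database is down, which is why matching on page content covers more of the stack.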
We use 1 minute intervals at SentinelTower
5 minutes if there is no problem. After the first timeout, the interval automatically changes to 5 seconds.
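That adaptive approach can be sketched roughly as follows. One assumption here that the post does not state: the interval drops back to 5 minutes as soon as a check succeeds again. The function name is made up for illustration.

```python
def schedule(results, normal=300, fast=5):
    """Return the time (in seconds) at which each check runs.

    `results` is the outcome of each check in order (True = OK).
    Assumption: the interval returns to `normal` right after a successful check.
    """
    times = []
    t = 0
    for ok in results:
        times.append(t)
        # Healthy: wait the normal interval. Failing: re-check quickly.
        t += normal if ok else fast
    return times


# One OK check, two failures, then recovery.
print(schedule([True, False, False, True, True]))  # [0, 300, 305, 310, 610]
```

This keeps the load on the monitored host low when things are fine while still measuring the length of an outage to within a few seconds.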