Anyone? What do you guys use for monitoring?
PHP Server Monitor Plus
I'm toying with openstatus, but I think I keep messing something up, because the uptime field is blank. Right now I host it on one of my VPSes, but I'm going to end up hosting it on my shared hosting server.
At the risk of repeating myself from other threads, I use Zabbix. I have a dedicated server set up with it. I plan on allowing free accounts once I finish setting up a self-service portal in PHP. If anyone wants a free account, let me know. I'll have to add the hosts manually, but you'll have access to the monitoring dashboard and email alerts.
http://www.centreon.com/
Works for me?
When I click on a '1 hr' or '3 hr' link (etc.), all I get is a spinner.
https://forums.dotvps.net/
Brokenish link.
Ah, yeah, I did notice that. Sorry.
I may be, if I get a MB worth of VPS every post :P
Status2k.
Just a thought... "monitoring" means different things.
You can monitor availability, e.g., port 80 on server xyz, and track its status over time (as a % uptime/downtime).
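For the availability case, a minimal Python sketch: a successful TCP connect to the port counts as "up", and the uptime percentage is simply the share of successful checks. The function names and the 5-second timeout are illustrative choices, not taken from any tool mentioned in this thread.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, unreachable, DNS failure, etc. all count as "down"
        return False


def uptime_percent(history: list[bool]) -> float:
    """Percentage of checks in `history` that succeeded (0.0 if empty)."""
    if not history:
        return 0.0
    return 100.0 * sum(history) / len(history)
```

Run `port_is_open("xyz", 80)` on a schedule (cron, say), append each result to a history, and `uptime_percent(history)` gives the figure for that window.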
Or you can monitor performance, e.g., load, memory, swap, and track those variables over time.
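A matching sketch for the performance side, assuming a Linux host: it takes the 1-minute load average from `os.getloadavg()` and memory/swap figures from `/proc/meminfo`. Which fields to record is an assumption here; tracking over time would mean storing each sample with a timestamp.

```python
import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; values are in kB."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def sample() -> dict[str, float]:
    """One performance sample: 1-minute load, used memory and used swap (Linux only)."""
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    load1, _, _ = os.getloadavg()
    return {
        "load1": load1,
        # MemAvailable is missing on very old kernels; fall back to MemFree
        "mem_used_kb": mem["MemTotal"] - mem.get("MemAvailable", mem["MemFree"]),
        "swap_used_kb": mem["SwapTotal"] - mem["SwapFree"],
    }
```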
Some apps do one, others do the other, and I guess some do both.
So when you say "monitor", you need to think about what it is you're concerned about monitoring.
http://observium.org/wiki/Main_Page for performance
Pingdom on all hosts for uptime/latency.
Only if you like false results
@miTgiB Let's be fair. Pingdom is extremely accurate. Until it isn't.
Haven't gotten a false result yet... it's a free service anyway.
I have zero faith in it. I can't count how many tickets I get from people who depend on Pingdom and claim I was down when there was no issue. Occasionally Pingdom gets lucky with a correct down report, but those are rare.
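One common way to cut down on false "down" reports (not necessarily what Pingdom or any other service does) is to alert only after several consecutive failed checks, so a single dropped probe doesn't turn into a ticket. A minimal sketch, with a hypothetical `DownConfirmer` class and an arbitrary threshold of 3:

```python
from dataclasses import dataclass


@dataclass
class DownConfirmer:
    """Only report 'down' after `threshold` consecutive failed checks."""
    threshold: int = 3
    failures: int = 0

    def update(self, check_ok: bool) -> bool:
        """Feed one check result; return True when an alert should fire."""
        if check_ok:
            self.failures = 0  # any success resets the streak
            return False
        self.failures += 1
        # fire exactly once, on the check that reaches the threshold
        return self.failures == self.threshold
```

The trade-off is detection delay: with checks every minute and a threshold of 3, a real outage is reported about three minutes late.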
External Monitoring: BinaryCanary (waiting for our new server for a custom monitoring script) and scrd+status+munin (results).
Internal Monitoring: scrd+status+munin (results), custom monitoring script, and Observium.
I use cacti along with nagios for some servers.
Maybe that's the problem.
Demo of my availability monitor: http://199.96.82.38/pung/
+1, that's some nice work.
Comment Inception.
Anyways, yeah dude, that looks awesome... Are you going to open-source that?
Yes, OSS, WIR. I originally wrote it ~6 years ago for internal use. I thought I'd clean it up a bit for public use, but the "clean up" turned into a rewrite.
Meanwhile, guess who monitors for TCP connections with no data and eventually blocks for an hour? I'm guessing WHT comes up again around 15:14 NDT.
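If a site really does block clients that open TCP connections and send nothing (that's a guess about WHT in this thread, not something confirmed), one workaround is to make the check send an actual request. A minimal sketch, assuming a plain HTTP service; `http_probe` is a hypothetical helper that sends `HEAD /` and returns the status line of the response:

```python
import socket


def http_probe(host: str, port: int = 80, timeout: float = 5.0) -> str:
    """Connect, send a minimal HEAD request, and return the response status line."""
    request = f"HEAD / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        # the status line ends at the first CRLF; keep reading until we have it
        data = b""
        while b"\r\n" not in data:
            chunk = sock.recv(1024)
            if not chunk:
                break  # server closed the connection early
            data += chunk
    return data.split(b"\r\n", 1)[0].decode("latin-1")
```

A return value like `HTTP/1.0 200 OK` also tells you the web server answered, which a bare connect does not.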
We are finding this increasingly: customers just assume that when Pingdom says it's down, it's actually down, when in reality their site is still online.
We are using BinaryCanary too, and finding it much more reliable than Pingdom! Then we have our internal monitoring system, teamed with Cacti and Nagios.
Still not one false report from Uptime Robot in a year.
I restarted my demo/test monitor, targeting providers' test IPs from the Offers forum. I'm interested in reducing/eliminating false positives, and I thought this might help -- with confirmation/denial from the provider regarding any downtime.
http://199.96.82.38/pung/
Maybe you should add distributed monitoring for that
No. It would add little or no benefit and be significantly more complex, which in turn creates a greater probability of errors.
I see one of our IPs there, thank you @sleddog
How's that?
People generally think of scripts like this as an "uptime" monitor. It isn't. It's a point-to-point connection monitor: it tries to establish a TCP/IP connection across the Internet from Point A to multiple targets (Point B) on designated ports, and records the results.
In my script, an attempted connection can have one of three possible results:
1. Connection succeeded
2. Connection refused
3. Connection timed out
The meaning of "Connection succeeded" is obvious. "Connection refused" means that the target was reached but it refused the connection. This could be because a listening service has stopped (e.g., apache has crashed) or a firewall has rejected the connection.
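These outcomes map fairly directly onto Python socket exceptions. A minimal sketch of one attempt (`try_connect` is a hypothetical name, not from the author's script); note that it adds a fourth bucket for other errors such as DNS failures or "no route to host", which the three-way scheme here doesn't cover:

```python
import socket


def try_connect(host: str, port: int, timeout: float = 10.0) -> str:
    """Classify one connection attempt as 'succeeded', 'refused' or 'timed out'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "succeeded"
    except ConnectionRefusedError:
        # target reached, but no listening service or a firewall rejected us
        return "refused"
    except (socket.timeout, TimeoutError):
        # no response at all: target offline or a problem along the route
        return "timed out"
    except OSError as exc:
        # extra bucket (an assumption): DNS failure, no route to host, etc.
        return f"error: {exc.strerror or exc}"
```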
"Connection timed out" means that the target offered no response. This is the most problematic. It could be because:
A. The target -- Point B -- is offline, or:
B. There is a networking issue somewhere on the route from Point A to Point B.
Obviously there's a third possibility:
C. There's a localized issue causing Point A to have lost Internet access.
But the script checks for that, logs it if it exists and exits quietly.
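The thread doesn't say how the script detects that Point A has lost Internet access. One plausible approach, assumed here: probe a few reference hosts first, and if none of them answers, log it and skip the cycle instead of recording target time-outs. `run_cycle`, the reference hosts and the `probe` callable (any function that returns a result string such as "succeeded") are all hypothetical:

```python
import logging
from typing import Callable, Iterable, Optional

Target = tuple[str, int]  # (host, port)


def run_cycle(
    targets: Iterable[Target],
    reference_hosts: Iterable[Target],
    probe: Callable[[str, int], str],
) -> Optional[dict[Target, str]]:
    """Check all targets, unless Point A itself appears to be offline."""
    # If no reference host answers, assume our own uplink is down:
    # log it and return None without recording any target results.
    if not any(probe(h, p) == "succeeded" for h, p in reference_hosts):
        logging.warning("local connectivity lost; skipping this cycle")
        return None
    return {(h, p): probe(h, p) for h, p in targets}
```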
Adding one or more monitoring stations ("distributed" monitoring) might help clarify whether the issue is B above, but only if the additional stations take different routes to Point B and the issue doesn't lie along those alternate routes. Frankly, this is something I'd prefer to investigate manually. Frequently repeated or extended time-outs say, "look into it, but don't make assumptions."
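That "frequently repeated or extended time-outs" rule of thumb can be written as a simple flag over a target's recent history. A sketch with made-up thresholds (5 time-outs in the last 20 checks, or 3 in a row); the `needs_attention` name is hypothetical, and the flag means "look into it", not "down":

```python
def needs_attention(history: list[str], window: int = 20,
                    max_timeouts: int = 5, max_streak: int = 3) -> bool:
    """Flag a target whose recent history shows frequent or extended time-outs."""
    recent = history[-window:]
    if recent.count("timed out") >= max_timeouts:
        return True  # frequently repeated
    streak = longest = 0
    for result in recent:
        streak = streak + 1 if result == "timed out" else 0
        longest = max(longest, streak)
    return longest >= max_streak  # extended
```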
Again, it's not an "uptime" monitor. It's a point-to-point connection monitor with history. A red dot doesn't necessarily mean "OMG It's Down!" It means there was an issue establishing a connection between two points.
Of course monitoring won't happen (and may not be available for viewing via www) if the monitoring station (Point A) is down. That's why I put it either on a remote LEB with respectable uptime / network uptime (say at least 99%), or on a local box that I manage.