Comments
@jcaleb said: @prometeus said: I know that this might be difficult to code but it would be fantastic if you could add the ability to talk between two (or more) instances of your script as part of the check sequence. I.e. the instance in Chicago talks with another instance in London or Los Angeles to have confirmation of a down or severe packet loss...
One way I am thinking is, if it's SQLite based, then just git commit the db file at some interval. Then a master one just pulls the SQLite db from the different locations and consolidates.
I'd argue that this kind of approach introduces more complexity, and more complexity brings more potential for error. I wrote about it in the other monitoring thread: http://www.lowendtalk.com/discussion/comment/90513#Comment_90513
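For what it's worth, a rough sketch of the git-based consolidation described in that quote might look like the script below. Every name in it (pung.db, the repo paths, the checks table) is an assumption for illustration; it is not how pung is built.

```bash
#!/bin/bash
# Illustrative only: assumes each slave keeps its results in an SQLite file
# called pung.db inside a git repo, and that a "checks" table exists.
# pung itself is a bash script writing text files, so none of this is real.
set -eu

# --- on each slave, run from cron at some interval ---
cd /opt/pung
git add pung.db
git commit -m "results $(date -u +%FT%TZ)" || true   # "nothing to commit" is fine
git push origin master

# --- on the master, pull every location and merge into one database ---
for loc in chicago london losangeles; do
  git -C "/srv/pung-$loc" pull --quiet
  sqlite3 /srv/pung-master/combined.db \
    "ATTACH '/srv/pung-$loc/pung.db' AS remote;
     INSERT OR IGNORE INTO checks SELECT * FROM remote.checks;
     DETACH remote;"
done
```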
It's important I think to take the results of the pung monitor quite literally. At the moment, connections to the CVPS Buffalo target are timing out. That doesn't mean CVPS Buffalo is "down". It might be, or there might be a network issue between the Chicago monitor & CVPS Buffalo. A second (or third) monitoring station would answer that only if it took a completely different route to the CVPS Buffalo target (triangulation). But it's impossible to control routing for something like this.
So my preference is to keep the monitoring as it is -- a single, simple point-to-point test -- and then investigate manually when an issue crops up.
I understand, but a quorum between monitoring nodes would be a valuable plus
Can you explain the benefit it would add?
How about a main website that pulls the info from different pung slaves, then displays it in different tabs per location? I.e. it doesn't consolidate per ping; it just shows one tab for the node in Canada, one tab for the node in LA, etc.
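A minimal sketch of that pull-and-display idea, assuming each slave simply exposes its status text file over HTTP; the URLs and paths below are invented, and the PHP page would just render one tab per fetched file.

```bash
#!/bin/bash
# Hypothetical fetcher for a master status page: grab each slave's plain-text
# status file and store it per location for the web page to read.
set -u

declare -A slaves=(
  [canada]="http://monitor-ca.example.com/pung/status.txt"
  [la]="http://monitor-la.example.com/pung/status.txt"
)

for loc in "${!slaves[@]}"; do
  # -f: treat HTTP errors as failures, -s: silent, --max-time: don't hang on a dead slave
  if curl -fs --max-time 10 "${slaves[$loc]}" -o "/var/www/pung/status-$loc.txt"; then
    echo "fetched $loc"
  else
    echo "could not reach $loc slave" >&2
  fi
done
```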
To limit false positives :-)
Say a cable like SEA-ME-WE4 breaks; this cable works in sections. We know Bangladesh only has access to this cable (they have backup satellite links). However, if you're in Bangladesh your impression is that the internet is failing you, whereas if you have a slave in (say) India you know that's not true. (Terrible example.)
When you're talking about IP traffic it's vital you have Src/Dst and a 3rd monitor to verify the Src/Dst path isn't broken.
People who sell this type of information
To me a false positive means that a test is reported as "OK" when it's not. The only way I can see that happening is by a bug in the pung app.
Maybe you mean false negative -- a test is reported as "timed out" when it's actually OK. To me, timed out means timed out -- remember the test is actually two attempts over a ~15 sec timespan (with a successful connection to google or twitter or facebook in between). So something went wrong. Still, I tend to discount single, isolated failures as insignificant / unimportant.
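A reader's sketch of that two-attempt pattern (not sleddog's actual script): probe the target, confirm the monitor's own connectivity against a well-known site, then retry before calling it a timeout. The timeouts, the port argument, and the choice of control host are all assumptions.

```bash
#!/bin/bash
# Sketch of a point-to-point TCP check with a control connection between two
# attempts -- an approximation of what is described above, not pung itself.
host="$1" port="$2"

probe() {
  # bash's /dev/tcp pseudo-device opens a TCP connection; 'timeout' bounds it
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

if probe "$host" "$port"; then
  echo "OK"
  exit 0
fi

# first attempt failed -- make sure the monitor itself can reach the internet
if ! probe www.google.com 80; then
  echo "monitor connectivity suspect, not counting this run" >&2
  exit 2
fi

sleep 10                                  # roughly the ~15 s window mentioned above
if probe "$host" "$port"; then
  echo "OK (second attempt)"
else
  echo "timed out"
fi
```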
With IPXcore reporting up and being in the same racks, at most the machine you are testing for in Buffalo could have an issue, but ChicagoVPS Buffalo is definitely up.
By "CVPS Buffalo" I meant the CVPS Buffalo target (IP:port), not all services provided by CVPS in Buffalo.
Yes, I'm looking at this from the alert point of view. I would like a "Houston, we've got a problem" message only when it's double-checked/confirmed. But I don't want to insist; your point about keeping it simple is valid :P
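Purely to illustrate the double-check being asked for here (pung has no such feature): ask a second monitoring node for its verdict on the same target and only send the alert when both agree. The peer URL, its OK/FAIL output, and the mail command are assumptions.

```bash
#!/bin/bash
# Sketch: alert only when a second, independent monitor confirms the failure.
# Assumes a peer node publishes its last verdict ("OK"/"FAIL") at an invented URL.
target="$1"

local_check() { timeout 5 bash -c "exec 3<>/dev/tcp/$1/80" 2>/dev/null; }

if local_check "$target"; then
  exit 0                                   # fine locally, nothing to do
fi

peer_verdict=$(curl -fs --max-time 10 \
  "http://monitor-london.example.com/pung/verdict.php?host=$target")

if [ "$peer_verdict" = "FAIL" ]; then
  echo "Houston, we've got a problem: $target looks down from two locations" \
    | mail -s "pung alert: $target" admin@example.com
else
  echo "only one monitor sees a failure for $target -- not alerting" >&2
fi
```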
There just went my uptime. Still strong in Chicago though.
This is when you know pingdom is not always correct. It spotted a 5-minute downtime (monitored every minute) last night, yet pung reported no problems.
What's the check interval, @sleddog?
5 minutes. So it's just possible it was missed
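For context, a 5-minute cycle like the (hypothetical) crontab entry below samples each target only twelve times an hour, so an outage that starts and ends between two runs never shows up in the log:

```bash
# Hypothetical crontab entry for a 5-minute check cycle; anything shorter than
# the gap between two runs can fall entirely between samples.
*/5 * * * * /opt/pung/check.sh >> /var/log/pung-cron.log 2>&1
```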
Oh sorry, I just looked properly. It was only 4 seconds, so pung probably missed it.
How can pingdom detect a 4-second outage when it checks every minute...?
LOL. I'm ashamed. I read the graphs wrong again, I didn't get much sleep last night :P
Pingdom kept you up?
It disturbed me yeah, then just as I was about to get up I got the second SMS: "hypervisor01 is back online". Doh.
Sleep's never the same when you get disturbed
So true...
BTW, on the pung page at the bottom you can now set your local time offset (from UTC). The "Last run" time at the top will then be within 5 minutes of your local time, and the log will be in your local time.
I changed the log time format to a Unix timestamp for easier manipulation; that's why there's a new log.
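For anyone scripting against the new log, a Unix timestamp plus an hour offset is easy to render locally. The "<epoch> <message>" layout and the pung.log filename below are guesses, not pung's documented format:

```bash
#!/bin/bash
# Render a unix-timestamp log in local time by adding an hour offset to UTC.
offset_hours=${1:-2}            # your offset from UTC, e.g. 2 for UTC+2

while read -r epoch rest; do
  local_epoch=$(( epoch + offset_hours * 3600 ))
  # -u stops 'date' from applying the system timezone on top of our offset
  printf '%s  %s\n' "$(date -u -d "@$local_epoch" +'%Y-%m-%d %H:%M')" "$rest"
done < pung.log
```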
Cool
It surprises me how fast the page loads with so many checks. Our status page can take forever sometimes, but that's probably because we have a PHP check for each service on the same page :P
@GetKVM_Ash said: It surprises me how fast the page loads with so many checks
Checks are done independently with a bash script. The PHP page merely reads a couple of text files and formats the display. If you happen to load the PHP page while the bash script is in progress you'll see 'Running...' at the top in place of 'Last Run'.
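A guess at how that split could look on the bash side (paths, file names and the status format are invented; the real script hasn't been posted): write results to a temp file, flag the run in progress, and swap files atomically so the PHP page only ever sees "Running..." or a complete result set.

```bash
#!/bin/bash
# Sketch of a checker that leaves plain text files for a PHP page to read.
set -u
statedir=/var/www/pung

touch "$statedir/running"                    # PHP shows "Running..." while this exists
tmp=$(mktemp "$statedir/status.XXXXXX")

while read -r name host port; do             # targets.txt: "name host port" per line
  if timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "$name OK"   >> "$tmp"
  else
    echo "$name FAIL" >> "$tmp"
  fi
done < "$statedir/targets.txt"

date +%s > "$statedir/lastrun.txt"           # unix timestamp of this run
mv "$tmp" "$statedir/status.txt"             # same-filesystem rename: page never sees a half-written file
rm -f "$statedir/running"
```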
@sleddog,
Messaged you about having two of our Xen VPS nodes added to the list (in France and Germany).
Thanks exextatic. The list is made up of hosts that have made offers here (or on LEB), and I'd like to keep that informal rule... maybe you'll consider making an offer?
Scheduled maintenance ruined our perfect record
@sleddog,
Waiting on Chief to post it on LEB, and for me to have a full 7 days registered here before I post it here
Thanks for sharing and I'm sure a lot of people will find it useful.
Deja vu... Got a different IP in Buffalo that responds on port 80?
Woop, looks like everybody got a reset. Let the uptime rivalry begin.
It's August now
Ah, reset every month, I'm with you.