What do you look for in an external monitoring service? Thoughts on my approach welcome.

Timtimo13 · December 2017

in 20 minutes.

CreatePrivateServer · December 2017

Hmm. System checks like CPU and RAM usage, plus an application that keeps playing a sound while the system is down, so I can wake up and deal with the issue if I'm asleep while providing 24/7 support.

Timtimo13 · December 2017

@CreatePrivateServer said:
Hmm. System checks like CPU and RAM usage, plus an application that keeps playing a sound while the system is down, so I can wake up and deal with the issue if I'm asleep while providing 24/7 support.

Depending on your notification settings, you would get a notification when your webserver is down.
This would be mostly the same. As Mason doesn't want to work with local clients, you won't be able to check CPU / RAM / disk (...) usage without changes on the machine being checked and Mason's support for such a client.

If you need these checks, check out Nagios.

vimalware · December 2017

This looks exactly like what I was thinking of building for myself (for the same reasons: python+flask skills): external service monitoring from a quorum of POPs.

Not nitpicking, but I would have gone with PostgreSQL as the RDBMS for a greenfield project in 2017.

PM a url to git or architecture wiki to see if it makes sense to contribute rather than build my own.

All the Best!

MasonR · December 2017

@vimalware said:
This looks exactly like what I was thinking of building for myself (for the same reasons: python+flask skills): external service monitoring from a quorum of POPs.

Not nitpicking, but I would have gone with PostgreSQL as the RDBMS for a greenfield project in 2017.

Haven't decided on a particular database for the main node quite yet and wouldn't mind using psql as I use it quite extensively for a couple projects at work. The choice will probably come down to MariaDB or PostgreSQL.

PM a url to git or architecture wiki to see if it makes sense to contribute rather than build my own.

All the Best!

Cheers! Will do when I get the ball rolling more. So far the only thing decided is that the monitoring nodes will have an nginx -> gunicorn -> flask setup, and that I'll try to make them as pluggable as possible so new extensions can be added easily. Got the basic skeleton in place, but wanted to get a couple of the modules banged out first (probably ping + http response) before adding to git.
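To make the "pluggable" idea concrete, here is a minimal sketch of one way a node could register check modules. It is not the project's actual code: the `CHECKS` registry, the `register` decorator, `run_check`, and the placeholder `echo` check are all hypothetical names chosen for illustration, and the web layer (flask behind gunicorn and nginx) is left out entirely.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a check name to the function implementing it.
CHECKS: Dict[str, Callable[..., dict]] = {}


def register(name: str):
    """Decorator that adds a check function to the registry under `name`."""
    def decorator(func):
        CHECKS[name] = func
        return func
    return decorator


@register("echo")
def echo_check(target: str) -> dict:
    # Placeholder module; a real one would ping or fetch the target.
    return {"check": "echo", "target": target, "status": "online"}


def run_check(name: str, **params) -> dict:
    """Dispatch to a registered check, or report an unknown check name."""
    if name not in CHECKS:
        return {"check": name, "status": "error", "reason": "unknown check"}
    return CHECKS[name](**params)
```

With this shape, adding a new extension would only mean writing one decorated function; the dispatcher itself would not need to change.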

mksh · December 2017

@Timtimo13 said:
Maybe implement SNMP ? hmm

Yes, sometimes there is little choice besides SNMP when it comes to monitoring stuff but voluntarily working with this abomination of a protocol? Please tell me you are joking.

To everyone shouting Nagios: I think I've seen enough of Nagios (Icinga) to say he is better off designing his own solution from the ground up. Lots of room for a cleaner and nicer implementation there. Sure, he won't be able to avoid running some kind of software on the targets to monitor certain things, but imo it won't be hard to come up with something that single-handedly beats nsclient.

AuroraZ · December 2017

Can you pull the CPU and ram from something like htop?

MasonR · December 2017

@AuroraZ said:
Can you pull the CPU and ram from something like htop?

psutil would be able to grab that info to keep everything in Pythonland. Though for this project, I'd rather stay away from user agents and the like and just focus on an external monitoring system.

vimalware · December 2017

As long as the 'runners' follow an HTTP-based API, it lays the path for replacing the Python bits with a Go binary, if anyone feels like it.

MasonR · December 2017

@vimalware said:
As long as the 'runners' follow an HTTP-based API, it lays the path for replacing the Python bits with a Go binary, if anyone feels like it.

That's a good point. There'd be nothing preventing someone from implementing their own monitor, even in a different language, as long as all the restful interfaces are defined.
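As a rough illustration of what "defined interfaces" could mean here, the sketch below validates a JSON request and builds a JSON reply without any web framework. The field names (`check`, `target`), the `accepted` status, and the `handle_request` function are assumptions made up for this example, not an interface anyone in the thread has specified; any language that can parse and emit the same JSON could stand in for it.

```python
import json

# Hypothetical contract: every request must carry these two fields.
REQUIRED_FIELDS = {"check", "target"}


def handle_request(body: str) -> tuple:
    """Return (http_status, json_body) for a raw JSON request body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "invalid JSON"})
    if not isinstance(payload, dict):
        return 400, json.dumps({"error": "expected a JSON object"})
    missing = sorted(REQUIRED_FIELDS - set(payload))
    if missing:
        return 400, json.dumps({"error": "missing fields", "fields": missing})
    reply = {
        "check": payload["check"],
        "target": payload["target"],
        "status": "accepted",
    }
    return 200, json.dumps(reply)
```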

AuroraZ · December 2017

@MasonR said:

@AuroraZ said:
Can you pull the CPU and ram from something like htop?

psutil would be able to grab that info to keep everything in Pythonland. Though for this project, I'd rather stay away from user agents and the like and just focus on an external monitoring system.

I was just thinking most if not all Admins install it so the info might be easy to pull. Still have it as an outside monitor because you wouldn't need to install anything special. Was just an idea.

vimalware · December 2017

As far as I understand, the original objective (and mine) was a blackbox monitoring system in something other than PHP.

For whitebox monitoring, lots of solutions exist.

MasonR · December 2017

@vimalware said:
As far as I understand, the original objective (and mine) was a blackbox monitoring system in something other than PHP.

For whitebox monitoring, lots of solutions exist.

Precisely. Basically an open-source python-based uptimerobot.

kassle · December 2017

if you can combine with log analysis & alert, would be awesome.

there is loggly, logentry, etc. but they don't have uptime / ping monitoring.

OOT: monitoring ladies bathroom

rick2610 · December 2017

@kassle said:
if you can combine with log analysis & alert, would be awesome.

there is loggly, logentry, etc. but they don't have uptime / ping monitoring.

Perhaps zabbix

vimalware · December 2017

@kassle said:
if you can combine with log analysis & alert, would be awesome.

graylog2 maybe?

MasonR has a vision for a blackbox monitoring platform.

I'd rather see a tool that does one thing very well.

MasonR · December 2017

@kassle said:
if you can combine with log analysis & alert, would be awesome.

there is loggly, logentry, etc. but they don't have uptime / ping monitoring.

OOT: monitoring ladies bathroom

Unfortunately, that's outside of the scope that this aims to accomplish. The code that is produced here wouldn't be deployed to the machines that you want monitored.

kassle · December 2017

@MasonR said:

@kassle said:
if you can combine with log analysis & alert, would be awesome.

there is loggly, logentry, etc. but they don't have uptime / ping monitoring.

OOT: monitoring ladies bathroom

Unfortunately, that's outside of the scope that this aims to accomplish. The code that is produced here wouldn't be deployed to the machines that you want monitored.

I see, but with rsyslog (which the major Linux distros support) there is no need to install an extra application, only to add some extra config.

IAlwaysBeCoding · December 2017

If you don't mind me chiming in then I would suggest you:

Use sanic instead of flask , it's basically a flask-like with asynchronous abilities.
For HA, try to use the Zookeeper library, trust me it does wonders. It is hard to use at first, but it will go farther than what you have described. I got a lot of help from the Netflix zookeeper recipes when I started using it.
Use Celery to distribute your workload across multiple workers, and do not use Redis as a broker; go for RabbitMQ.
Last but not least, I would try to look into using Go instead of Python. I know you want to sharpen your python + flask skills. However, in 2017 (almost 2018) Go is the king of the hill for these kinds of apps.

Good luck bro, I hope you succeed and I will be waiting to take a look at that source code.

MasonR · December 2017

@IAlwaysBeCoding said:
If you don't mind me chiming in then I would suggest you:

Use sanic instead of flask , it's basically a flask-like with asynchronous abilities.

Sanic looks nice and might eliminate the need for gunicorn since you can spawn multiple workers. Async is definitely a huge plus.

For HA, try to use the Zookeeper library, trust me it does wonders. It is hard to use at first, but it will go farther than what you have described. I got a lot of help from the Netflix zookeeper recipes when I started using it.

I'll definitely look into Zookeeper as well -- being a complete noob to HA, I'll probably have to fiddle with a few different options out there.

Use Celery to distribute your workload across multiple workers, and do not use Redis as a broker; go for RabbitMQ.

Added to the list of what to look into

Last but not least, I would try to look into using Go instead of Python. I know you want to sharpen your python + flask skills. However, in 2017 (almost 2018) Go is the king of the hill for these kinds of apps.

Yeah, not a bad idea. I think my initial pass (at least for the monitoring nodes) will be to use Python as that's what I'm more comfortable with. But since it'll all be API driven, as a Go exercise, I may rewrite the monitor in Go once things are up and running

Good luck bro, I hope you succeed and I will be waiting to take a look at that source code.

Cheers, I really appreciate your input!

MasonR · January 2018

Just a quick update --

I've finally made some progress. The code for the monitoring nodes is pretty much good to go. All can be viewed in the git repo here:

pyPatrol

Types of monitoring checks implemented:

status - Returns status of the pyPatrol node
ping - Pings (via IPv4) a specified IP/hostname
ping6 - Pings (via IPv6) a specified IP/hostname
http_response - Checks the HTTP response code of a given URL
cert - Checks if an SSL certificate is valid or will expire within a specified threshold
tcp_socket - Checks if a specified IP/hostname and port are listening for connections (TCP)
steam_server - Checks if a Steam Server running on a specified IP/hostname and port is online
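For readers wondering what a check like tcp_socket boils down to, here is a minimal standard-library sketch. It is not pyPatrol's implementation: the function name, its parameters, and the shape of the returned dict are assumptions for illustration only.

```python
import socket


def tcp_socket_check(host: str, port: int, timeout: float = 5.0) -> dict:
    """Report whether a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the hostname and attempts the connect;
        # the with-block closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            status = "online"
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        status = "offline"
    return {"host": host, "port": port, "status": status}
```

Calling `tcp_socket_check("example.com", 443)` would return a dict with `status` set to `online` or `offline`, depending on whether the port accepts connections from wherever the node runs.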

I tried to put a good amount of effort/time into creating worthwhile documentation and coding structure. Hopefully everything is readable and easy to follow -- happy to readdress if not the case.

RESTful API interfaces documented here.

I'm also using Sanic's built-in unittest harness to make sure all the endpoints function and return the right response codes when passed certain data.

Next on the todo list:

Design database (probably using MariaDB or PostgreSQL)
Write up job dispatch service that polls the database for jobs that need to be run (i.e. ping checks, http_response checks, etc.)
- Will likely be another Python-based service that uses a Redis Queue
Start working on web front-end
- The last piece will be tying everything together. The end product should be a simple and slick front-end

AuroraZ · January 2018

Where is the Yeti checker? Why do you always forget the poor Yeti?

Looks like it might work out nicely. I like the steam feature and may use this for that feature alone.

IAlwaysBeCoding · January 2018

I don't want to nitpick, but what the hell is wrong with your editor? Why is your indent 8 spaces?

Your code looks like Go style, not really Python. Heck, Google uses 2 spaces as a normal indent, but usually everyone uses 4 spaces for Python. You used something like 8.

WSS · January 2018

He uses actual tabs.

AuroraZ · January 2018

MasonR · January 2018

@IAlwaysBeCoding said:
Why is your indent 8 spaces?

No idea. To be honest, as long as the code ran correctly and looked somewhat clean, that was good enough for me. You'll also probably find that the space indents aren't consistent between files since I did half in vi and half in sublime :P

E: May go back and clean everything up a bit more + add more comments when it's prime time.
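For spotting this kind of inconsistency before a cleanup pass, a small sketch like the following could count how lines are indented. The `indent_styles` helper is a hypothetical name, not something from the repo, and it only looks at leading whitespace; it does not judge indent width.

```python
def indent_styles(source: str) -> dict:
    """Count indented lines by leading whitespace: tabs, spaces, or mixed."""
    counts = {"tabs": 0, "spaces": 0, "mixed": 0}
    for line in source.splitlines():
        stripped = line.lstrip(" \t")
        indent = line[: len(line) - len(stripped)]
        if not indent or not stripped:
            continue  # skip unindented and whitespace-only lines
        if set(indent) == {"\t"}:
            counts["tabs"] += 1
        elif set(indent) == {" "}:
            counts["spaces"] += 1
        else:
            counts["mixed"] += 1
    return counts
```

Running it over each file's contents and comparing the counts would show which files came out of vi with tabs and which came out of sublime with spaces.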
