New server monitoring tool, feedback desired.

jsg · February 2019

Hello

I've just finished building the core of a new monitoring tool, and I got permission to offer it as freeware.

Background:

The client has many servers, some of which are critical for his business. Unfortunately, most available server monitoring tools do not meet his requirements and/or are way too big and presentation-heavy (lots of graphics).

The client needed a small, simple, and reliable monitoring solution that is strong at providing a simple and clear "green, yellow, red" (OK; soon or potentially problematic; failure/requiring immediate attention) overview for up to about 100 servers (beyond that the problem is screen real estate, not the monitoring engine).

What I created has a monitoring client on each server (very small, well under 100 KB) which stores the data locally. That client is also very light because it gets all its data directly from the kernel, so it can easily run on, say, a 128 MB VPS. It also processes and cumulates the data RRD-like at a configurable frequency (e.g. every 15 minutes) and stores the results locally and/or pushes them to a central server, where typically more (also configurable) processing/cumulation is done before the results are pushed into a store. At the client's company there will be a small "observer" that a) shows alarms (e.g. disk space getting too low on server xyz) and b) allows one to look at the details/data. The communication between the machines is, obviously, encrypted.
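The post doesn't say how the client "cumulates" its samples. As a minimal sketch, assuming samples are (timestamp, value) pairs and that cumulating RRD-like means reducing each fixed window (here 15 minutes) to min/average/max, similar in spirit to RRDtool's MIN/AVERAGE/MAX consolidation functions, it could look like this. `Summary` and `consolidate` are hypothetical names, not the tool's actual (closed) code:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Summary:
    """One consolidated window (hypothetical record format)."""
    start: float    # window start, seconds since the epoch
    count: int      # number of raw samples in the window
    minimum: float
    average: float
    maximum: float


def consolidate(samples, interval=900.0):
    """Reduce (timestamp, value) samples to one Summary per `interval` seconds.

    Windows are aligned to multiples of `interval`; a window with no samples
    simply produces no Summary.
    """
    buckets = {}
    for ts, value in samples:
        buckets.setdefault(int(ts // interval), []).append(value)
    return [
        Summary(start=w * interval, count=len(vals), minimum=min(vals),
                average=mean(vals), maximum=max(vals))
        for w, vals in sorted(buckets.items())
    ]


if __name__ == "__main__":
    # 30 minutes of 15-second samples -> two 15-minute summaries.
    raw = [(i * 15, float(i)) for i in range(120)]
    for s in consolidate(raw):
        print(s)
```

Keeping min and max next to the average means a short spike survives consolidation instead of being averaged away.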

While that monitoring solution of course collects, processes, and shows all the usual data, one additional and important feature for the client is an at-a-glance overview of the overall situation. In fact, it's possible to get only alarms (and optionally warnings) on the screen. (I'll develop Linux and Windows observer clients later; until then it's CLI only.)

For each server the collection frequency as well as what is monitored can be configured.

Example: Probe memory, network interfaces, and the processes ssh and httpd every 15 sec; probe the processes mysqld-server and ntpd once every minute; check free space on all disks every 15 min. Process and cumulate all probe data (with finer granularity) every 15 min. and push the result to the collection server.
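The actual config format isn't shown in the thread. As a rough sketch, assuming a scheduler that wakes up every 15 seconds, the example above could be expressed as a per-probe interval table (all probe names here are hypothetical):

```python
# Hypothetical per-probe intervals (seconds), mirroring the example above.
PROBES = {
    "memory": 15,
    "net_interfaces": 15,
    "proc:ssh": 15,
    "proc:httpd": 15,
    "proc:mysqld-server": 60,
    "proc:ntpd": 60,
    "disk_free": 900,
}
CUMULATE_EVERY = 900  # consolidate and push to the collector every 15 min


def due_probes(elapsed, probes=PROBES):
    """Names of the probes due at `elapsed` seconds after start-up.

    Assumes the scheduler wakes up on multiples of the smallest interval
    (15 s here), so a probe is due whenever `elapsed` is a multiple of its own
    interval.
    """
    return sorted(name for name, every in probes.items() if elapsed % every == 0)


def cumulate_due(elapsed):
    """True when a consolidation/push cycle should run."""
    return elapsed > 0 and elapsed % CUMULATE_EVERY == 0


if __name__ == "__main__":
    for t in (15, 60, 900):
        print(t, due_probes(t), cumulate_due(t))
```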

Additionally, for most probes a "yellow" and a "red" value can be provided, e.g. if the free disk space is below 20 GB, mark it as "yellow" (warning/needs attention); if it's less than 5 GB, mark it as "red" (urgent/alarm). Another example would be to mark "httpd" as red if it stopped running. Checks from the collector to the clients for some protocols (e.g. HTTP, DNS) are also planned.
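A minimal sketch of the yellow/red thresholds described above, assuming "lower is worse" metrics such as free disk space and treating 1 GB as 10^9 bytes (both assumptions; `Level`, `classify_low`, and `classify_process` are illustrative names, not the tool's API):

```python
from enum import IntEnum

GB = 10 ** 9  # assumption: decimal gigabytes


class Level(IntEnum):
    GREEN = 0   # OK
    YELLOW = 1  # warning / needs attention
    RED = 2     # alarm / urgent


def classify_low(value, yellow_below, red_below):
    """Classify a metric where lower is worse (e.g. free disk space)."""
    if value < red_below:
        return Level.RED
    if value < yellow_below:
        return Level.YELLOW
    return Level.GREEN


def classify_process(running):
    """A watched process (e.g. httpd) that has stopped is an alarm."""
    return Level.GREEN if running else Level.RED


if __name__ == "__main__":
    for free in (30 * GB, 10 * GB, 3 * GB):
        print(free // GB, "GB free ->", classify_low(free, 20 * GB, 5 * GB).name)
```

Using an ordered enum makes "worst status wins" a plain `max()` when several probes report at once.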

In short, the client's priorities were a small, light solution and an easy at-a-glance overview showing any problems (most solutions provide neither, even though they are fat multi-MB services). If there is a problem, all the usual data are of course available too, RRD-style.

Here's the question for you:

The client only uses FreeBSD servers. I think, however, that it shouldn't be too complicated or too much work to port it to Linux.

But is it worth the effort? Are there many people here who would love that kind of versatile, small, and light monitoring solution?

Maybe, I guess, it's mostly attractive for professionals running many servers, who would love the feature of "seeing at a glance, for all their servers, whether there is any problem (and if so, on which one)". For people with just one or two dedis or VPSs it's probably less attractive, and they (I guess) might prefer the usual heavy "many funny gauges and graphs via a web server" approach.

Don't know. But I think it would be sad to keep such a tool just for oneself (and the client did allow me to offer it for free).
If there is enough interest I'll make the effort to port it to Linux. I'm asking a bit early mainly because, given enough interest, I'll try to keep portability in mind while I complete the work.

Let me know.

eol · February 2019

Yes - but for Linux (not PoetteringOS), please.

jsg · February 2019

@eol said:
Yes - but for Linux (not PoetteringOS), please.

On FreeBSD pretty much all probes are taken via sysctl. On Linux I'd probably also use /proc. Don't worry, I'll certainly ignore systemd to the maximum extent possible.
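To illustrate what a /proc-based Linux probe might look like, here is a minimal sketch of a free-memory probe that parses /proc/meminfo. It is not the tool's code; the function names are hypothetical, and `MemAvailable` only exists on Linux 3.14 and later:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}.

    Lines look like 'MemAvailable:    123456 kB'. The kernel's 'kB' means
    1024 bytes, so those values are converted to bytes; fields without a
    unit (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        info[key.strip()] = value
    return info


def available_memory_bytes(path="/proc/meminfo"):
    """Linux only: memory available for new allocations, in bytes (or None)."""
    with open(path) as f:
        return parse_meminfo(f.read()).get("MemAvailable")
```

Separating parsing from file access keeps the parser testable on any OS; on FreeBSD the same probe would read sysctl values instead.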

uptime · February 2019

+1 for lightweight overview ... anticipating a minimalist ncurses esthetic

EDIT2: won't hold my breath waiting for any node.js port ...

vimalware · February 2019

Good to have low-end alternatives to Prometheus.io's node_exporter.

Vinnyletje · February 2019

@jsg said:

@eol said:
Yes - but for Linux (not PoetteringOS), please.

On FreeBSD pretty much all probes are taken via sysctl. On Linux I'd probably also used /proc. Don't worry, I certainly ignore systemd to the max. possible extent.

Just asking: why do you think systemd is bad? I actually have no clue about it.

jsg · February 2019

@uptime

Well, to be honest, the tool isn't much about aesthetics, and by "overview" I mostly mean something like - in the minimal and hopefully normal case - a taskbar icon that's green (meaning: all your servers are running fine).

That "observer" user interface will be able to run on Linux, FreeBSD, and Windows. But again, that's just the end-user part, the software for the admins.

Next, typically, if it shows yellow (warning: "better have a look") or even red ("Alarm! Something on some server is critical"), one could click on the icon and get a list overview of all the servers. In that overview one would see every server (by a chosen name), each with a green, yellow, or red signal. If yellow or red, one would also see a short problem description. I'll come back to that a bit later.
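The observer logic described above (one icon colour, then a per-server list) can be sketched as "the icon shows the worst status of any server". This is an assumption about how such an observer could work, with hypothetical names and sample data:

```python
from enum import IntEnum


class Level(IntEnum):
    GREEN = 0
    YELLOW = 1
    RED = 2


def overall(statuses):
    """Colour of the tray icon: the worst level reported by any server."""
    return max(statuses.values(), default=Level.GREEN)


def problem_list(statuses, descriptions):
    """(server, level name, short description) for non-green servers, worst first."""
    bad = [(name, lvl) for name, lvl in statuses.items() if lvl != Level.GREEN]
    bad.sort(key=lambda item: (-item[1], item[0]))
    return [(name, lvl.name, descriptions.get(name, "")) for name, lvl in bad]


if __name__ == "__main__":
    statuses = {"srv 7": Level.GREEN, "srv 42": Level.YELLOW, "srv 121": Level.RED}
    notes = {"srv 121": "disk: free space critical", "srv 42": "ntpd not responding"}
    print(overall(statuses).name)
    for row in problem_list(statuses, notes):
        print(row)
```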

The "beef", though, is not the interface thingy. The beef is a) the (really small and light) clients and b) the (also small and light) "collector", which would run on a server too (typically at the admin site) and collects and stores all measurement and test data from all client servers beyond the finest "rrd" level (e.g. 15 sec). Additionally, the collector can perform service tests (like "is http working on the clients?") and store those results, too.

The user interface I talked about at the beginning of this post is just one somewhat arbitrary choice, because that's how my client wants it. But the collector server software will have an interface that allows quite versatile alternative user interfaces, e.g. a simple htop-like ncurses thing (which I might even create myself, because that's something I like too).

Now, to better understand the mechanism:

Each client (the monitor software on each of the servers to be monitored) can collect:

free memory
load (like the one "top" shows)
free disk space on all mounted disks (incl. tmpfs)
network interfaces data
processes (their most relevant data are collected/observed)
plus: any alerts get stored locally and pushed right away to the collector (which then pushes them right away to the observer).

And all of that can be configured as needed. In particular, one can set a different time granularity for each probe (e.g. every 15 sec), yellow and red levels, etc.
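The list above names the probes but not how they read their data. As a minimal Linux-side sketch for the disk probe only (hypothetical names; the FreeBSD version would use the system's own mount APIs instead of /proc/mounts), one could enumerate mount points and ask for the space available to unprivileged users:

```python
import shutil

# Pseudo-filesystems without meaningful free space (an assumed, incomplete
# list; tmpfs is deliberately kept because the post includes it).
SKIP_FSTYPES = {"proc", "sysfs", "devtmpfs", "devpts", "cgroup", "cgroup2", "securityfs"}


def mount_points(mounts_text):
    """Mount points from /proc/mounts-style text ("device mountpoint fstype ...").

    Note: the kernel octal-escapes spaces in paths; this sketch does not
    decode them.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] not in SKIP_FSTYPES:
            points.append(fields[1])
    return points


def disk_free(points):
    """Bytes available to unprivileged users on each mount point."""
    return {mp: shutil.disk_usage(mp).free for mp in points}
```

Typical use would be `disk_free(mount_points(open("/proc/mounts").read()))`, with each value then fed through the yellow/red thresholds.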
Plus, at a given time interval, say every 15 min, the client accumulates/processes those probe data and logs them - but it also pushes them to the collector server, which then accumulates/processes (rrd-like) those probe data for all clients again at different levels (like "keep 15 min data for two days, then process/accumulate daily data for a month, then process/accumulate monthly data for a year, then ...").
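The multi-level retention in that example can be sketched with fixed-size round-robin archives, the classic RRD idea: each tier has a fixed number of slots, and the oldest point drops out when a new one arrives. This is a sketch under assumptions (a "month" is taken as 30 days, roll-ups are plain averages, and all names are hypothetical), not the collector's actual implementation:

```python
from collections import deque

DAY = 86400


class RoundRobinArchive:
    """Fixed number of slots; once full, each new point evicts the oldest."""

    def __init__(self, step_seconds, keep_seconds):
        self.step = step_seconds
        self.points = deque(maxlen=keep_seconds // step_seconds)

    def add(self, value):
        self.points.append(value)


def roll_up(src, dst, n):
    """Average the newest `n` points of `src` into a single point of `dst`."""
    if len(src.points) >= n:
        recent = list(src.points)[-n:]
        dst.add(sum(recent) / n)


# Hypothetical tiers following the example retention policy in the post.
quarter_hours = RoundRobinArchive(900, 2 * DAY)     # 192 slots
daily = RoundRobinArchive(DAY, 30 * DAY)            # 30 slots
monthly = RoundRobinArchive(30 * DAY, 365 * DAY)    # 12 slots ("month" = 30 days)

if __name__ == "__main__":
    for i in range(200):                  # a bit more than two days of 15-min points
        quarter_hours.add(float(i))
    roll_up(quarter_hours, daily, 96)     # 96 quarter-hours = 1 day
    print(len(quarter_hours.points), quarter_hours.points[0], list(daily.points))
```

Because every tier has a fixed size, the collector's storage per client stays bounded no matter how long it runs.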
And all that will be accessible through an interface. One can process that into funny graphics, push it into a web interface, etc., just like with other monitoring tools, but that just isn't the priority here.

Why? - priorities:

My client has no interest in funny gauges and lots of colours. What he needs is a practical way to stay on top of a mid-size array of servers (dispersed over several locations).
The priority is to see as soon - and as simply and clearly as possible - if there is a problem anywhere. If, say, server 121 does have a problem, we want to see that as early as possible.
If there is a problem we also want to see the relevant data (e.g. "server 121 - disk: free space critical at below 5 GB on drive sdc2").
Same for services (ssh, http, ntp, ...).
No bloat, no wasted resources on the clients -> light monitoring software and a small storage load. Plus: if one client goes down, we still have data on the collector that helps to understand the problem.
Security. This is critical/sensitive data, and we wanted it to be encrypted and securely transmitted (that was the main reason they brought me in).

If one still wants funny reports with lots of gauges and diagrams one can get that; the data are there. But that wasn't the priority.

Afaic there will never be a nodejs interface. There will be a (more or less) nice GUI for the major OSs, and there will quite probably be an ncurses TUI. What's more important than funny interfaces (re the stored data) is a good database-like query interface.

And again: My major reason for this thread is the question whether there would be enough interest for a Linux version.

uptime · February 2019

Well, that paints a pretty clear picture, and if it stays minimalistic that sounds good to me.

I'd like to see it on Linux, but if it's already in good shape on FreeBSD it may make sense to see what happens if you set that code loose "as is". Maybe it will get ported to (the myriad flavors of) Linux without too much more specific effort. Unless that's work you'd rather do on your own?

Anyway, +1 for linux. (ncurses and Debian, thx.)

jsg · February 2019

@uptime

Yes and no. Please note that I said it's "freeware". It is not open source. I already had damn enough discussions to get my client to agree to at least allow me to give it away as freeware. The client is not open to even discussing open sourcing but frankly, I can't complain. Most clients wouldn't share anything for free, so that client is already quite generous.

As for "Debian", nope, I don't think I'll offer distro packages. Simple reason: it's not worth it, as it's just one (small) binary and a sample config. Docs will be available for download or online viewing, and installation will be as simple as (extract and) copy to /usr[/local]/bin, plus etc for the config, done.

eol · February 2019

@Vinnyletje said:

@jsg said:

@eol said:
Yes - but for Linux (not PoetteringOS), please.

On FreeBSD pretty much all probes are taken via sysctl. On Linux I'd probably also used /proc. Don't worry, I certainly ignore systemd to the max. possible extent.

Just asking why you think systemd is bad? I actually have no clue about it.

It's even more than just bad.
Do your homework.

uptime · February 2019

Please note that I said it's "freeware". It is not open source. I already had damn enough discussions to get my client to agree to at least allow me to give it away as freeware. The client is not open to even discussing open sourcing but frankly, I can't complain.

ok, that's an interesting wrinkle.

I'm just wondering ... in general, would you recommend your client run closed-source freeware on their own systems?

jsg · February 2019

@uptime said:

Please note that I said it's "freeware". It is not open source. I already had damn enough discussions to get my client to agree to at least allow me to give it away as freeware. The client is not open to even discussing open sourcing but frankly, I can't complain.

ok, that's an interesting wrinkle.

I'm just wondering ... in general, would you recommend your client run closed-source freeware on their own systems?

Aarg, that's a rather political question and a mine field.

Let me begin my response with a typical reaction one got 20 years ago when offering some open-source code -> "Nuh, that can't be good quality. That's just some hobbyists spare time fun product".

In other words: that issue has far more to do with social and political criteria, and with what's considered "the right way" at any given point in time and in any given society, than with technical factors.

Looking with cold, pragmatic eyes, I can't offer a generally valid answer. Both companies and the open source crowd have created lots of crap, and both have created some good stuff.

Look at it this way: I create that software no matter whether I'm paid for it or do it just for the fun of it. The software I produce isn't better or worse just because I write it for money or for fun.

Broadly speaking, open source software tends to be more "egotistical" and less user-oriented because, let's be honest, it's more often than not created to scratch a particular itch. Paid-for software tends to be more user-oriented because otherwise it doesn't sell - with a big caveat: at large corporations that's often driven to an unhealthy extreme by marketing, with little care for technical quality. Which, btw, is why I generally strongly prefer smaller companies' software and try to stay away from MS, Apple, etc.

And there is another factor that's also true in this case: a paying customer tends to check software much harder than someone who downloads some foss software. After all, it's important for their business and they pay for it, so they check it properly. That also means that a developer can get away with far more crap in foss software than in paid-for software.

What do I generally advise? Neither. What I advise is to (a) avoid black-and-white perspectives, (b) differentiate, and (c) see the full picture. Example: what are your rights? With foss you usually have none beyond "well, edit the source". As a paying customer you can demand that problems be solved (again, exception: large corps).

Finally: this whole thing here is, at least on my part, purely good will. Frankly, I couldn't care less whether anyone uses "my" monitoring solution; I get/got paid for my work, done. But while developing it I noticed that I myself would actually love to have something like that too, and a bit later I thought I'd discuss it with the customer, because it would be nice if others could get it too. Next I thought "hell, only few use FreeBSD and most use Linux. Before making the effort to port it, let's find out whether it's worth it".

uptime · February 2019

"let's find out whether it's worth it".

well - I hope my feedback was helpful - though maybe I'm not exactly a target market.

But my GNU/linux happy headspace tends to revolve around open source. (I don't run anything but open source if I can help it.)

EDIT2:

Good luck with your system monitor development

It sounds like an interesting project.

jsg · February 2019

@uptime said:

"let's find out whether it's worth it".

well - I hope my feedback was helpful - though maybe I'm not exactly a target market.

But my GNU/linux happy headspace tends to revolve around open source.

Good luck!

Thanks for your feedback!

Oh, and: I have no "market". If my goodwill offer, to invest some work in making a tool I consider quite attractive freely available and even to port it to Linux if there's interest, doesn't find enough takers, that's no problem for me.

I'm earning my income anyway and I'll use that tool anyway. If others don't, oh well.

I just checked: the atop and vnstat executables (both considered very small) are about 200 KB each, monit (quite trendy) ~900 KB, nagios ~600 KB - my tool is well under 100 KB -and- lighter on the processor. Plus, afaik, none of those alert you by themselves; you have to dig/click through the interface.

With mine, if, say, free space on disk sde2 on server "srv 123" drops below a configurable value, it'll alert you with a message like "ALERT (red): disk sde2 on srv 123 is below [some configured value]".
Or, another example, if the DNS servers (in /etc/resolv.conf) or the NTP servers aren't reachable, you'll get a message like "Warning (yellow): time server ntp.example.org failing to respond".
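The two example messages follow a simple "severity prefix: description" pattern. A small sketch that reproduces that pattern, assuming the prefixes are fixed per level and limits are given in GB (the function names are hypothetical):

```python
PREFIX = {"red": "ALERT (red)", "yellow": "Warning (yellow)"}


def format_alert(level, message):
    """Prefix a one-line message with its severity, e.g. 'ALERT (red): ...'."""
    return f"{PREFIX[level]}: {message}"


def disk_alert(level, server, disk, limit_gb):
    """Alert for free space on `disk` of `server` dropping below `limit_gb`."""
    return format_alert(level, f"disk {disk} on {server} is below {limit_gb} GB")


if __name__ == "__main__":
    print(disk_alert("red", "srv 123", "sde2", 5))
    print(format_alert("yellow", "time server ntp.example.org failing to respond"))
```

One line per alert keeps the messages easy to show in a tray notification, log to a file, or grep later.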

Btw: when did you last seriously look at (as in "check") your kernel's, openssl's, or nagios' source? And why are so many - sometimes critical - bugs discovered in foss software, often only after years? That's what I meant above: many of the oh-so-valuable rights with foss are 99.9% theoretical. The famous "thousand eyes" simply don't look.

That said, I also highly value open source but I'm fully aware that "open source" != good quality.

eol · February 2019

@jsg said:
Btw: When did you last seriously look at (as in "check") at your kernels or at the openssl or nagios' source? And why are there so many - sometimes critical - bugs discovered in foss software, often only after years? That's what I meant above: Quite many of the oh so valuable rights with foss are 99.9% theoretical. The famous "thousand eyes" simply don't look.

A lot of them are placed deliberately by questionable individuals.
You know why.

NobodyInteresting · February 2019

I need this in my life. If it supports Ubuntu 16 and 18 - I will be jumping on board ASAP.

jsg · February 2019

If I port it to Linux (depending on enough interest), it should work on any Linux and have very modest dependencies.

jsg · February 2019

@eol said:

@jsg said:
Btw: When did you last seriously look at (as in "check") at your kernels or at the openssl or nagios' source? And why are there so many - sometimes critical - bugs discovered in foss software, often only after years? That's what I meant above: Quite many of the oh so valuable rights with foss are 99.9% theoretical. The famous "thousand eyes" simply don't look.

A lot of them are placed deliberately by questionable individuals.
You know why.

Of course - but that wasn't even my point. No, forget about the NSA and co. and just take a plain look: pretty much nobody is checking foss sources. One major reason, which somehow seems to be a Voldemort ("one doesn't speak about it"), is the simple fact that by far most open source users have other reasons than the official (rather political) ones like freedom bla bla. Nope, people mainly use foss because it's free - as in "beer". And I understand them. Why should you pay hundreds of $ every 2 or 3 years for MS Office as a private person? Of course most people prefer installing LibreOffice.

Another point that is rarely mentioned or fully understood: you have NO rights, none, nada, zero, other than a lot of socio-political bla bla that means little to nothing for most people.

In summary that means that as an end user you almost always get "valuable" assurances like "the project is supporting diversity", "the project highly values true freedom", etc - but you don't get anything even close to "they care about you" and "they truly care about quality", let alone you having rights that are meaningful to you.

The way I see it, people were wrong 20 years ago when they considered open source software sh_tty hobby-level toys - and today they are wrong when they consider commercial software, even free commercial software, somehow evil and something to avoid if at all possible.

Hell, my client paid for that software to be designed and developed - and trust me, he does care a lot about quality and safety (there are much cheaper devs than myself out there ...) and he was generous enough to give in to my nagging to make it available for free. But he doesn't offer the source code HE paid for, that evil, evil man! So quite a few people would rather use foss software that is worse and whose source they'll never look into, because, you know, it's "really free and open" ...

eol · February 2019

Yeah.
Also closed source needs to be shitty for money (support contracts, updates/upgrades in the hope bugs and "bugs" might get fixed, etc.).

Vinnyletje · February 2019

FWIW I think it would be perfect to run on anyone's small VPS, even if it's not mission-critical. I like that it is lightweight.

jsg · February 2019

@Vinnyletje said:
FWIW I think it would be perfect to run on anyones small vps even if it is not mission critical. I like that it is lightweight.

Yes. And that was a high priority from the beginning. The other priority was warnings and alarms rather than many gauges and diagrams. Reason: what most people are probably really interested in is "Do my dedis and VPSs run and work fine, or is there any problem?". Them diagrams IMO are pretty much sales-mind driven: "Give the masses colours and funny gauges and diagrams. They like that." What we actually want to know, though, is "are we within bounds wrt traffic?", "is the network traffic OK?", or even "are consumed memory, processing power, and network speed in a healthy relation?" - but for that you don't need graphics, you need data to compute with.

Letzien · February 2019

I don't doubt that there would be many people who would be willing to take your product and help port to other systems.

The one request I would have is that you abstract your calls as they are now, so it'll be easier to just make a plug-in for whatever operating system it ends up on.

The fact that you said you're willing to make it freeware and not BSD-licensed - is that an oversight or by design?

jsg · February 2019

@Letzien said:
I don't doubt that there would be many people who would be willing to take your product and help port to other systems.

I'm not even asking for help in porting. I'd do that myself. I just need to find out whether there's enough interest to warrant the effort.

The one request I would have will be that you abstract your calls as they are now so it'll be easier to just make a plug-in for whatever operating system it ends up on.

The fact that you said your willing to make it freeware and not a BSD license- is that an oversight or by design?

Neither. It's by dictum of the client who pays me and for whose company I have created that solution. I managed to nag him enough to give in and to allow me to give it away for free to others but he (understandably) insists that the source is his and not shared.

Letzien · February 2019

@jsg said:
Neither. It's by dictum of the client who pays me and for whose company I have created that solution. I managed to nag him enough to give in and to allow me to give it away for free to others but he (understandably) insists that the source is his and not shared.

That's neat, but how many folks that go out of their way to run open source would be interested in a completely closed source solution for what is essentially a fairly simple product? It isn't 2001 anymore. I wish you luck with this, but I'm out.

datanoise · February 2019

jsg said: The famous "thousand eyes" simply don't look.

That's true for some complicated things (crypto, drivers, etc.), but the kind of tool you are developing would probably be reviewed by some folks if it were open source. Look on GitHub: many small/mid-sized projects have many contributors who did actually look at the code in order to write an improvement.

Probably not worth spending too much time porting that to linux. Sure, there might not be enough eyes watching the code of sensitive parts of GNU/Linux systems (and the problem seems even worse with the BSDs, even if the situation appears somewhat better with OpenBSD), but who would want to make their situation potentially worse, running an unknown binary on their server when open source / trusted alternatives exist?

jsg · February 2019

@datanoise said:

jsg said: The famous "thousand eyes" simply don't look.

That's true for some complicated things (crypto, drivers, etc) but the kind of tool you are developing would probably be reviewed by some folks if it was open source. Look on github, many small / mid sized projects have many contributors who did actually look at the code, to be able to write an improvement.

Probably not worth spending too much time porting that to linux. Sure, there might not be enough eyes watching the code of sensitive parts of GNU/Linux systems (and the problem seems even worse with the BSDs, even if the situation appears somewhat better with OpenBSD), but who would want to make their situation potentially worse, running an unknown binary on their server when open source / trusted alternatives exist?

Do what my client did - look around. Really, closely, professionally, critically. And you'll see that there are not "many" tools around that have a certain feature set and a certain quality. And btw, it's also not that easy. True, it is easy if one does what most do and writes ignorantly only for Linux, with reading /proc as the base.

Trusted? Probably different people have different definitions of "trusted". You and many others seem to have mainly one criterion for trust: "is it open source?". Others have other criteria, e.g. "is it reliable?" and "does it have a low bug count?".

This story is simple. I'm creating a good tool that might be quite interesting for many here. I "fought" for you all and nagged my client; he gave in and allowed me to give it away for free. ... And now this discussion is increasingly being taken over by open source fans whose positions, pardon me, are mostly based on a belief system rather than on facts.

Sorry for being very frank, but I don't care. You insist on open source? Go and find what you like and what you consider "trusted". Afaic, I will certainly not waste more time on Foss/License/1000 eyes debates. I made an offer, a friendly one and I did make an effort for the community - take it or leave it.

Letzien · February 2019

So when this thread turned from praise to a bit of subjective critique, you decided to put yourself straight up on the cross?

I'm sorry but if this is the way you react when you don't get accolades, perhaps you should offer it on Tumblr.

jsg · February 2019

@Letzien said:
So when this thread turn from praise to a bit of subjective critique you decided to put yourself straight up on the cross?

I'm sorry but if this is the way you react when you don't get accolades, perhaps you should offer it on Tumblr.

It's not about criticism (based on what, btw? Nobody has seen the tool so far).

It's about my not at all being interested in yet another "only open source can be trusted" and similar belief system argument.

In other words: I have no problem whatsoever with anyone not liking my tool due to a preference for foss software. But I will not be pushed to accept as facts what is hardly more than a belief system, and one that has been proven wrong often enough, at that.

To cut this short before it escalates -> I take my offer back.

Probably I will port it to Linux, but I will not give it away for free, except to a few reasonable people who ask me for it (here or by PM; both are fine with me).

Have a nice day everyone.

datanoise · February 2019

jsg said: trusted? Probably different people have different definitions of "trusted". You and many others seem to have mainly one criterion for trust: "is it open source?". Others have other criteria, like e.g. "is it reliable?" and "is it low bug count?".

That's not what I meant; I meant trusted AND open source, while your - obviously great - software seems to be neither trusted (by several people who decided to investigate it) nor open source. BTW, if you release your stuff now, I'm not gonna trust it just because you say on LET that you are a good guy (even though you seem to be a good guy).

Some people might want to investigate (What information is sent over the network? What does it really do to my filesystem? What privileges does it need? etc.) before deciding to run this stuff 24/7 on their production machines. Then they might share the results of their experiments. Other people read that, do some more testing, etc.: it takes some time for your closed source software to become somewhat trusted!

You wanted answers, I gave you an honest answer. Why are you upset, shouldn't you be happy and thank all of us who took some time to answer you?

Read a bit about how many people reacted to Let's Encrypt clients in the early days, and you'll notice that many people don't want to run untrusted software on a webserver. One option for trusting something - of course not the only one - is being able to read its code, if you can understand it. It seems like many people could understand what you wrote, but won't have the opportunity to do so. They might not be able to read/understand all of nginx's code, but they trust the signed binary they got from their distribution. Unfair? Absurd? I don't think so.

Release it as you want, or keep it for you and your client, as you prefer! But please don't complain when some strangers from LET tell you that some people would prefer if it was open source: that's a fact, and probably not a sad one.

jsg said: is hardly more than a belief system, and one at that that has been proven wrong often enough.

Have fun running untrusted binaries from unknown internet strangers on your servers!

servertrading · February 2019

I prefer to use one of the best-known tools, one with a great community. In my opinion, open source is always better. So I prefer to use one that I already know rather than trying something pretty new.

eol · February 2019

@datanoise said:
Have fun running untrusted binaries from unknown internet strangers on your servers!

Yes and no.
