New server monitoring tool, feedback desired.

Comments

  • Creling Member
    edited February 2019

    @jsg 's tool sounds interesting, and in fact I don't care if it is open source or not.
    If it is open source, that's good; I will use it for some of my key servers.
    If it is not, never mind. I will still use it for some of my servers, and I will keep an eye on the server log. Anyway, it sounds like an effective tool, doesn't it?

    Whether it is open source or not, we can find a way to make use of it, and then thank @jsg.

  • So when will the download link be available? I have voted.

  • @jsg is your benchmark script open source?

    Link?

  • Just a reminder to some (many?) here that 'free' ≠ 'libre'.

  • @angstrom said:
    Just a reminder to some (many?) here that 'free' ≠ 'libre'.

    I can offer you a free beer, but I'm afraid that I can't offer you a libre beer.

  • jsg Member, Resident Benchmarker

    @datanoise said:
    Some people might want to investigate (what information is sent on the network? What does it really do to my filesystem? What privileges does it need? etc) before ...

    Those issues can be investigated with closed source software, too.

    I do that quite often, and at least on Unix systems that's no problem; we have the tools needed.

    @datanoise said:
    You wanted answers, I gave you an honest answer. Why are you upset, shouldn't you be happy and thank all of us who took some time to answer you?

    Thank you and the others who responded.

    I'm not upset. I'm just not enticed enough any more to keep my offer alive.

    jsg said: is hardly more than a belief system, and one at that that has been proven wrong often enough.

    Have fun running untrusted binaries from unknown internet strangers on your servers!

    Pretty much all of us do exactly that - each and every day.

    There is a lot of politics and "religion", along with a lot of unconsidered opinions and simple lack of knowledge, in that whole area.

    For a start, the foss "guarantee" is not "it's reliable and safe". The "guarantee" is "you could check it" - if you can. Most cannot. Another major wrong premise is that software can only be checked if it's open source. That is plain wrong. In particular closed source Unix software - but with some determination and effort any software - can be checked. And in particular the set of typical concerns like "does it call home?" can be checked, and in fact with not much more effort than for open source software.
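
    To illustrate the point with a minimal sketch (just the idea, not my tooling; a real audit would rather use tcpdump, strace/ktrace, and friends) - listing which remote endpoints each process talks to works exactly the same whether the binaries involved are open or closed source:

    import psutil

    # Minimal sketch: list the remote endpoints of every running process.
    # Works the same for open and closed source binaries. Run as root to
    # see other users' processes; access errors are simply skipped.
    for p in psutil.process_iter(['name']):
        try:
            conns = p.connections(kind='inet')
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            continue
        for c in conns:
            if c.raddr:
                print(p.info['name'], '->', c.raddr.ip, c.raddr.port)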

    The reason for pulling back my offer is not that I'm upset. It is the experience that discussions with convinced believers in religion-like systems are very demanding and exhausting yet hardly promising. And btw, why should I be interested in preaching to and teaching the believers? To be allowed, if successful, to give them my tool, my work, for free?

    So in the end I accept the "foss akbar, there is only one true safety and that is foss" chanting and move on to actually create useful software or to actually check software. There I meet the "foss akbar" crowd again, but this time in reverse: when they are told that some foss software is crappy and unsafe/insecure, they defend with determination what they just complained about before, because, you know, open source software can't possibly be crappy or unsafe (no matter the objective test results) because, hey, it's open source, foss akbar, foss akbar!

    Btw, the rights needed for that kind of software depend on the OS, not on the software. Funny side note: the higher you drive up the security on a given OS, the more rights a monitor needs. That is however only a quite minor concern because there is no dynamic input.

    @vimalware said:
    @jsg is your benchmark script open source?

    Yes. It's a program btw, not a script.

    Link?

    My sig.

  • Could you upload it to a simple HTTP webroot?
    After spending two mins figuring out which javascript domains to whitelist in umatrix, I gave up.

    Yandex doesn't have the same mental trust score (for me) that Google Drive has.

  • jsg Member, Resident Benchmarker

    @vimalware said:
    Could you upload it to a simple HTTP webroot?
    After spending two mins figuring out which javascript domains to whitelist in umatrix, I gave up.

    Yandex doesn't have the same mental trust score (for me) that Google Drive has.

    Sorry, but I feel similarly about Google. I do not trust them any further than I could throw a large nuclear bomb.

    But I've set up a small web server to serve the vpsbench tar.gz (a copy of what's on the Yandex drive) for you. I'll PM you the link right after posting this.

  • datanoise Member
    edited February 2019

    jsg said: Pretty much all of us do exactly that - each and every day.

    Not exactly. If a binary is GPG-signed and also used by many big companies, it's pretty different than if it comes from a random stranger on the internets. That's the reason why many people prefer to install software from their distribution's repository.

    jsg said: And btw, why should I be interested in preaching to and teaching the believers? To be allowed, if successful, to give them my tool, my work, for free?

    No need to preach, no need to give "them" "your" tool. If you want to put it on the internet as non-free software, go ahead; some people will probably enjoy it. A .tar.gz and a minimal information page is all that's needed. But remember, you came here to ask a different question than the ones you're now asking us:

    jsg said: But is it worth the effort?

    In some people's minds it's not worth the effort if it's not going to end up free software. That's an opinion that you obviously don't share. Maybe you didn't even need to ask, given what you seem to think of people holding a different opinion than yours.

    Thanked by 1 Letzien
  • jsg Member, Resident Benchmarker

    @datanoise said:

    jsg said: Pretty much all of us do exactly that - each and every day.

    Not exactly. If a binary is GPG-signed and also used by many big companies, it's pretty different than if it comes from a random stranger on the internets. That's the reason why many people prefer to install software from their distribution's repository.

    "big company" also means quite some really bad things and by no means a statement of quality. See, for example, Microsoft DCs going offline.

    Similarly "GPG signed" can carry lots of weight, some weight, or none at all, depending on ones view.

    At the end of the day it's simple: trust is a social thing, not a technical one, and there have been plenty of cases where long-trusted people and institutions f_cked up, be it intentionally or not.

    The mechanism to check whether some program, for example, calls home or not is quite the same for Google or Microsoft software as for open source code from trusted(?) sources as for a program "from a random stranger on the internets". And it's a technical mechanism and not a social one.

    Using those technical tools leads to knowing rather than believing (as in "trust") - but the real problem is one the open source fans love to ignore: those tools aren't used by 99.9+% of users - just as 99.9+% of users never even superficially glance over the source code. In fact, it's not unheard of that highly trusted people (comparable to saints, it seems) who were officially in charge of providing at least 2 of the famous "1000" eyeballs did not properly look at the source.

    Sorry, the reality is simple. Some people, like you, choose to - quite arbitrarily - trust open source based on little more than a belief system. The real difference between them and the average user ignorantly clicking on and downloading pretty much everything that looks interesting is that the latter are openly ignorant, while many of the open source believers are factually just as ignorant but hide that (and calm themselves, I guess) with lots of "open source is safe" theater.

    Both groups - as well as most developers - fail to do the only really proper thing: neither uses the available tools to technically and properly check the software.

    That said: you prefer to trust open source software? No problem, I certainly won't try to evangelize you. Just go ahead; I wish you good luck. Honestly. But kindly come well equipped with proper arguments before implying that the closed source people are somehow evil or less trustworthy.

  • datanoise Member
    edited February 2019

    jsg said: many of the open source believers are factually just as ignorant but hide that (and calm themselves, I guess) with lots of "open source is safe" theater.

    The question is not only open source vs closed source. It's also about trusting where you get your software from. Some people trust Debian repos, others will prefer FreeBSD ports, using emerge or whatever tool the system they chose to install provides. You're free to believe that it's as insecure as downloading random shit that looks cool from the internet, but honestly the chance of encountering malware is clearly lower with either of those two practices, and even more so for a simple user who's not gonna check anything, and you know that!

    That doesn't mean that this solution is 100% perfect, just that running stuff from random LET strangers isn't necessarily appealing. It can be an answer to your (first) question.

    jsg said: implying that the closed source people are somehow evil or less trustworthy

    Never said anything against them!

    I wish you good luck with your projects. And happiness in your life. Cheers!

  • jsg Member, Resident Benchmarker

    @datanoise

    After seeing the insane dependencies of many official Debian, FreeBSD, etc. packages, I've made the decision - and quite some effort - to create tar.gz software "packages" with just a few files.

    I've just had damn enough of installing e.g. Perl just because some idiot chose to use Perl to create/adapt a couple of lines in a config or build file. Please note that this also has security implications, because the safest software is the superfluous additional software you don't drag along.

    Whatever. Where I come from we have a saying, "cat like mice, I don't". As long as you openly state, or at least accept, that your position is largely belief-system based, I'm fine with that, although my taste is different. What I really dislike, though, is when open source proponents are totally biased and see only the positive sides of foss, up to the point of making baseless assertions, e.g. wrt safety.

    Btw, your chance of getting a significant problem with software from me is way lower than with software from Debian or FreeBSD, because I put quite some effort into what I - unlike most package maintainers at distros - consider a high priority: safety and security. That starts with a well-based choice of language and doesn't end with extensive testing, debugging, and verification.

    I of course did have occasional bugs during many years of developing, but to the best of my knowledge I never had a critical bug creating a security risk. But in all fairness one must also see that a distro just can't possibly do the necessary testing and analysis for the thousands and thousands of packages they include (nor can they throw out all the crap; if they did, their distro would be next to unusable).

    Thanked by 1 datanoise
  • ricardo Member
    edited February 2019

    Maybe someone can create a 10-line bash script to periodically report a server's health; then all the philosophical dilemmas can be put to bed. You might need some branching for different flavours of OS.

    Thanked by 3 eol Yura TimboJones
  • It's more than a 10-line bash script to get the extensive output you're looking for, along with the logging.

    https://gist.github.com/techhelper1/1b640a81ded45bfbac564d2fd4f9532c

    The above script does most of what @jsg 's program does, except that a separate script needs to be created for the syslog handling, and however you want to handle the reporting to a collector box is there; the rest should be self-explanatory.

    Requirements:

    Python 2 - All my work is written in v2.
    psutil (via pip or distro package manager)
    requests (via pip or distro package manager)

    LibreNMS, Nagios/Sensu, or Telegraf + Grafana would be the best ways to go about it instead of reinventing the wheel. Either way, creating such a program/script is not that hard.
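
    For anyone who just wants the flavor of the psutil + requests approach, here's a stripped-down sketch (illustrative only - the collector URL is a placeholder, and it reports far less than the gist above):

    import psutil
    import requests

    def collect():
        # A few basic health metrics; psutil hides the per-OS details.
        return {
            "load": psutil.getloadavg(),                  # 1/5/15 min averages
            "mem_available": psutil.virtual_memory().available,
            "disk_root": psutil.disk_usage("/")._asdict(),
            "net": psutil.net_io_counters()._asdict(),
        }

    if __name__ == "__main__":
        # Placeholder endpoint - point it at your own collector box.
        requests.post("https://collector.example/report", json=collect(), timeout=10)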

    Thanked by 1 uptime
  • Neoon Community Contributor, Veteran

    @techhelper1 said:
    LibreNMS, Nagios/Sensu, or Telegraf + Grafana, would be the best ways to go about it instead of reinventing the wheel. Either way, creating such a program/script is not that hard.

    Well, that's bullshit.
    It depends on what you need; Netdata, for example, is a bit of overkill for me.
    It has a lot of dependencies, not to speak of Grafana....

    Recently Netdata jumped on the cloud hype shit train, and that was enough for me to start creating my own monitoring.

    What you can do with psutil + requests is write a single small script that grabs roughly 90% of what you need, which is enough for now.

  • jsg Member, Resident Benchmarker

    @techhelper1

    Clearly no.

    Some reasons:

    • Your code is basically just a consumer of psutil.
    • psutil - a nice and useful lib, no doubt - is quite large (and does much more than is needed here)
    • Python has become a seriously big fat snake. Don't get me wrong, I do like Python, but it's by far too fat for what I had in mind.
    • Your solution is yet another case of "collect, collect, collect! The more the better".
    • Your solution brings some direct and lots of indirect dependencies with it.

    We didn't want a solution that requires multiple MB of memory, pulls in tens of MB of dependencies, and consumes lots of resources. We didn't want bloat and mindlessly throwing libraries, dependencies, and resources at the problem, thereby becoming part of the problem - we wanted a small, single (or at most a couple of) file tool that needs no installation or package manager, has a small footprint in every sense, and does its job well on even a very modest VM.

    Either way, creating such a program/script is not that hard.

    Uhum, I see. That's why you use psutil to do the hard part for you ...

    Plus, your comment clearly shows that you didn't even get what my project is all about and why my client did not want to use one of the many available tools.

    Thanked by 1 datanoise
  • @jsg

    jsg said: Your solution is yet another case of "collect, collect, collect! The more the better".

    Dude, it was written very quickly to shut people up, so that if someone wanted A solution (note: I didn't say THE solution), it's theirs. Any user can modify what it collects and reports as they see fit.

    jsg said: Python has become a seriously big fat snake. Don't get me wrong, I do like Python, but it's by far too fat for what I had in mind.

    I could have written it in Ruby or PHP, I'm sure, but I did it in Python because that's what I write all my day-to-day scripting in. Most distros already carry Python by default, so the dependency part is on me to manage.

    jsg said: We didn't want a solution that requires multiple MB of memory, pulls in tens of MB of dependencies, and consumes lots of resources.

    So you've benchmarked and analyzed the usage of this before shooting your mouth off, correct?

    Are you running this on a VPS with 64MB RAM and 512MB storage? We live in an age where computers are so capable that optimizations may only shave off milliseconds or MB of space. Humans can't tell the difference of a few milliseconds, nor do MB of storage really matter when we have TBs of space available.

    This script was written in an Ubuntu 18.04 LXC container, where everything except for fan speed reporting worked.

    jsg said: We didn't want bloat and mindlessly throwing libraries, dependencies, and resources at the problem, thereby becoming part of the problem - we wanted a small, single (or at most a couple of) file tool that needs no installation or package manager, has a small footprint in every sense, and does its job well on even a very modest VM.

    Please do enlighten me on how you would achieve cross platform compatibility without reinventing the wheel here.

    Thanked by 1 iKeyZ
  • jsg Member, Resident Benchmarker

    @techhelper1 said:
    Please do enlighten me on how you would achieve cross platform compatibility without reinventing the wheel here.

    No, "dude". It seems to me that we both laid out our views (and capabilities) clearly enough. But to be generous I'll offer an example:

    I could have written it in Ruby or PHP, I'm sure, but I did it in Python because that's what I write all my day-to-day scripting in.

    And btw, no, I think you couldn't unless there was something like psutil available.

    Have a nice day.

    Thanked by 1 Wolf
  • datanoise Member
    edited February 2019

    jsg said: Btw, your chance of getting a significant problem with software from me is way lower than with software from Debian or FreeBSD, because I put quite some effort into what I - unlike most package maintainers at distros - consider a high priority: safety and security. That starts with a well-based choice of language and doesn't end with extensive testing, debugging, and verification.

    I have no doubt you are a great dev, but you are also a random stranger from the internet. That's why (we are not your customer, who knows/trusts you) - but not only for this reason, as you know - open source would be better for some of us, who might have an interest in looking at your code.

    Some people might prefer a script: less efficient (we didn't benchmark your software, though) but easier to understand and... well... something you actually can try to understand! (That's probably the main problem with closed source software: if you are curious to look at the code, you can't!)

    Thanked by 2 iKeyZ vimalware
  • jsg Member, Resident Benchmarker

    @datanoise said:
    I have no doubt you are a great dev, but you are also a random stranger from the internet. That's why (we are not your customer, who knows/trusts you) - but not only for this reason, as you know - open source would be better for some of us, who might have an interest in looking at your code.

    Some people might prefer a script: less efficient (we didn't benchmark your software, though) but easier to understand and... well... something you actually can try to understand! (That's probably the main problem with closed source software: if you are curious to look at the code, you can't!)

    https://medium.com/@shnatsel/how-rusts-standard-library-was-vulnerable-for-years-and-nobody-noticed-aebf0503c3d6

    And there we are talking about Rust devs, who certainly aren't idiots. What are the chances that an average open source fan would have discovered (let alone fixed) those problems? And WHERE ARE THE 1000 EYES?

    But wait, it gets even more interesting. Now they've started a kind of "seriously, let's identify and hopefully fix the flaws and bugs!" initiative. How do they try to do that? They use QuickCheck, which at least is a kind of smart fuzzer (as opposed to the majority of "lottery" fuzzers).

    Is a fuzzer an adequate tool for that task? No, it is not. But if used properly it can at least find some of the more obvious weak spots. Well, that's at least a beginning, so let's commend the Rust people anyway.

    You know who has the best code? The military, maybe? Nope. The finance sector. Reason: they invest heavily. Probably because no matter the cost, it's still far less than what they risk losing to faulty software. The point I'm after is this: there are things you can only do when you invest and get serious professionals working for you.

    Now, well understood, I do NOT preach that closed software is the right way. I myself use quite some open source and I value open source a lot (and have given away quite some code myself).

    What I do preach, though, is that preaching "only open source is good/acceptable/the right way" is nonsensical and unrealistic.

    One reason is that becoming the sort of developer who is capable of producing safe and reliable code takes pretty much the same as it takes to become a really good and experienced architect or doctor or lawyer - and all of those very rarely work for free.

    Is that bad? Let me point at a quite basic and obvious fact: unlike in fantasy land, the vast majority of people in the real world are driven by what one could call the "effort-reward" ratio. You study harder, you make more effort ... and you end up in a better place, earn more money, have higher status, etc. Take that away and you end up with mediocrity and crap, which btw describes the situation with software quite well. And I'll very generously leave aside the important issue of responsibility in open source ...

    Sorry but reality is a bit more complicated than "open source good, closed source bad". And the reason for that is almost always OURSELVES, us humans.

    As some here seem to like fighting for "the right cause", let me suggest a far more reasonable and important hunt: how about fighting for universities (those paid by the public) and other public institutions open sourcing all software?

    Think about it: universities are about the single most important player in terms of high-end tools (besides a few high-end industry players). Pretty much everything in crypto, software safety, and many other critical areas comes from universities or at least has some serious university involvement.
    And what happens more often than not? As soon as a project is mature enough to have real-world value, the involved uni people create a company and sell the result of the work that all of us have paid for. Now, that's what I call nasty and worth fighting against. They rip us off! Simple as that.

    Thanked by 1 datanoise
  • And then there's software for the space shuttle.

    Thanked by 2 datanoise eol
  • datanoise Member
    edited February 2019

    jsg said: As some here seem to like fighting for "the right cause", let me suggest a far more reasonable and important hunt: how about fighting for universities (those paid by the public) and other public institutions open sourcing all software?

    Nobody pretends to fight for "the right cause" here. You asked for advice; you got some answers. There is in fact no debate about "every software" or "the software of X sector" or whatever: it's just about a small server monitoring tool.

    That being said, you write really interesting stuff, and pretty well - it's a pleasure to read. It reminds me that this forum is missing a couple of great minds who used to be around, like @bsdguy ! By the way, I agree with you:

    jsg said: reality is a bit more complicated than "open source good, closed source bad"

  • jsg Member, Resident Benchmarker

    For those who are interested:

    The client-side sensor is finished and tested. Its size is a bit over 70 KB, so it's really small. For comparison: htop and atop are over 200 KB, and even top is more than double the size.

    The reason for updating the thread, though, is a different one. The client asked for a human-readable local (on the client) text output version, and it's finished now. Here is an example from a VPS running cherokee (an HTTP server), PHP-FPM, and a MySQL server.

    The tool is "tick" based. A "tick" is the smallest time span between measurements. In the example the "tick" is 10 sec, so the most frequent measurement possible with that configuration is every 10 sec.

    As the tool includes RRD-like functionality, another concept must be introduced quickly: granularity. On the sensor (monitored client) side there are two granularities, g0 and g1. g0 describes how many ticks a cycle has (the software runs in endless cycles); here g0 is a rather typical 1 minute. Once a cycle is completed, the sensor writes out the data gathered during that g0 cycle.
    Once a (longer ~ "less granular") g1 cycle is completed, some calculations (see below) are done and the "summary" of that cycle is written out (and/or pushed to the server).

    Note: the comments (lines starting with '#') are mine, for explanation:

    # -- time stamp --
    # T timestamp, g0-Tick(sec), g1-cycle
    T190226002628,10,90
    

    Each entry of a block begins with a 1-letter id. Each g1 block begins with some time-related info: 'T' as id, then a timestamp in YYMMDDHHmmss format (HH meaning 24-hour format) in UTC - to avoid misunderstandings or even errors if a server collects data from clients in different time zones, and to avoid summer/winter time related problems. Next is the "tick" (in sec.) and then the g1 cycle (in ticks; in the example, 90 ticks are 15 min).

    # -- proc --
    # Ppname pid ppid pgid min_rss max_rss avg_rss state
    Pmysqld,54908,8176,8176,427312,427312,427312,G
    Pphp-fpm,21246,1,21246,10704,10704,10704,G
    Pcherokee,68007,1,68007,45884,45944,45885,G
    Psshd,57586,1,57586,1720,1720,1720,G
    

    A proc entry begins with a 'P' and the process name, followed by its pid, parent pid, and pgid, followed by minimum, maximum, and average RSS during the cycle. The 3rd line (for cherokee) shows an example of slightly different RSS values during those 15 min. This also serves well to recognize processes gone amok. In the above case the avg is very close to the minimum, which suggests that the maximum was just intermediate and short-lived.
    Finally, a letter ('G' for "good") shows the state of the process. If a process was in an unhealthy state even once during the g1 cycle, that is clearly indicated, so one doesn't risk overlooking problems.

    # -- mem --
    MR44924928
    # -- load --
    # "L" avg, min, max load # during cycle
    L0.168,0.062,0.538
    

    The memory measurement, starting with 'M', shows the free memory. Free as in "available for allocation". For systems with up to 4 GB memory (total), 'M' is followed by 'R', which indicates that the following number is "free memory in bytes"; otherwise it's followed by 'K' (as in 'kilo') and the number is in KB.

    Load entries are what the comment suggests. It's noteworthy though that the resolution is much better than that of top. Also, the avg. value is based on a real calculation with that better resolution.

    # -- disk -- per mounted volume
    # D name size free readBytes writtenBytes
    Dvtbd0p2,49587,27547,22,1012
    

    Disk information looks at mounted volumes only (so not stuff like /dev, as "df -h" does) and starts with 'D' followed by the device name (like "sdb1") -or- the mount path for non-disk (pseudo)devices like a tmpfs-based /tmp (whose entry would start with "D/tmp"), followed by size, free size, and read and written bytes (during that g1 cycle).

    # -- net --  per interface
    #"N"iname iErrors oErrors iPackets oPackets iBytes oBytes state
    Nem0,0,0,204565,183175,13671609,67718264,G
    Nlo0,0,0,35923,35923,1436920,1436920,-
    

    Network interface entries start with 'N' and the interface name, followed by in and out errors, packets, and bytes read or written, followed by a state indicator (like with procs) which is only 'G' (good) if the state was OK in all g0 probes.

    The config for the example shown is: mem/load: 3 (every 30 sec), disks: 30 (every 5 min), net: 6 (every minute), and procs: 12 (every 2 minutes).

    The load created by the monitor software itself is hardly measurable. The g0 entries are written in a revolving fashion and occupy no more than about 3 KB of disk space; the g1 entries (if kept locally) are about 30 - 35 KB per day (gz-compressible to about 1/10). And btw, if kept locally, the g1 log auto-rotates every day shortly after 0:00 UTC (to be precise, after the last g1 cycle that started before midnight is finished).
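
    To illustrate how easy the format is to consume, here's a quick Python 3 sketch of a parser for these g1 entries (illustrative only, not the tool's actual code - just a direct reading of the fields described above):

    from datetime import datetime

    def parse_g1_line(line):
        # One record per line; the first character identifies the record type.
        tag, rest = line[0], line[1:]
        if tag == 'T':   # timestamp, tick (sec), g1 cycle (ticks)
            ts, tick, g1 = rest.split(',')
            when = datetime.strptime(ts, '%y%m%d%H%M%S')  # UTC
            return ('time', when, int(tick), int(g1))
        if tag == 'P':   # pname, pid, ppid, pgid, min/max/avg rss, state
            f = rest.split(',')
            return ('proc', f[0], *map(int, f[1:7]), f[7])
        if tag == 'M':   # free memory: 'R' = bytes, 'K' = KB
            unit, num = rest[0], int(rest[1:])
            return ('mem_free_bytes', num if unit == 'R' else num * 1024)
        if tag == 'L':   # avg, min, max load
            return ('load', *map(float, rest.split(',')))
        if tag == 'D':   # device or mount path, size, free, read/written bytes
            f = rest.split(',')
            return ('disk', f[0], *map(int, f[1:5]))
        if tag == 'N':   # iface, in/out errors, packets, bytes, state
            f = rest.split(',')
            return ('net', f[0], *map(int, f[1:7]), f[7])
        return ('unknown', line)

    For example, parse_g1_line("Nem0,0,0,204565,183175,13671609,67718264,G") yields ('net', 'em0', 0, 0, 204565, 183175, 13671609, 67718264, 'G').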

    Let me know if you have interesting suggestions that are within the program's scope.

  • What about current bandwidth/disk utilization, cpu temperature, smart values, etc.?

  • jsg Member, Resident Benchmarker
    edited February 2019

    @eol said:
    What about current bandwidth/disk utilization, cpu temperature, smart values, etc.?

    Bandwidth is simply a calculation (done on the server) and disk utilization is available (see above).
    Processor temp and smart values (and plenty more) might be interesting though. Probably I'll add some of that. Keep in mind though that the tool is not meant (like so many) to get every bit of information but rather to give an overview, in particular with regard to "is the system OK?".
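
    For example (a minimal sketch, assuming the 'N' byte counters cover one g1 cycle, as the disk counters do):

    # Average throughput over one g1 cycle, from an 'N' entry.
    def avg_bandwidth_bps(i_bytes, o_bytes, tick_sec, g1_ticks):
        cycle_sec = tick_sec * g1_ticks      # e.g. 10 sec * 90 ticks = 900 sec
        return (i_bytes * 8 / cycle_sec,     # inbound bits/sec
                o_bytes * 8 / cycle_sec)     # outbound bits/sec

    # With the em0 values from the example above:
    # avg_bandwidth_bps(13671609, 67718264, 10, 90)
    # -> about 121.5 kbit/s in, 601.9 kbit/s out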

    Thanked by 1 eol
  • @jsg said:

    @eol said:
    What about current bandwidth/disk utilization, cpu temperature, smart values, etc.?

    Bandwidth is simply a calculation (done on the server) and disk utilization is available (see above).
    Processor temp and smart values (and plenty more) might be interesting though. Probably I'll add some of that. Keep in mind though that the tool is not meant (like so many) to get every bit of information but rather to give an overview, in particular with regard to "is the system OK?".

    Sure.
    Therefore CPU temp. and smart values should be included imho.
