CRON service crashed
Today the cron service crashed on my server as the result of a load spike. As a result, Observium didn't poll my devices for around 3 hours, until I noticed.
How can I prevent something like this from happening again in the future, or at least have some sort of check that will notify me if the service is not running?
Comments
Monit?
Monit!
+1 for Monit
Um, and what if the Monit process crashes? You can set up a script that confirms cron is running, then have an external server check that script (via TCP, HTTP, whatever) at a regular interval and alert you if it's not working. The easiest would be a web script, so you can use anything that supports HTTP checks.
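A rough sketch of what such a web script could look like, in case it helps. It assumes a Linux box (it reads /proc), that the daemon shows up as "cron" or "crond", and port 8080; the function names are made up for illustration. It answers 200 when cron is found and 503 when it is not, so any HTTP checker can alert on it.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: the daemon is named "cron" (Debian/Ubuntu) or "crond" (RHEL/CentOS).
CRON_NAMES = ("cron", "crond")


def is_process_running(names=CRON_NAMES, proc_root="/proc"):
    """Return True if any process under proc_root has one of the given names."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are PIDs
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() in names:
                    return True
        except OSError:
            continue  # process exited (or entry is unreadable) mid-scan
    return False


def health_response(running):
    """Map the check result to an HTTP status code and a plain-text body."""
    if running:
        return 200, "OK: cron is running\n"
    return 503, "CRITICAL: cron is not running\n"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        code, body = health_response(is_process_running())
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())


def serve(port=8080):
    """Serve the health check until interrupted. The port is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Call serve() to start it, then point the external checker at port 8080. If the whole box (or this script) dies, the checker gets no answer at all, which should also be treated as a failure.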
A service shouldn't crash just like this, just because there was some load spike on the box. Was there an Out Of Memory situation? What is this - OpenVZ / KVM / dedi?
Dedicated box. Swap is at 100%, but memory sits at around 50% used.
Add more swap for such cases.
Adding swap is never a solution.
In most cases, swap shouldn't be enabled at all on servers.
And any reasonable monitoring strategy should ALWAYS include an internal/local monitor and external monitors (or at least something to monitor the monitor). Obviously, the OP has additional troubleshooting to do.
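For the external side, something along these lines could run from a second machine. This is only a sketch under assumptions: the local monitor is reachable over HTTP and returns 200 when healthy, and the names check_url and run_check are invented here. The alert is just a callable you supply (mail, SMS, whatever you use).

```python
import http.client
import urllib.request


def check_url(url, timeout=10):
    """Return True only if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Covers connection refused, timeouts, DNS failures and error statuses
        # such as 503 (urllib raises HTTPError, a subclass of OSError, for those).
        return False


def run_check(url, alert, timeout=10):
    """Run one check and call alert(message) on failure. Returns the result."""
    ok = check_url(url, timeout)
    if not ok:
        alert("health check failed for %s" % url)
    return ok
```

Scheduled every few minutes on the second machine, a call like run_check("http://myserver.example:8080/", print) would cover both a dead service and a dead local monitor, since no response counts as a failure. The hostname is a placeholder.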
Fully agreed. An actual crash should be reported to the developers.