2016-10-07 13:39:01 +00:00

Presentation
============

reed-alert is a tool that checks the status of various things on a
server and triggers user-defined notifications to alert you. In the
code, each check is called a "probe" and takes parameters.

The code is very rough for now. I will try to make the config file
easier than it currently is, but I think it is already easy enough for
people who need this kind of tool.

I try to avoid external libraries so that deployment stays easy: it
only requires a Common LISP interpreter and a few files.

reed-alert is regularly tested on FreeBSD, OpenBSD and Linux.

How to use
==========

It has been tested with both **sbcl** and **ecl**, which should be
available in most distributions people use.

To start reed-alert:

+ sbcl : **sbcl --script config_file.lisp**
+ ecl : **ecl -shell config_file.lisp**

Defining notification system
============================

In a notification, the following variables are replaced by their value:

+ function : the name of the probe
+ date : the current date, with format YYYY/MM/DD hh:mm:ss
+ params : the parameters of the probe
+ hostname : the hostname of the server
+ result : the error returned (e.g. the value exceeding the limit, or "file not found")
+ description : an arbitrary description naming a check
+ level : the type of notification used
+ os : the type of operating system (FreeBSD/Linux/OpenBSD)
+ _ : a space character
+ space : a space character
+ newline : a newline character

If you want to send a mail with a message like "At 2016/10/06 11:11:12
server.foo.com has encountered a problem during LOAD-AVERAGE-15
(:LIMIT 10) with a value of 30", you can write the following and use
**pretty-mail** in your checks.

    (defvar *alerts*
      (list
       '(pretty-mail ("echo 'At " date _ hostname " has encountered a problem during " function
                      _ params " with a value of " result "' | mail yourmail@foo.bar"))))

If you don't want anything to be triggered, you can use the following
in `*alerts*`:

    '(nothing-to-send nil)

If you find it easier to read, you can add `+` in the concatenation;
it is simply discarded when the program parses the list.

    '(pretty-mail (date + " " + hostname + " has encountered a problem " + function))

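The substitution rules above can be pictured with the following sketch. `render-template` is a hypothetical helper written for this README, not reed-alert's own code: it assumes the variable values are passed as a keyword property list and printed with `princ`, and it drops `+` as described.

```lisp
;; Hypothetical sketch, not reed-alert's implementation: turn a
;; notification list into a single string.
(defun render-template (template bindings)
  "Concatenate TEMPLATE into one string.
Strings are copied as-is, + is dropped, _ and SPACE become a space,
NEWLINE becomes a newline, and any other symbol is looked up in the
keyword plist BINDINGS (a missing value prints as an empty string)."
  (with-output-to-string (out)
    (dolist (item template)
      (cond ((stringp item) (write-string item out))
            ((not (symbolp item)) (princ item out))
            ((string= (symbol-name item) "+"))      ; readability only, discarded
            ((member (symbol-name item) '("_" "SPACE") :test #'string=)
             (write-char #\Space out))
            ((string= (symbol-name item) "NEWLINE")
             (write-char #\Newline out))
            (t (princ (getf bindings (intern (symbol-name item) :keyword) "")
                      out))))))

;; Usage: both spellings build the same string.
(print (render-template '(date _ hostname " has encountered a problem")
                        '(:date "2016/10/06 11:11:12" :hostname "server.foo.com")))
(print (render-template '(date + " " + hostname + " has encountered a problem")
                        '(:date "2016/10/06 11:11:12" :hostname "server.foo.com")))
```

Looking symbols up by name rather than by identity keeps the sketch independent of the package the configuration file is read in.
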
The different probes
====================

Probes are written in LISP and sometimes rely on calling system
commands, for example for ping or for the load average of the system.
They are written to work on each of the supported operating systems.

The following parameter is allowed for every probe. It lets you
describe what the check does or concerns, so that you can include it
in the notification if you want:

    :desc "STRING"

number-of-processes
-------------------

Check if the current number of processes on the system exceeds the limit.

> Set the limit that will trigger an alert when exceeded

    :limit INTEGER

Example : `(=> example number-of-processes (:limit 200))`

pid-running
-----------

Check if the process whose PID is stored in a .pid file is alive.

> Set the path of the pid file. If the user doesn't have permission to open it, the probe returns "file not found"

    :path "STRING"

Example : `(=> example pid-running (:path "/var/run/nginx.pid"))`

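Reading the PID could look like the sketch below. `read-pid-file` is a hypothetical helper, not reed-alert's code, and it only covers the "file not found" behaviour described above. Checking whether the process is actually alive needs an operating-system call (for example `kill -0 PID` from a shell), which portable Common Lisp does not provide, so that part is left out.

```lisp
;; Hypothetical sketch: READ-PID-FILE is not part of reed-alert.
(defun read-pid-file (path)
  "Return the PID on the first line of PATH as an integer, or NIL if that
line holds no number. Return \"file not found\" when PATH cannot be opened
(missing file, no permission), like the pid-running probe."
  (handler-case
      (with-open-file (in path :direction :input)
        ;; READ-LINE returns "" on an empty file instead of signalling.
        (parse-integer (read-line in nil "") :junk-allowed t))
    ;; OPEN signals a FILE-ERROR when the file is missing or unreadable.
    (file-error () "file not found")))
```
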
disk-usage
----------

Check if the used percentage of the chosen partition exceeds the limit.

> Set the mountpoint to check

    :path "STRING"

> Set the limit that will trigger an alert when exceeded

    :limit INTEGER

Example : `(=> example disk-usage (:path "/tmp" :limit 50))`

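One possible way to obtain the used percentage is to parse the output of `df -P`, whose fifth column is the capacity (for example `53%`). The sketch below assumes that format and treats "exceeds" as strictly greater than the limit; `split-on-spaces`, `df-used-percent` and `disk-usage-exceeded-p` are hypothetical names, not reed-alert functions, and running `df` itself is left out.

```lisp
;; Hypothetical sketch, assuming one data line of POSIX `df -P` output.
(defun split-on-spaces (line)
  "Split LINE into the non-empty substrings separated by spaces."
  (loop with start = 0
        for pos = (position #\Space line :start start)
        for word = (subseq line start pos)
        unless (string= word "") collect word
        while pos
        do (setf start (1+ pos))))

(defun df-used-percent (line)
  "Return the used percentage, as an integer, from one line of `df -P` output."
  (let ((capacity (fifth (split-on-spaces line))))   ; e.g. "53%"
    (parse-integer capacity :end (position #\% capacity))))

(defun disk-usage-exceeded-p (line limit)
  "True when the used percentage in LINE is strictly greater than LIMIT."
  (> (df-used-percent line) limit))
```
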
file-exists
-----------

Check if a file exists.

> Set the path of the file to check

    :path "STRING"

Example : `(=> example file-exists (:path "/var/postgresql/standby"))`

file-updated
------------

Check if a file exists and has been updated within a given time.

> Set the path of the file to check

    :path "STRING"

> Set the limit, in minutes since the last modification, before triggering an alert

    :limit INTEGER

Example : `(=> example file-updated (:path "/var/log/nginx/access.log" :limit 60))`

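Portable Common Lisp already has what this check needs: `file-write-date` returns the last modification time as a universal time (in seconds), which can be compared with `get-universal-time`. The sketch below is a hypothetical `file-updated-within-p`, not reed-alert's own code.

```lisp
;; Hypothetical sketch: FILE-UPDATED-WITHIN-P is not part of reed-alert.
(defun file-updated-within-p (path limit-minutes)
  "Return T when PATH exists and was modified less than LIMIT-MINUTES ago,
NIL when it is missing, its date is unknown, or it is older than that."
  (let ((mtime (and (probe-file path) (file-write-date path))))
    (and mtime
         ;; Universal times are in seconds, so convert the limit.
         (< (- (get-universal-time) mtime) (* 60 limit-minutes))
         t)))
```

An alert would then be triggered when this returns NIL.
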
load-average-1
--------------

Check if the load average over the last minute exceeds the limit.

> Set the limit not to exceed

    :limit INTEGER

Example : `(=> example load-average-1 (:limit 2))`

load-average-5
--------------

Check if the load average over the last five minutes exceeds the limit.

> Set the limit not to exceed

    :limit INTEGER

Example : `(=> example load-average-5 (:limit 2))`

load-average-15
---------------

Check if the load average over the last fifteen minutes exceeds the limit.

> Set the limit not to exceed

    :limit INTEGER

Example : `(=> example load-average-15 (:limit 2))`

ping
----

Check if a remote host answers the two ICMP pings sent to it.

> Set the host to ping (an IP address or a hostname). An error is returned if the ping command returns a non-zero value

    :host "STRING"

Example : `(=> example ping (:host "8.8.8.8"))`

command
-------

Execute an arbitrary command and trigger an alert if it returns a non-zero value.

> Command to execute; commands with pipes are accepted

    :command "STRING"

Example : `(=> example command (:command "tail -n 10 /var/log/messages | grep -v CRITICAL"))`

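With **sbcl**, the idea behind this probe can be sketched with `sb-ext:run-program`, passing the command string to `/bin/sh -c` so that pipes work. This is an SBCL-only illustration (ecl has its own process functions), and `command-fails-p` is a hypothetical name, not reed-alert's implementation.

```lisp
;; Hypothetical, SBCL-only sketch: COMMAND-FAILS-P is not part of reed-alert.
(defun command-fails-p (command)
  "Run COMMAND through /bin/sh and return T when it exits with a non-zero status."
  (let ((process (sb-ext:run-program "/bin/sh" (list "-c" command)
                                     ;; NIL streams mean /dev/null: output is discarded.
                                     :input nil :output nil :error nil
                                     :wait t)))
    (/= 0 (sb-ext:process-exit-code process))))
```

With the example above, `grep -v CRITICAL` exits with a non-zero status when it selects no line, which is what would trigger the alert.
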