Presentation
============
reed-alert is a tool that checks the status of various things on a server
and triggers user-defined notifications when something goes wrong. In the
code, each check is called a "probe" and takes parameters.

The code is very rough for now. I will try to make the config file
easier than it currently is, but I think it's already easy enough for
people who need this kind of tool.

I try to avoid external libraries so that deployment is easy:
it only requires a Common LISP interpreter and a few files.

reed-alert is regularly tested on FreeBSD, OpenBSD and Linux.

How to use
==========
It has been tested with both **sbcl** and **ecl**, which should be
available in most distributions people use.

To start reed-alert:

+ sbcl : **sbcl --script config_file.lisp**
+ ecl : **ecl -shell config_file.lisp**

Defining notification system
============================
The following variables can be used when writing a notification:

+ function : the name of the probe
+ date : the current date with format YYYY/MM/DD hh:mm:ss
+ params : the parameters of the probe
+ hostname : the hostname of the server
+ result : the error returned (the value exceeding the limit, file not found)
+ description : an arbitrary description naming a check
+ level : the type of notification used
+ os : the type of operating system (FreeBSD/Linux/OpenBSD)
+ _ : a space character
+ space : a space character
+ newline : a newline character

If you want to send a mail with a message like "At 2016/10/06 11:11:12
server.foo.com has encountered a problem during LOAD-AVERAGE-15
(:LIMIT 10) with a value of 30", you can write the following and use
**pretty-mail** in your checks.

    (defvar *alerts*
      (list
       '(pretty-mail ("echo 'At " date _ hostname " has encountered a problem during " function
                      _ params " with a value of " result "' | mail yourmail@foo.bar"))))

If you don't want anything to be triggered, you can use the following
in `*alerts*`:

    '(nothing-to-send nil)

If you find it easier to read, you can add + in the concatenation;
it is simply discarded when the program parses the list.

    '(pretty-mail (date + " " + hostname + " has encountered a problem " + function))
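
To make the behaviour of such a list concrete, here is a minimal sketch of how it could be turned into a single string. It is not reed-alert's actual code: the function name `expand-template` and the association list of values are hypothetical and only illustrate the rules described above (strings are kept, `+` is discarded, `_` and `space` become a space, `newline` becomes a newline, other symbols are replaced by their value).

```lisp
;; Hypothetical helper, not part of reed-alert: expands a notification
;; list into one string, following the rules described in this section.
(defun expand-template (template values)
  "Concatenate TEMPLATE into a string.
Strings are copied as-is, + is discarded, _ and SPACE become a space,
NEWLINE becomes a newline, and any other symbol is looked up in the
association list VALUES (unknown symbols are skipped)."
  (with-output-to-string (out)
    (dolist (item template)
      (cond ((stringp item) (write-string item out))
            ((eq item '+))                      ; readability marker, ignored
            ((member item '(_ space)) (write-char #\Space out))
            ((eq item 'newline) (terpri out))
            (t (let ((pair (assoc item values)))
                 (when pair (princ (cdr pair) out))))))))

;; Example values are made up for illustration.
(expand-template
 '(date + " " + hostname + " has encountered a problem " + function)
 '((date . "2016/10/06 11:11:12")
   (hostname . "server.foo.com")
   (function . "LOAD-AVERAGE-15")))
;; => "2016/10/06 11:11:12 server.foo.com has encountered a problem LOAD-AVERAGE-15"
```

With or without the `+` symbols, the resulting string is the same, which is why they can be added freely for readability.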

The different probes
====================
Probes are written in LISP and sometimes rely on system commands, for
example for ping or the load average of the system. They take care of
running on different operating systems.

The following parameter is accepted by every probe. It lets you
describe what the check does or concerns, so that you can put it in the
notification if you want.

    :desc "STRING"
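
As an illustration, the fragment below combines `:desc` with the parameters of two probes documented further down. It is a config fragment, not a complete file, and it rests on two assumptions: that the first argument of `=>` names an entry of `*alerts*` (here the `pretty-mail` alert defined earlier), and that `:desc` goes in the same parameter list as the probe's own parameters. The description strings are made up.

```lisp
;; Config fragment (assumptions: the first argument of => is the name of an
;; alert from *alerts*, and :desc sits next to the probe's other parameters).
(=> pretty-mail disk-usage (:path "/tmp" :limit 50 :desc "scratch partition"))
(=> pretty-mail ping (:host "8.8.8.8" :desc "outbound connectivity"))
```

Under these assumptions, the text given to `:desc` is what the `description` variable would carry into the notification.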

number-of-processes
-------------------
Check whether the current number of processes on the system exceeds the limit.

> Set the limit that will trigger an alert when exceeded

    :limit INTEGER

Example : `(=> example number-of-processes (:limit 200))`

pid-running
-----------
Check whether the process whose PID is stored in a .pid file is alive.

> Set the path of the pid file. If the user does not have permission to open it, the probe returns "file not found"

    :path "STRING"

Example : `(=> example pid-running (:path "/var/run/nginx.pid"))`

disk-usage
----------
Check whether the percentage of used space on the chosen partition exceeds the limit.

> Set the mountpoint to check

    :path "STRING"

> Set the limit that will trigger an alert when exceeded

    :limit INTEGER

Example : `(=> example disk-usage (:path "/tmp" :limit 50))`

file-exists
-----------
Check if a file exists.

> Set the path of the file to check

    :path "STRING"

Example : `(=> example file-exists (:path "/var/postgresql/standby"))`

file-updated
------------
Check whether a file exists and has been modified within a given time.

> Set the path of the file to check

    :path "STRING"

> Set the maximum number of minutes since the last modification before an alert is triggered

    :limit INTEGER

Example : `(=> example file-updated (:path "/var/log/nginx/access.log" :limit 60))`
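
For readers curious about what such a check involves, here is a minimal sketch of the same idea in plain Common Lisp. It is not reed-alert's implementation: the function name `file-updated-within-p` is hypothetical, and it only uses the standard functions `probe-file`, `file-write-date` and `get-universal-time`.

```lisp
;; Hypothetical sketch, not reed-alert's actual probe.
(defun file-updated-within-p (path limit-minutes)
  "Return true if PATH exists and was modified at most LIMIT-MINUTES ago."
  (let ((modified (and (probe-file path) (file-write-date path))))
    (and modified
         ;; both values are universal times, expressed in seconds
         (<= (- (get-universal-time) modified)
             (* 60 limit-minutes)))))

;; A NIL result is the situation in which an alert would be triggered.
(file-updated-within-p "/var/log/nginx/access.log" 60)
```

A missing file and a file that is too old both give NIL here, which matches the description of the probe: the file must exist and be recent.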

load-average-1
--------------
Check whether the load average over the last minute exceeds the limit.

> Set the limit not to exceed

    :limit INTEGER

Example : `(=> example load-average-1 (:limit 2))`

load-average-5
--------------
Check whether the load average over the last five minutes exceeds the limit.

> Set the limit not to exceed

    :limit INTEGER

Example : `(=> example load-average-5 (:limit 2))`

load-average-15
---------------
Check whether the load average over the last fifteen minutes exceeds the limit.

> Set the limit not to exceed

    :limit INTEGER

Example : `(=> example load-average-15 (:limit 2))`

ping
----
Check whether a remote host answers 2 ICMP pings.

> Set the host to ping (IP address or hostname). An error is returned if the ping command exits with a non-zero status

    :host "STRING"

Example : `(=> example ping (:host "8.8.8.8"))`

command
-------
Execute an arbitrary command; an alert is triggered if the command returns a non-zero value.

> Command to execute, commands with pipes are accepted

    :command "STRING"

Example : `(=> example command (:command "tail -n 10 /var/log/messages | grep -v CRITICAL"))`