Simple unix alerting system
solene rapenne c3f594da02 Fix description to desc variable in alerts 2016-10-10 20:35:24 +02:00

README

Presentation
============

reed-alert is a tool to check the status of various things on a server
and to trigger user-defined notifications that alert you. In the code,
each check is called a "probe" and takes parameters.

The code is very rough for now. I will try to make the config file
easier than it currently is, but I think it's already easy enough for
people who need this kind of tool.

I try to avoid external libraries so that deployment stays easy:
it only requires a Common LISP interpreter and a few files.

reed-alert is regularly tested on FreeBSD/OpenBSD/Linux.

How to use
==========

It has been tested with both **sbcl** and **ecl**, which should be
available in most distributions people use. On OpenBSD you may prefer
ecl, because sbcl needs the wxallowed mount option on the filesystem
where its binary is.

To start reed-alert:

+ sbcl : **sbcl --script config_file.lisp**
+ ecl  : **ecl -shell config_file.lisp**

You can rename **config.lisp.sample** to **config.lisp** to create
your own configuration file. The configuration is explained below.


Defining notification system
============================

The following variables can be used when composing a notification:

+ function    : the name of the probe
+ date        : the current date with format YYYY/MM/DD hh:mm:ss
+ params      : the parameters of the probe
+ hostname    : the hostname of the server
+ result      : the error returned (the value exceeding the limit, file not found)
+ description : an arbitrary description naming a check
+ level       : the type of notification used
+ os          : the type of operating system (FreeBSD/Linux/OpenBSD)
+ _           : a space character
+ space       : a space character
+ newline     : a newline character

If you want to send a mail with a message like "At 2016/10/06 11:11:12
server.foo.com has encountered a problem during LOAD-AVERAGE-15
(:LIMIT 10) with a value of 30" you can write the following and use
**pretty-mail** in your checks.

    (defvar *alerts*
      (list
       '(pretty-mail ("echo 'At " date _ hostname " has encountered a problem during " function
                      _ params " with a value of " result "' | mail yourmail@foo.bar"))))

If you don't want anything to be triggered, you can use the following
entry in `*alerts*`:

    '(nothing-to-send nil)
	
If you find it easier to read, you can add + in the concatenation;
it is simply discarded when the program parses the list.

    '(pretty-mail (date + " " + hostname + " has encountered a problem " + function))
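
To make the substitution rules concrete, here is a minimal sketch of how such a list could be turned into a string. It is not reed-alert's actual code: `expand-template`, its `bindings` argument and `*example-bindings*` are hypothetical names, and symbols are matched by name so the sketch works from any package.

```lisp
;; Hypothetical helper for illustration only, not part of reed-alert.
(defun expand-template (template bindings)
  "Concatenate TEMPLATE, a list of strings and symbols, into one string.
BINDINGS is an alist mapping upper-case symbol names to strings."
  (with-output-to-string (out)
    (dolist (item template)
      (cond
        ;; Literal strings are copied as they are.
        ((stringp item) (write-string item out))
        ((not (symbolp item))
         (error "Unexpected template item: ~S" item))
        ;; + is only there for readability and is discarded.
        ((string= (symbol-name item) "+") nil)
        ;; Any other symbol is replaced by its value, if one is known.
        (t (let ((binding (assoc (symbol-name item) bindings
                                 :test #'string=)))
             (when binding
               (write-string (cdr binding) out))))))))

;; Example values (made up) for a few of the variables listed above.
(defparameter *example-bindings*
  (list (cons "DATE" "2016/10/06 11:11:12")
        (cons "HOSTNAME" "server.foo.com")
        (cons "FUNCTION" "LOAD-AVERAGE-15")
        (cons "_" " ")
        (cons "SPACE" " ")
        (cons "NEWLINE" (string #\Newline))))

(expand-template
 '(date + " " + hostname + " has encountered a problem " + function)
 *example-bindings*)
;; => "2016/10/06 11:11:12 server.foo.com has encountered a problem LOAD-AVERAGE-15"
```

The version with + and the version without it expand to the same string, since + never reaches the output.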

The different probes
====================

Probes are written in LISP and sometimes rely on system calls, like
for ping or the load average of the system. They take care of running on
different operating systems.

The following parameter is allowed for every probe. It lets you
describe what the check does or concerns, so you can put it in the notification if you want:
    :desc "STRING"
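
As a sketch of how the pieces fit together, the fragment below defines one alert and attaches it to a check. It reads the first argument of `=>` as the name of an entry in `*alerts*`, which is an interpretation of the examples in this README rather than a documented rule. The alert name `log-alert`, the log file path and the description text are made up. The fragment is not standalone: it assumes reed-alert's own definitions (such as `=>`) are already loaded.

```lisp
(defvar *alerts*
  (list
   ;; Hypothetical alert: append one line per problem to a log file.
   '(log-alert ("echo '" date _ hostname _ function _ result "' >> /tmp/reed-alert.log"))))

;; Trigger log-alert when /tmp is more than 50% full.
(=> log-alert disk-usage (:path "/tmp" :limit 50 :desc "temporary files partition"))
```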

number-of-processes
-------------------
Check if the current number of processes on the system exceeds the limit

> Set the limit that will trigger an alert when exceeded
    :limit INTEGER

Example : `(=> example number-of-processes (:limit 200))`

pid-running
-----------
Check if the PID number found in a .pid file is alive

> Set the path of the pid file. If the user doesn't have permission to open it, "file not found" is returned
    :path "STRING"

Example : `(=> example pid-running (:path "/var/run/nginx.pid"))`


disk-usage
----------
Check if the percentage of used space on the chosen partition exceeds the limit

> Set the mountpoint to check
    :path "STRING"
	
> Set the limit that will trigger an alert when exceeded
    :limit INTEGER
	
Example : `(=> example disk-usage (:path "/tmp" :limit 50))`


file-exists
-----------
Check if a file exists

> Set the path of the file to check
    :path "STRING"

Example : `(=> example file-exists (:path "/var/postgresql/standby"))`

file-updated
------------
Check if a file exists and has been updated within a defined time

> Set the path of the file to check
    :path "STRING"
	
> Set the limit in minutes since the last modification time before triggering an alert
    :limit INTEGER
	
Example : `(=> example file-updated (:path "/var/log/nginx/access.log" :limit 60))`
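
A check of this kind can be written in portable Common LISP. The sketch below is only an illustration of the idea, not reed-alert's implementation, and `file-updated-p` is a hypothetical name.

```lisp
;; Hypothetical helper for illustration only, not part of reed-alert.
(defun file-updated-p (path limit-minutes)
  "Return T when PATH exists and was modified less than LIMIT-MINUTES ago."
  (let* ((file (probe-file path))
         ;; file-write-date may return NIL if the date cannot be determined.
         (written (and file (file-write-date file))))
    (and written
         ;; Both values are universal times, expressed in seconds.
         (< (- (get-universal-time) written)
            (* 60 limit-minutes)))))

;; NIL here would mean the file is missing or older than 60 minutes.
(file-updated-p "/var/log/nginx/access.log" 60)
```

A missing file and a stale file both give NIL here, while the probe description above treats them as two different error results.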

load-average-1
--------------
Check if the load average over the last minute exceeds the limit

> Set the limit not to exceed
    :limit INTEGER

Example : `(=> example load-average-1 (:limit 2))`

load-average-5
--------------
Check if the load average over the last five minutes exceeds the limit

> Set the limit not to exceed
    :limit INTEGER

Example : `(=> example load-average-5 (:limit 2))`

load-average-15
---------------
Check if the load average over the last fifteen minutes exceeds the limit

> Set the limit not to exceed
    :limit INTEGER

Example : `(=> example load-average-15 (:limit 2))`

ping
----
Check if a remote host answers 2 ICMP pings

> Set the host to ping. Returns an error if the ping command returns non-zero
    :host "STRING" (can be IP or hostname)
	
Example : `(=> example ping (:host "8.8.8.8"))`

command
-------
Execute an arbitrary command and trigger an alert if the command returns a non-zero value

> Command to execute, accepts commands with pipes
    :command "STRING"

Example : `(=> example command (:command "tail -n 10 /var/log/messages | grep -v CRITICAL"))`
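
A minimal sketch of this kind of check, assuming ASDF and its bundled UIOP library are available (sbcl and ecl normally ship them). `command-succeeds-p` is a hypothetical name, and this is not necessarily how reed-alert runs commands.

```lisp
;; Assumption: ASDF (which provides UIOP) can be loaded with require.
(require 'asdf)

;; Hypothetical helper for illustration only, not part of reed-alert.
(defun command-succeeds-p (command)
  "Run COMMAND through the shell and return T when it exits with status 0."
  (multiple-value-bind (output error-output exit-code)
      ;; A string command is handed to the shell, so pipes work.
      ;; :ignore-error-status keeps a non-zero status from signalling an error.
      (uiop:run-program command :ignore-error-status t)
    (declare (ignore output error-output))
    (zerop exit-code)))

;; The status of a pipeline is the status of its last command.
(command-succeeds-p "tail -n 10 /var/log/messages | grep -v CRITICAL")
```

Because only the exit status is looked at, the choice of the last command in a pipeline decides whether an alert is triggered.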