Description
===========
reed-alert is a small and simple monitoring tool for your server,
written in Common LISP.
reed-alert checks the status of various processes on a server and
triggers user-defined notifications.
Each triggered message is called an 'alert'.
Each check is called a 'probe'.
Each probe can be customized by different parameters.
Dependencies
============
reed-alert is regularly tested on FreeBSD/OpenBSD/Linux and has been
tested with both **sbcl** and **ecl** - which should be available for
most distributions.
(On OpenBSD you may prefer ecl, because sbcl requires the filesystem
holding its binary to be mounted with the 'wxallowed' option.)
To make reed-alert's deployment easier, I avoid using external
libraries. reed-alert only requires a Common LISP implementation and a
few files.
Code-Readability
================
Although the code is very rough for now, I think it is already fairly
understandable for people who need this kind of tool.
I will try to improve on the readability of the config file in future
commits.
Usage
=====
Start reed-alert
----------------
To start reed-alert, run one of the following:
+ sbcl : **sbcl --script config_file.lisp**
+ ecl : **ecl -shell config_file.lisp**
Personal Configuration File
---------------------------
You may want to rename **config.lisp.sample** to **config.lisp** in
order to create your own configuration file.
The configuration is explained below.
The Notification System
=======================
A notification is a command line built from strings and the following
variables:
+ function : the name of the probe
+ date : the current date with format YYYY/MM/DD hh:mm:ss
+ params : the parameters of the probe
+ hostname : the hostname of the server
+ result : the error returned (for example the value exceeding the limit, or "file not found")
+ description : an arbitrary description naming a check
+ level : the type of notification used
+ os : the type of operating system (FreeBSD/Linux/OpenBSD)
+ _ : a space character
+ space : a space character
+ newline : a newline character
Example Probe: 'Check For Load Average'
---------------------------------------
If you want to send a mail with a message like:
"At 2016/10/06 11:11:12 server.foo.com has encountered a problem
during LOAD-AVERAGE-15 (:LIMIT 10) with a value of 30"
write the following and use **pretty-mail** in your checks:
(defvar *alerts*
  (list
   '(pretty-mail ("echo 'At " date _ hostname " has encountered a problem during " function
                  _ params " with a value of " result "' | mail yourmail@foo.bar"))))
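With that definition in place, a check can refer to the alert by name. The line below is a minimal sketch, assuming that the first argument of `=>` is the name of an entry in *alerts* (the probe examples later in this README use the placeholder `alert` in that position):

```lisp
;; Assumption: pretty-mail is the alert defined in *alerts* above.
;; The probe and its parameter are taken from the sample message.
(=> pretty-mail load-average-15 (:limit 10))
```

When the 15-minute load average exceeds 10, this check would send the kind of message quoted above.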
Variant 1
~~~~~~~~~
If you find it easier to read, you can add + in the concatenation.
The + is discarded by reed-alert as soon as it parses the list.
'(pretty-mail (date + " " + hostname + " has encountered a problem " + function))
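To make the role of + concrete, here is a small self-contained sketch of how such a list could be turned into a string. It is not reed-alert's actual code: `render-alert` and its `bindings` alist are hypothetical names introduced only for illustration, and the values are made up.

```lisp
;; Hypothetical sketch, NOT reed-alert's implementation.
;; TEMPLATE is a list of strings and symbols.
;; BINDINGS is an alist mapping variable symbols to strings.
(defun render-alert (template bindings)
  (with-output-to-string (out)
    (dolist (item template)
      (cond ((stringp item) (write-string item out))            ; literal text
            ((eq item '+) nil)                                  ; + is visual only, drop it
            ((member item '(_ space)) (write-char #\Space out)) ; space aliases
            ((eq item 'newline) (terpri out))                   ; newline variable
            (t (let ((pair (assoc item bindings)))              ; variable such as date
                 (when pair (write-string (cdr pair) out))))))))

(render-alert '(date + " " + hostname + " has encountered a problem " + function)
              '((date . "2016/10/06 11:11:12")
                (hostname . "server.foo.com")
                (function . "LOAD-AVERAGE-15")))
;; => "2016/10/06 11:11:12 server.foo.com has encountered a problem LOAD-AVERAGE-15"
```

Removing every + from the list passed to `render-alert` gives the same string, which is the point of the variant: the + only helps the human reader.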
Variant 2
~~~~~~~~~
If you don't want anything to be triggered, use the following in *alerts*:
'(nothing-to-send nil)
The Probes
==========
Probes are written in Common LISP.
The :desc Parameter
-------------------
The :desc parameter lets you describe what a check does. It is
accepted by every probe.
:desc "STRING"
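As a sketch, :desc is written next to a probe's other parameters. The description text below is made up for illustration, and `alert` is the same placeholder used in the probe examples of this README:

```lisp
;; disk-usage probe with an added description (example text is arbitrary)
(=> alert disk-usage (:path "/tmp" :limit 50 :desc "Temporary directory is filling up"))
```

The text is then available to the notification through the `description` variable listed in the notification section.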
Overview
--------
As of this commit, reed-alert ships with the following probes:
(1) number-of-processes
(2) pid-running
(3) disk-usage
(4) file-exists
(5) file-updated
(6) load-average-1
(7) load-average-5
(8) load-average-15
(9) ping
(10) command
(11) service
(12) file-less-than
number-of-processes
-------------------
Check if the current number of processes on the system exceeds a specific limit.
> Set the limit that will trigger an alert when exceeded.
:limit INTEGER
Example : `(=> alert number-of-processes (:limit 200))`
pid-running
-----------
Check if the process whose PID is stored in a .pid file is alive.
> Set the path of the pid file. If the user running reed-alert doesn't have permission to open it, the probe returns "file not found".
:path "STRING"
Example : `(=> alert pid-running (:path "/var/run/nginx.pid"))`
disk-usage
----------
Check if the disk usage of a chosen partition exceeds a specific limit.
> Set the mountpoint to check.
:path "STRING"
> Set the limit that will trigger an alert when exceeded.
:limit INTEGER
Example : `(=> alert disk-usage (:path "/tmp" :limit 50))`
file-exists
-----------
Check if a file exists.
> Set the path of the file to check.
:path "STRING"
Example : `(=> alert file-exists (:path "/var/postgresql/standby"))`
file-updated
------------
Check if a file exists and has been updated within a defined number of minutes.
> Set the path of the file to check.
:path "STRING"
> Set the maximum time, in minutes, since the last modification before an alert is triggered.
:limit INTEGER
Example : `(=> alert file-updated (:path "/var/log/nginx/access.log" :limit 60))`
load-average-1
--------------
Check if the load average during the last minute exceeds a specific limit.
> Set the limit not to exceed.
:limit INTEGER
Example : `(=> alert load-average-1 (:limit 2))`
load-average-5
--------------
Check if the load average during the last five minutes exceeds a specific limit.
> Set the limit not to exceed.
:limit INTEGER
Example : `(=> alert load-average-5 (:limit 2))`
load-average-15
---------------
Check if the load average during the last fifteen minutes exceeds a specific limit.
> Set the limit not to exceed.
:limit INTEGER
Example : `(=> alert load-average-15 (:limit 2))`
ping
----
Check if a remote host answers 2 ICMP pings.
> Set the host to ping. The probe returns an error if the ping command exits with a non-zero status.
:host "STRING" (can be IP or hostname)
Example : `(=> alert ping (:host "8.8.8.8"))`
command
-------
Execute an arbitrary command and trigger an alert if it exits with a non-zero status.
> Set the command to execute. Commands with pipes are accepted.
:command "STRING"
Example : `(=> alert command (:command "tail -n 10 /var/log/messages | grep -v CRITICAL"))`
service
-------
Check if a service is started on the system.
> Set the name of the service to test.
:name "STRING"
Example : `(=> alert service (:name "mysql-server"))`
file-less-than
--------------
Check if a file has a size less than a specified limit.
> Set the path of the file to check.
:path "STRING"
> Set the limit in bytes before triggering an alert.
:limit INTEGER
Example : `(=> alert file-less-than (:path "/var/log/nginx/access.log" :limit 60))`
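Putting the pieces together, a configuration file could look roughly like the sketch below. It only combines forms already shown in this README. `mail-admin` is a hypothetical alert name, the mail address is the placeholder used earlier, and any preamble that config.lisp.sample contains for loading reed-alert itself is omitted here.

```lisp
;; Sketch of a config.lisp, assuming reed-alert's own definitions are
;; already loaded the way config.lisp.sample does it.

;; One alert: mail a one-line summary of the failed check.
(defvar *alerts*
  (list
   '(mail-admin ("echo '" date _ hostname ": " function _ params
                 " failed with " result "' | mail yourmail@foo.bar"))))

;; Checks, each one tied to the alert defined above.
(=> mail-admin load-average-5 (:limit 2))
(=> mail-admin disk-usage (:path "/tmp" :limit 50))
(=> mail-admin ping (:host "8.8.8.8"))
(=> mail-admin file-updated (:path "/var/log/nginx/access.log" :limit 60))
```

Keeping the alert definitions at the top and the checks below them keeps each check line short, since a check only names the alert it should trigger.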