README has been reworked, thanks to lambda from fnord.one. He fixed typos and enhanced explanations.

Solene Rapenne 2017-11-16 10:13:43 +01:00 committed by Solene Rapenne
parent fed9c9d46d
commit 6e9a233215
1 changed files with 152 additions and 69 deletions

README

@@ -1,37 +1,62 @@
Presentation
Description
===========
reed-alert is a small and simple monitoring tool for your server,
written in Common LISP.
reed-alert checks the status of various processes on a server and
triggers user-defined notifications.
Each triggered message is called an 'alert'.
Each check is called a 'probe'.
Each probe can be customized by different parameters.
Dependencies
============
reed-alert is a tool to check the status of various things on a server
and trigger user defined notifications to be alerted. In the code,
each check is called a "probe" and have parameters.
reed-alert is regularly tested on FreeBSD/OpenBSD/Linux and has been
tested with both **sbcl** and **ecl**, which should be available in
most distributions.
The code is very rough for now. I will try to make the config file
easier than it is actually, but I think it's already easy enough for
people who need to kind of tool.
(On OpenBSD you may prefer to use ecl, because sbcl needs to run from
a partition mounted with 'wxallowed'.)
I try to avoid usage of external libraries so the deployment is easy
as it only requires a Common LISP interpreter and a few files.
To make reed-alert's deployment easier I avoid using external
libraries. reed-alert only requires a Common LISP interpreter and a
few files.
reed-alert is regularly tested on FreeBSD/OpenBSD/Linux.
How to use
==========
Code-Readability
================
It has been tested with both **sbcl** and **ecl** which should be
available in most distribution people use. On OpenBSD you may prefer
to use ecl because sbcl needs wxallowed where the binary is.
Although the code is very rough for now, I think it's already fairly
understandable for people who need this kind of tool.
I will try to improve the readability of the config file in future
commits.
Usage
=====
Start reed-alert
----------------
To start reed-alert
+ sbcl : **sbcl --script config_file.lisp**
+ ecl : **ecl -shell config_file.lisp**
You can rename **config.lisp.sample** to **config.lisp** to create
your own configuration file. The configuration is explained below.
Personal Configuration File
---------------------------
You may want to rename **config.lisp.sample** to **config.lisp** in
order to create your own configuration file.
The configuration is explained below.
Defining notification system
============================
The Notification System
=======================
+ function : the name of the probe
+ date : the current date with format YYYY/MM/DD hh:mm:ss
@@ -45,131 +70,189 @@ Defining notification system
+ space : a space character
+ newline : a newline character
If you want to send a mail with a message like "At 2016/10/06 11:11:12
server.foo.com has encountered a problem during LOAD-AVERAGE-15
(:LIMIT 10) with a value of 30" you can write the following and use
**pretty-mail** in your checks.
Example Probe: 'Check For Load Average'
---------------------------------------
If you want to send a mail with a message like:
"At 2016/10/06 11:11:12 server.foo.com has encountered a problem
during LOAD-AVERAGE-15 (:LIMIT 10) with a value of 30"
write the following and use **pretty-mail** in your checks:
(defvar *alerts*
(list
'(pretty-mail ("echo '" date _ hostname " has encountered a problem during" function
params " with a value of " result "' | mail yourmail@foo.bar"))))
If you don't want anything to be triggered, you can use the following
in *alerts*
'(nothing-to-send nil)
If you find it easier to read, you can add + in the concatenation,
this is simply discarded when the program parse the list.
Variant 1
~~~~~~~~~
If you find it easier to read, you can add + in the concatenation.
The + is discarded by reed-alert as soon as it parses the list.
'(pretty-mail (date + " " + hostname + " has encountered a problem " + function))
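To make the concatenation rules concrete, here is a minimal sketch of how such a template list could be flattened into one string. It is not reed-alert's actual code: the function expand-template and the VALUES alist are hypothetical, and it only assumes what is described above (strings are copied, + is discarded, space and newline become characters, the other symbols are replaced by their values).

```lisp
;; Hypothetical sketch, not reed-alert's implementation: it only illustrates
;; how an alert template list could be turned into a message string.
(defun expand-template (template values)
  "Concatenate the items of TEMPLATE into a single string.
Strings are copied verbatim, the symbol + is skipped (it only helps
readability), SPACE and NEWLINE become the matching characters, and any
other symbol is looked up in the alist VALUES."
  (with-output-to-string (out)
    (dolist (item template)
      (cond ((stringp item) (write-string item out))
            ((eq item '+))                        ; discarded, writes nothing
            ((eq item 'space) (write-char #\Space out))
            ((eq item 'newline) (terpri out))
            (t (princ (or (cdr (assoc item values)) "") out))))))

;; Values a real run would compute are made up here for illustration.
(print (expand-template
        '(date + " " + hostname + " has encountered a problem " + function)
        '((date . "2016/10/06 11:11:12")
          (hostname . "server.foo.com")
          (function . load-average-15))))
;; prints "2016/10/06 11:11:12 server.foo.com has encountered a problem LOAD-AVERAGE-15"
```

Skipping + in the expansion step is what makes the two spellings of a template equivalent: with or without +, the same strings and values end up in the message.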
The differents probes
=====================
Variant 2
~~~~~~~~~
If you don't want anything to be triggered, use the following in *alerts*:
Probes are written in LISP and sometimes relies on system call, like
for ping or the average load of the system. It cares about running on
different operating system.
'(nothing-to-send nil)
The Probes
==========
Probes are written in Common LISP.
The :desc Parameter
-------------------
The :desc parameter allows you to describe specifically what your check
does. It can be used with every probe.
The following parameter is allowed for every probes. It allows you to
describe what the check do / concern to put it in the notification if you want
:desc "STRING"
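For instance, assuming :desc is simply added to a probe's parameter list like any other keyword, a described disk-usage check could look like this (the description text is made up):

```lisp
;; Assumed usage: :desc placed next to the probe's other parameters.
(=> alert disk-usage (:path "/tmp" :limit 50 :desc "partition holding temporary files"))
```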
Overview
--------
As of this commit, reed-alert ships with the following probes:
(1) number-of-processes
(2) pid-running
(3) disk-usage
(4) file-exists
(5) file-updated
(6) load-average-1
(7) load-average-5
(8) load-average-15
(9) ping
(10) command
(11) service
(12) file-less-than
number-of-processes
-------------------
Check if the actual number of processes of the system exceed the limit
Check if the actual number of processes of the system exceeds a specific limit.
> Set the limit that will trigger an alert when exceeded
> Set the limit that will trigger an alert when exceeded.
:limit INTEGER
Example : `(=> example number-of-processes (:limit 200))`
Example : `(=> alert number-of-processes (:limit 200))`
pid-running
-----------
Check if the PID number found in a .pid file is alive
Check if the PID number found in a .pid file is alive.
> Set the path of the pid file. If user don't have permission to open it, return "file not found"
> Set the path of the pid file. If $USER doesn't have permission to open it, the probe returns "file not found".
:path "STRING"
Example : `(=> example pid-running (:path "/var/run/nginx.pid"))`
Example : `(=> alert pid-running (:path "/var/run/nginx.pid"))`
disk-usage
----------
Check if the used percent of the choosed partition exceed the limit
Check if the disk usage of a chosen partition exceeds a specific limit.
> Set the mountpoint to check
> Set the mountpoint to check.
:path "STRING"
> Set the limit that will trigger an alert when exceeded
> Set the limit that will trigger an alert when exceeded.
:limit INTEGER
Example : `(=> example disk-usage (:path "/tmp" :limit 50))`
Example : `(=> alert disk-usage (:path "/tmp" :limit 50))`
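As an illustration of the comparison this probe performs, the sketch below extracts the used percentage from POSIX `df -P` output and checks it against a limit. It is an assumption, not reed-alert's code: the helper names are hypothetical, running `df` itself is left out, and the real probe may obtain the value differently.

```lisp
;; Hypothetical sketch, not reed-alert's code: parse `df -P PATH` output.
(defun split-words (line)
  "Split LINE on spaces and tabs, dropping empty pieces."
  (let ((words '())
        (start nil))
    (loop for i from 0 to (length line)
          for c = (if (< i (length line)) (char line i) #\Space)
          do (if (member c '(#\Space #\Tab))
                 (when start
                   (push (subseq line start i) words)
                   (setf start nil))
                 (unless start
                   (setf start i))))
    (nreverse words)))

(defun df-used-percent (df-output)
  "Return the number in the field ending with % on the last non-empty line."
  (let ((last-line nil))
    (with-input-from-string (in df-output)
      (loop for line = (read-line in nil)
            while line
            unless (string= (string-trim '(#\Space #\Tab) line) "")
              do (setf last-line line)))
    (unless last-line
      (error "Empty df output"))
    (let ((field (find-if (lambda (word)
                            (char= (char word (1- (length word))) #\%))
                          (split-words last-line))))
      (unless field
        (error "No percentage found in ~s" last-line))
      (parse-integer field :end (1- (length field))))))

(defun disk-usage-exceeded-p (df-output limit)
  "True when the used percentage reported by df is above LIMIT."
  (> (df-used-percent df-output) limit))
```

With `-P`, each filesystem is reported on a single line, so taking the last line and its `%` field is enough for one mountpoint.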
file-exists
-----------
Check if a file exists
Check if a file exists.
> Set the path of the file to check
> Set the path of the file to check.
:path "STRING"
Example : `(=> example file-exists (:path "/var/postgresql/standby"))`
Example : `(=> alert file-exists (:path "/var/postgresql/standby"))`
file-updated
------------
Check if a file exists and has been updated since a defined time
Check if a file exists and has been updated since a defined time.
> Set the path of the file to check
> Set the path of the file to check.
:path "STRING"
> Set the limit in minutes since the last modification time before triggering an alert
> Set the limit in minutes since the last modification time before triggering an alert.
:limit INTEGER
Example : `(=> example file-updated (:path "/var/log/nginx/access.log" :limit 60))`
Example : `(=> alert file-updated (:path "/var/log/nginx/access.log" :limit 60))`
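The "exists and was updated within :limit minutes" rule can be expressed in portable Common Lisp, because `file-write-date` and `get-universal-time` both count seconds in universal time. This is a minimal sketch under that assumption; file-updated-p is a hypothetical name and the real probe may be written differently.

```lisp
;; Hypothetical sketch, not reed-alert's code. FILE-WRITE-DATE returns a
;; universal time (seconds), like GET-UNIVERSAL-TIME.
(defun file-updated-p (path limit &optional (now (get-universal-time)))
  "True when PATH exists and was modified at most LIMIT minutes before NOW."
  (let ((written (and (probe-file path) (file-write-date path))))
    (and written
         (<= (- now written) (* 60 limit)))))
```

Under this sketch, `(file-updated-p "/var/log/nginx/access.log" 60)` returning NIL would correspond to the alert case. The optional NOW argument only exists to make the check easy to test.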
load-average-1
--------------
Check if the load average on the last minute exceed the limit
Check if the load average during the last minute exceeds a specific limit.
> Set the limit not to exceed
> Set the limit not to exceed.
:limit INTEGER
Example : `(=> example load-average-1 (:limit 2))`
Example : `(=> alert load-average-1 (:limit 2))`
load-average-5
--------------
Check if the load average on the last fives minutes exceed the limit
Check if the load average during the last five minutes exceeds a specific limit.
> Set the limit not to exceed
> Set the limit not to exceed.
:limit INTEGER
Example : `(=> example load-average-5 (:limit 2))`
Example : `(=> alert load-average-5 (:limit 2))`
load-average-15
---------------
Check if the load average on the last fifteen minutes exceed the limit
Check if the load average during the last fifteen minutes exceeds a specific limit.
> Set the limit not to exceed
> Set the limit not to exceed.
:limit INTEGER
Example : `(=> example load-average-15 (:limit 2))`
Example : `(=> alert load-average-15 (:limit 2))`
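The three load-average probes differ only in which value they compare. A sketch of that idea, assuming the values are read from `uptime` output with a dot as decimal separator (BSD systems print "load averages:", Linux prints "load average:"). The function names are hypothetical and this is not reed-alert's code.

```lisp
;; Hypothetical sketch, not reed-alert's code: read the 1, 5 and 15 minute
;; load averages from `uptime` output. Searching for "load average" matches
;; both the BSD ("load averages:") and Linux ("load average:") wording.
(defun parse-load-averages (uptime-output)
  "Return a list of the 1, 5 and 15 minute load averages in UPTIME-OUTPUT."
  (let* ((start (search "load average" uptime-output))
         (colon (and start (position #\: uptime-output :start start))))
    (unless colon
      (error "No load average found in ~s" uptime-output))
    (let ((*read-eval* nil)               ; never evaluate #. while reading
          (numbers (substitute #\Space #\, (subseq uptime-output (1+ colon)))))
      (with-input-from-string (in numbers)
        (loop repeat 3 collect (read in))))))

(defun load-average-exceeded-p (uptime-output minutes limit)
  "True when the MINUTES (1, 5 or 15) load average is above LIMIT."
  (let ((index (ecase minutes (1 0) (5 1) (15 2))))
    (> (nth index (parse-load-averages uptime-output)) limit)))
```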
ping
----
Check if a remote host answer the 2 ICMP ping
Check if a remote host answers 2 ICMP pings.
> Set the host to ping. Return an error if ping command returns non-zero
> Set the host to ping. Return an error if the ping command returns a non-zero value.
:host "STRING" (can be IP or hostname)
Example : `(=> example ping (:host "8.8.8.8"))`
Example : `(=> alert ping (:host "8.8.8.8"))`
command
-------
Execute an arbitrary command which trigger an alert if the command return a non-zero value
Execute an arbitrary command which triggers an alert if it returns a non-zero value.
> Command to execute, accept commands with pipes
> Command to execute, accepts commands with pipes.
:command "STRING"
Example : `(=> example command (:command "tail -n 10 /var/log/messages | grep -v CRITICAL"))`
Example : `(=> alert command (:command "tail -n 10 /var/log/messages | grep -v CRITICAL"))`
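The rule "alert on a non-zero exit value" can be sketched with UIOP, which ships with ASDF in both SBCL and ECL. When `uiop:run-program` is given a string, it runs it through the shell, so pipes work. This is an assumption about how such a check could be written, not reed-alert's implementation (which avoids external libraries), and command-fails-p is a hypothetical name.

```lisp
;; Hypothetical sketch, not reed-alert's code. Load it with
;; `sbcl --script file.lisp` so the REQUIRE runs before UIOP: is read.
(require "asdf")                          ; provides UIOP in SBCL and ECL

(defun command-fails-p (command)
  "Run COMMAND through the shell and return true if it exits non-zero."
  (multiple-value-bind (output error-output exit-code)
      (uiop:run-program command :ignore-error-status t)
    (declare (ignore output error-output))
    (/= exit-code 0)))
```

For a pipeline such as the example above, the shell reports the exit status of the last command (`grep -v CRITICAL`), so that is the value that would decide whether an alert is sent.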
service
-------
Check if a service is started on the system.
> Set the name of the service to test.
:name "STRING"
Example : `(=> alert service (:name "mysql-server"))`
file-less-than
--------------
Check if a file has a size less than a specified limit.
> Set the path of the file to check.
:path "STRING"
> Set the limit in bytes before triggering an alert.
:limit INTEGER
Example : `(=> alert file-less-than (:path "/var/log/nginx/access.log" :limit 60))`
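A file's size in bytes can be measured portably by opening it as a binary stream and calling `file-length`. The sketch below assumes the probe passes when the size is below :limit and alerts otherwise. It is a hypothetical illustration, not reed-alert's code.

```lisp
;; Hypothetical sketch, not reed-alert's code: compare a file's size in
;; bytes with a limit, using only standard Common Lisp.
(defun file-size (path)
  "Size of PATH in bytes, read through a binary stream."
  (with-open-file (stream path :element-type '(unsigned-byte 8))
    (file-length stream)))

(defun file-less-than-p (path limit)
  "True when PATH is smaller than LIMIT bytes."
  (< (file-size path) limit))
```

Opening the stream with `(unsigned-byte 8)` elements makes `file-length` count bytes rather than characters.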