mirror of git://bitreich.org/reed-alert
README has been reworked, thanks to lambda from fnord.one. He fixed typos and enhanced explanations.
parent: fed9c9d46d
commit: 6e9a233215

README | 221

@@ -1,37 +1,62 @@

Description
===========

reed-alert is a small and simple monitoring tool for your server,
written in Common LISP.

reed-alert checks the status of various things on a server (running
processes, disk usage, files, load average, remote hosts, services)
and triggers user-defined notifications.

Each triggered message is called an 'alert'.
Each check is called a 'probe'.
Each probe can be customized with parameters.


Dependencies
============

reed-alert is regularly tested on FreeBSD, OpenBSD and Linux, with
both **sbcl** and **ecl**, which should be available in most
distributions.

(On OpenBSD you may prefer ecl, because sbcl needs the 'wxallowed'
mount option on the filesystem where its binary is installed.)

To make reed-alert's deployment easier, I avoid using external
libraries: reed-alert only requires a Common LISP implementation and
a few files.

Code-Readability
================

Although the code is very rough for now, I think it's already fairly
understandable by people who need this kind of tool.

I will try to improve the readability of the config file in future
commits.


Usage
=====

Start reed-alert
----------------

+ sbcl : **sbcl --script config_file.lisp**
+ ecl : **ecl -shell config_file.lisp**

Personal Configuration File
---------------------------

You may want to rename **config.lisp.sample** to **config.lisp** in
order to create your own configuration file.

The configuration is explained below.


The Notification System
=======================

+ function : the name of the probe
+ date : the current date with format YYYY/MM/DD hh:mm:ss

@@ -45,131 +70,189 @@ Defining notification system

+ space : a space character
+ newline : a newline character

Example Alert: 'Mail On Load Average'
-------------------------------------

If you want to send a mail with a message like:

    "At 2016/10/06 11:11:12 server.foo.com has encountered a problem
    during LOAD-AVERAGE-15 (:LIMIT 10) with a value of 30"

write the following and use **pretty-mail** in your checks:

    (defvar *alerts*
      (list
       '(pretty-mail ("echo 'At " date _ hostname " has encountered a problem during " function " " params " with a value of " result "' | mail yourmail@foo.bar"))))
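An alert is used by naming it in a check. A sketch of the check that would produce the message above, assuming the `(=> alert probe (params))` form used in the probe examples works unchanged when the alert is called pretty-mail:

```lisp
;; Send the pretty-mail notification when the 15-minute load average
;; goes above 10, matching "LOAD-AVERAGE-15 (:LIMIT 10)" in the message.
(=> pretty-mail load-average-15 (:limit 10))
```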

Variant 1
~~~~~~~~~

You can add + between the elements of the concatenation if you find
it easier to read: reed-alert discards every + as soon as it parses
the list.

    '(pretty-mail ("echo '" + date + " " + hostname + " has encountered a problem " + function + "' | mail yourmail@foo.bar"))

Variant 2
~~~~~~~~~

If you don't want anything to be triggered, use the following in
*alerts*:

    '(nothing-to-send nil)
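Several entries can share the *alerts* list, and each check selects one by name. A sketch, assuming the `(=> alert probe (params))` form shown in the probe examples; the nginx pid file path is borrowed from the pid-running example:

```lisp
;; Two named notifications: one that mails, one that does nothing.
(defvar *alerts*
  (list
   '(pretty-mail ("echo '" date _ hostname " has encountered a problem during " function " " params " with a value of " result "' | mail yourmail@foo.bar"))
   '(nothing-to-send nil)))

;; A check routed to nothing-to-send triggers no notification when it fails.
(=> nothing-to-send pid-running (:path "/var/run/nginx.pid"))
```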


The Probes
==========

Probes are written in Common LISP. Some of them rely on system calls,
for example for ping or the system load average, and they take care of
running on different operating systems.

The :desc Parameter
-------------------

The :desc parameter allows you to describe specifically what your
check does, so that the description can be included in the
notification if you want. Every probe accepts it.

:desc "STRING"

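For instance, the disk-usage check from the probe list could carry a description. A sketch, assuming :desc goes in the same parameter list as the probe's other keyword parameters; the description text is arbitrary:

```lisp
;; :desc is passed next to :path and :limit; "temporary files" is only
;; an illustrative label.
(=> alert disk-usage (:path "/tmp" :limit 50 :desc "temporary files"))
```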
Overview
--------

As of this commit, reed-alert ships with the following probes:

(1) number-of-processes
(2) pid-running
(3) disk-usage
(4) file-exists
(5) file-updated
(6) load-average-1
(7) load-average-5
(8) load-average-15
(9) ping
(10) command
(11) service
(12) file-less-than

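All the examples below send their notification to an alert named alert, so your *alerts* list presumably needs an entry with that name. A minimal sketch of a configuration body, assuming this README's alert format; how config.lisp.sample loads reed-alert itself is not covered here, so keep that part of the sample unchanged:

```lisp
;; Hypothetical mail notification named `alert`, built from the
;; variables used in the pretty-mail example above.
(defvar *alerts*
  (list
   '(alert ("echo '" hostname " " function " " params " " result "' | mail yourmail@foo.bar"))))

;; A few checks for a web server, reusing the probe examples below.
(=> alert pid-running (:path "/var/run/nginx.pid"))
(=> alert file-updated (:path "/var/log/nginx/access.log" :limit 60))
(=> alert disk-usage (:path "/tmp" :limit 50))
```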
number-of-processes
-------------------

Check whether the number of processes running on the system exceeds a
specific limit.

> Set the limit that will trigger an alert when exceeded.

:limit INTEGER

Example: `(=> alert number-of-processes (:limit 200))`


pid-running
-----------

Check whether the PID found in a .pid file belongs to a running
process.

> Set the path of the pid file. If the user running reed-alert doesn't
> have permission to open it, the probe returns "file not found".

:path "STRING"

Example: `(=> alert pid-running (:path "/var/run/nginx.pid"))`


disk-usage
----------

Check whether the disk usage (percentage of used space) of a chosen
partition exceeds a specific limit.

> Set the mountpoint to check.

:path "STRING"

> Set the limit, in percent, that will trigger an alert when exceeded.

:limit INTEGER

Example: `(=> alert disk-usage (:path "/tmp" :limit 50))`


file-exists
-----------

Check whether a file exists.

> Set the path of the file to check.

:path "STRING"

Example: `(=> alert file-exists (:path "/var/postgresql/standby"))`


file-updated
------------

Check whether a file exists and has been modified within a defined
time.

> Set the path of the file to check.

:path "STRING"

> Set the maximum number of minutes since the last modification time
> before an alert is triggered.

:limit INTEGER

Example: `(=> alert file-updated (:path "/var/log/nginx/access.log" :limit 60))`


load-average-1
--------------

Check whether the load average over the last minute exceeds a
specific limit.

> Set the limit not to exceed.

:limit INTEGER

Example: `(=> alert load-average-1 (:limit 2))`


load-average-5
--------------

Check whether the load average over the last five minutes exceeds a
specific limit.

> Set the limit not to exceed.

:limit INTEGER

Example: `(=> alert load-average-5 (:limit 2))`


load-average-15
---------------

Check whether the load average over the last fifteen minutes exceeds a
specific limit.

> Set the limit not to exceed.

:limit INTEGER

Example: `(=> alert load-average-15 (:limit 2))`


ping
----

Check whether a remote host answers two ICMP echo requests (ping).

> Set the host to ping. The probe returns an error if the ping command
> exits with a non-zero status.

:host "STRING" (can be an IP address or a hostname)

Example: `(=> alert ping (:host "8.8.8.8"))`


command
-------

Execute an arbitrary command and trigger an alert if it exits with a
non-zero status.

> Command to execute; commands with pipes are accepted.

:command "STRING"

Example: `(=> alert command (:command "tail -n 10 /var/log/messages | grep -v CRITICAL"))`

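The exit status of a pipeline is the exit status of its last command, so in the example above the alert fires only when grep -v finds no line without CRITICAL among the last 10 lines. To alert as soon as CRITICAL shows up instead, a sketch, assuming the command string is run by a POSIX shell that supports `!` pipeline negation:

```lisp
;; grep -q exits 0 when CRITICAL is found; `!` turns that into a
;; non-zero status, which is what makes the command probe trigger.
(=> alert command (:command "! tail -n 10 /var/log/messages | grep -q CRITICAL"))
```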
service
-------

Check whether a service is started on the system.

> Set the name of the service to test.

:name "STRING"

Example: `(=> alert service (:name "mysql-server"))`

file-less-than
--------------

Check whether a file's size is less than a specified limit.

> Set the path of the file to check.

:path "STRING"

> Set the limit in bytes before triggering an alert.

:limit INTEGER

Example: `(=> alert file-less-than (:path "/var/log/nginx/access.log" :limit 60))`