Sync with new alert declaration and add explanations with code usage

Solene Rapenne 2018-01-10 20:17:32 +01:00
parent 8e2203d405
commit 439acf53f4
1 changed files with 115 additions and 32 deletions

README (147 changes)

@ -20,11 +20,16 @@ tested with both **sbcl** and **ecl** - which should be available for
most distributions.
(On OpenBSD you may prefer to use ecl because sbcl needs 'wxallowed'
where the binary is.)
on the partition where the binary is.)
To make reed-alert's deployment easier I avoid using external
libraries. reed-alert only requires a Common LISP interpreter and a
few files.
its own files.
An attempt to use quicklisp libraries for more sophisticated checks,
such as "does this URL contain a pattern?", was started and then
abandoned. Instead, users who need more elaborate checks can write a
shell command in the **command** probe.
Code-Readability
@ -34,7 +39,7 @@ Although the code is very rough for now, I think it's already fairly
understandable by people who do need this kind of tool.
I will try to improve on the readability of the config file in future
commits.
commits. NOTE: declaring notifiers is easier now.
Usage
@ -58,52 +63,53 @@ The configuration is explained below.
The Notification System
=======================
+ function : the name of the probe
+ date : the current date with format YYYY/MM/DD hh:mm:ss
+ params : the parameters of the probe
+ hostname : the hostname of the server
+ result : the error returned (the value exceeding the limit, file not found)
+ description : an arbitrary description naming a check
+ level : the type of notification used
+ os : the type of operating system (FreeBSD/Linux/OpenBSD)
+ _ : a space character
+ space : a space character
+ newline : a newline character
When a check returns an error, a previously defined notifier is
called. A notifier is a named shell command. The shell command
can contain variables from reed-alert.
+ %function% : the name of the probe
+ %date% : the current date with format YYYY/MM/DD hh:mm:ss
+ %params% : the parameters of the probe
+ %hostname% : the hostname of the server
+ %result% : the error returned (the value exceeding the limit, file not found)
+ %description% : an arbitrary description naming a check
+ %level% : the type of notification used
+ %os% : the type of operating system (FreeBSD/Linux/OpenBSD)
+ %newline% : a newline character
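To illustrate how the `%name%` placeholders above might be expanded before the shell command runs, here is a minimal sketch in plain Common Lisp. The helpers `replace-all` and `expand-notifier` and the alist format are assumptions made for this example; they are not reed-alert's actual implementation.

```lisp
;; Hypothetical sketch: NOT reed-alert's real code.

(defun replace-all (text old new)
  "Return a copy of TEXT where every occurrence of OLD is replaced by NEW.
OLD must be a non-empty string."
  (with-output-to-string (out)
    (loop with old-length = (length old)
          for start = 0 then (+ pos old-length)
          for pos = (search old text :start2 start)
          do (write-string text out :start start :end (or pos (length text)))
          when pos do (write-string new out)
          while pos)))

(defun expand-notifier (template bindings)
  "Replace each %name% in TEMPLATE using BINDINGS, an alist of (name . value)."
  (let ((result template))
    (dolist (binding bindings result)
      (setf result
            (replace-all result
                         (concatenate 'string "%" (car binding) "%")
                         (cdr binding))))))

;; Example values borrowed from the load-average message in this README.
(defparameter *example-command*
  (expand-notifier
   "echo 'On %date% %hostname% has encountered a problem'"
   '(("date" . "2016/10/06 11:11:12")
     ("hostname" . "server.foo.com"))))
```

Placeholders missing from the alist are left untouched in the resulting string, which makes a forgotten variable easy to spot in the notification.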
Example Probe: 'Check For Load Average'
Example Probe 1: 'Check For Load Average'
---------------------------------------
If you want to send a mail with a message like:
"At 2016/10/06 11:11:12 server.foo.com has encountered a problem
"On 2016/10/06 11:11:12 server.foo.com has encountered a problem
during LOAD-AVERAGE-15 (:LIMIT 10) with a value of 30"
write the following and use **pretty-mail** in your checks:
write the following at the top of the file and use **pretty-mail** in your checks:
(defvar *alerts*
(list
'(pretty-mail ("echo '" date _ hostname " has encountered a problem during" function
params " with a value of " result "' | mail yourmail@foo.bar"))))
(alert pretty-mail "echo 'On %date% %hostname% has encountered a problem during %function%
%params% with a value of %result%' | mail yourmail@foo.bar")
Variant 1
~~~~~~~~~
If you find it easier to read, you can add + in the concatenation.
The + is discarded by reed-alert as soon as it parses the list.
Example Probe 2: 'Don't do anything'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you don't want anything to be done when an error occurs, use the following:
'(pretty-mail (date + " " + hostname + " has encountered a problem " + function))
(alert nothing-to-send "")
Variant 2
~~~~~~~~~
If you don't want anything to be triggered use the following in *alerts*:
Example Probe 3: 'Send SMS'
~~~~~~~~~~~~~~~~~~~~~~~~~~~
You may want to use an external service to send an SMS. This is
entirely possible because notifiers rely on a shell command:
'(nothing-to-send nil)
(alert sms "echo 'error on %hostname% : %function% %result%'
| curl -u login:pass http://api.sendsms.com/")
The Probes
==========
Probes are written in Common LISP.
Probes are written in Common LISP. They are predefined checks.
The :desc Parameter
-------------------
@ -230,6 +236,7 @@ Example : `(=> alert ping (:host "8.8.8.8"))`
command
-------
Execute an arbitrary command which triggers an alert if it returns a non-zero value.
This may be the most useful probe because it lets the user perform any check needed.
> Command to execute, accept commands with pipes.
:command "STRING"
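As an illustration, the "does this URL contain a pattern?" kind of check could be expressed with the **command** probe. This is a configuration fragment meant to live in a reed-alert configuration file, not a standalone program. The URL and the pattern are placeholders, and the fragment assumes that `curl` and `grep` are installed and that an alert named `alert` has been declared.

```lisp
;; Triggers the alert when the pipeline exits with a non-zero status,
;; i.e. when grep does not find the pattern in the fetched page.
(=> alert command (:command "curl -s https://example.org/ | grep -q 'Welcome'"))
```

The exit status of a shell pipeline is that of its last command, so a failed download also ends in a non-zero status because `grep` then finds nothing to match.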
@ -255,4 +262,80 @@ Check if a file has a size less than a specified limit.
> Set the limit in bytes before triggering an alert.
:limit INTEGER
Example : `(=> alert file-less-than (:path "/var/log/nginx/access.log" :limit 60))`
Example : `(=> alert file-less-than (:path "/var/log/nginx.log" :limit 60))`
The configuration file
======================
The configuration file is Common LISP code, so it's evaluated. It's
possible to write some logic within it.
Loops
-----
It's possible to write loops if you don't want to repeat code
(loop for host in '("bitreich.org" "dataswamp.org" "floodgap.com")
      do
      (=> mail ping (:host host)))
or another example
(loop for service in '("smtpd" "nginx" "mysqld" "postgresql")
      do
      (=> mail service (:name service)))
and another example using rows from a file to check remote hosts
(with-open-file (stream "hosts.txt")
  (loop for line = (read-line stream nil)
        while line
        do
        (=> mail ping (:host line))))
Conditional
-----------
It is also possible to use conditionals. Two groups of conditionals
are particularly useful.
Dependency
~~~~~~~~~~
Sometimes it is a good idea to skip the remaining probes when a probe
fails, for example when checking a path through a network from the
nearest machine to the remote target. If we can't reach our local
router, probes requiring the router to work will trigger errors, so we
should skip them.
(stop-if-error
  (=> mail ping (:host "192.168.1.1" :desc "My local router"))
  (=> mail ping (:host "89.89.89.89" :desc "My ISP DNS server"))
  (=> mail ping (:host "kernel.org" :desc "Remote website")))
Note: stop-if-error is an alias for the **and** macro.
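The short-circuit behaviour that stop-if-error inherits from **and** can be sketched without reed-alert. In the example below, `fake-check`, `*checks-run*` and `stop-if-error-sketch` are hypothetical stand-ins, and the sketch assumes that a check returns true when it passes and NIL when it fails.

```lisp
;; Hypothetical stand-ins; reed-alert's real probes are not used here.

(defvar *checks-run* '()
  "Names of the fake checks that were actually evaluated, newest first.")

(defun fake-check (name success)
  "Record that NAME ran and return SUCCESS (true means the check passed)."
  (push name *checks-run*)
  success)

(defmacro stop-if-error-sketch (&body checks)
  "Evaluate CHECKS in order and stop at the first one returning NIL."
  `(and ,@checks))

(stop-if-error-sketch
  (fake-check "My local router" t)
  (fake-check "My ISP DNS server" nil)   ; simulated failure
  (fake-check "Remote website" t))       ; never evaluated

;; Only the checks up to and including the failing one were evaluated.
(reverse *checks-run*)
```

Because the third form is never evaluated, a dead link in the middle of the path produces a single failure rather than one for every host behind it.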
Escalation
~~~~~~~~~~
It can be a good idea to use different alerts depending on how
critical a check is. Sometimes, though, the criticality depends on
the value of the error and/or on the delay between detecting a
problem and fixing it. You may want to receive a mail when something
can be fixed in your spare time, but notify other people if things
are still not fixed past a certain level.
(escalation
  (=> mail-me disk-usage (:path "/" :limit 70))
  (=> sms-me disk-usage (:path "/" :limit 90))
  (=> buzzer disk-usage (:path "/" :limit 98)))
In this example, we check the disk usage. I will get a mail through
the "mail-me" alert if the disk usage exceeds 70%. Once it goes that
far, reed-alert checks whether the disk usage exceeds 90%; if so,
I'll receive an SMS through the "sms-me" alert. Then, if it exceeds
98%, the "buzzer" alert will make some bad noises in the room to
warn me.
Note: escalation is an alias for the **or** macro.
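The way escalation walks through its checks can be sketched with **or** and a simulated disk usage value. Everything in this example (`fake-disk-check`, `escalation-sketch`, `run-escalation`, `*notified*`) is hypothetical, and it assumes that a check returns true when it passes and NIL after triggering its alert.

```lisp
;; Hypothetical stand-ins; reed-alert's real probes are not used here.

(defvar *notified* '()
  "Names of the alerts triggered so far, newest first.")

(defun fake-disk-check (alert usage limit)
  "Return true if USAGE is below LIMIT, else record ALERT and return NIL."
  (or (< usage limit)
      (progn (push alert *notified*) nil)))

(defmacro escalation-sketch (&body checks)
  "Evaluate CHECKS in order and stop at the first one that passes."
  `(or ,@checks))

(defun run-escalation (usage)
  "Run the three-level example for a disk USAGE percentage.
Return the list of triggered alerts, in order."
  (setf *notified* '())
  (escalation-sketch
    (fake-disk-check "mail-me" usage 70)
    (fake-disk-check "sms-me" usage 90)
    (fake-disk-check "buzzer" usage 98))
  (reverse *notified*))

(run-escalation 50)   ; no alert is triggered
(run-escalation 95)   ; "mail-me" then "sms-me" are triggered
```

Each level is only reached when the previous one has already failed, so a usage of 95 never evaluates beyond the 98% check, which passes and ends the chain.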