mirror of git://bitreich.org/reed-alert
Sync with new alert declaration and add explanations with code usage
parent 8e2203d405
commit 439acf53f4
147
README
@@ -20,11 +20,16 @@ tested with both **sbcl** and **ecl** - which should be available for
 most distributions.

 (On OpenBSD you may prefer to use ecl because sbcl needs 'wxallowed'
-where the binary is.)
+on the partition where the binary is.)

 To make reed-alert's deployment easier I avoid using external
 libraries. reed-alert only requires a Common LISP interpreter and a
-few files.
+few files of its own.
+
+An attempt to use quicklisp libraries for more sophisticated checks,
+such as "does this URL contain a pattern?", was started and then
+abandoned. Instead, write a shell command in the **command** probe
+when you need more elaborate checks.


 Code-Readability
@@ -34,7 +39,7 @@ Although the code is very rough for now, I think it's already fairly
 understandable by people who do need this kind of tool.

 I will try to improve on the readability of the config file in future
-commits.
+commits. NOTE : declaration of notifiers is easier now.


 Usage
@@ -58,52 +63,53 @@ The configuration is explained below.
 The Notification System
 =======================

-+ function : the name of the probe
-+ date : the current date with format YYYY/MM/DD hh:mm:ss
-+ params : the parameters of the probe
-+ hostname : the hostname of the server
-+ result : the error returned (the value exceeding the limit, file not found)
-+ description : an arbitrary description naming a check
-+ level : the type of notification used
-+ os : the type of operating system (FreeBSD/Linux/OpenBSD)
-+ _ : a space character
-+ space : a space character
-+ newline : a newline character
+When a check returns an error, a previously defined notifier is
+called. A notifier is a named shell command. The shell command
+can contain variables from reed-alert.
+
++ %function% : the name of the probe
++ %date% : the current date with format YYYY/MM/DD hh:mm:ss
++ %params% : the parameters of the probe
++ %hostname% : the hostname of the server
++ %result% : the error returned (the value exceeding the limit, file not found)
++ %description% : an arbitrary description naming a check
++ %level% : the type of notification used
++ %os% : the type of operating system (FreeBSD/Linux/OpenBSD)
++ %newline% : a newline character


-Example Probe: 'Check For Load Average'
+Example Probe 1: 'Check For Load Average'
 ---------------------------------------
 If you want to send a mail with a message like:

-"At 2016/10/06 11:11:12 server.foo.com has encountered a problem
+"On 2016/10/06 11:11:12 server.foo.com has encountered a problem
 during LOAD-AVERAGE-15 (:LIMIT 10) with a value of 30"


-write the following and use **pretty-mail** in your checks:
+write the following at the top of the file and use **pretty-mail** in your checks:

-    (defvar *alerts*
-      (list
-       '(pretty-mail ("echo '" date _ hostname " has encountered a problem during" function
-         params " with a value of " result "' | mail yourmail@foo.bar"))))
+    (alert pretty-mail "echo 'On %date% %hostname% has encountered a problem during %function%
+    %params% with a value of %result%' | mail yourmail@foo.bar")
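
As a hedged illustration of "use **pretty-mail** in your checks", a check that would produce the sample message might look like the line below. The probe name load-average-15 and its :limit parameter are inferred from the sample message rather than from a documented signature, and the form follows the `(=> alert probe (params))` pattern used by the other examples in this README.

```lisp
;; Assumption: the probe name and :limit value are taken from the sample
;; message ("LOAD-AVERAGE-15 (:LIMIT 10)"); adjust them to your own setup.
(=> pretty-mail load-average-15 (:limit 10))
```

The alert definition has to appear before this line in the configuration file, since the file is evaluated from top to bottom.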
 
-Variant 1
-~~~~~~~~~
-If you find it easier to read, you can add + in the concatenation.
-The + is discarded by reed-alert as soon as it parses the list.
+Example Probe 2: 'Don't do anything'
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If you don't want anything to be done when an error occurs, use the following :

-    '(pretty-mail (date + " " + hostname + " has encountered a problem " + function))
+    (alert nothing-to-send "")

-Variant 2
-~~~~~~~~~
-If you don't want anything to be triggered use the following in *alerts*:
+Example Probe 3: 'Send SMS'
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+You may want to use an external service to send an SMS. This is
+possible because an alert is just a shell command :

-    '(nothing-to-send nil)
+    (alert sms "echo 'error on %hostname% : %function% %result%'
+    | curl -u login:pass http://api.sendsms.com/")


 The Probes
 ==========

-Probes are written in Common LISP.
+Probes are written in Common LISP. They are predefined checks.

 The :desc Parameter
 -------------------
@@ -230,6 +236,7 @@ Example : `(=> alert ping (:host "8.8.8.8"))`
 command
 -------
 Execute an arbitrary command which triggers an alert if it returns a non-zero value.
+This may be the most useful probe because it lets the user do any check needed.

 > Command to execute, accept commands with pipes.
     :command "STRING"
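
A minimal sketch of the kind of check the **command** probe makes possible, here the "does this URL contain a pattern?" case. It assumes an alert named `mail` has been declared elsewhere; the URL, the pattern and the description are placeholders.

```lisp
;; Assumptions: `mail` is an alert defined earlier in the config file, and
;; the URL and pattern are placeholders. grep -q exits with a non-zero
;; status when the pattern is not found, which makes the probe raise the alert.
(=> mail command (:command "curl -s http://example.org/ | grep -q 'Welcome'"
                  :desc "Home page contains the expected text"))
```

Because the exit status of a pipeline is that of its last command, a failed download also ends in a non-zero status: grep receives no input and finds nothing.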
@@ -255,4 +262,80 @@ Check if a file has a size less than a specified limit.
 > Set the limit in bytes before triggering an alert.
     :limit INTEGER

-Example : `(=> alert file-less-than (:path "/var/log/nginx/access.log" :limit 60))`
+Example : `(=> alert file-less-than (:path "/var/log/nginx.log" :limit 60))`
+
+
+The configuration file
+======================
+
+The configuration file is Common LISP code, so it's evaluated. It's
+possible to write some logic within it.
+
+
+Loops
+-----
+It's possible to write loops if you don't want to repeat code
+
+    (loop for host in '("bitreich.org" "dataswamp.org" "floodgap.com")
+          do
+          (=> mail ping (:host host)))
+
+or another example
+
+    (loop for service in '("smtpd" "nginx" "mysqld" "postgresql")
+          do
+          (=> mail service (:name service)))
+
+and another example using rows from a file to check remote hosts
+
+    (with-open-file (stream "hosts.txt")
+      (loop for line = (read-line stream nil)
+            while line
+            do
+            (=> mail ping (:host line))))
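
The same idea extends to checks that take more than one parameter. The sketch below, which assumes an alert named `mail` and uses placeholder paths and limits, relies on the destructuring feature of the standard `loop` macro to pair each path with its own limit for the disk-usage probe.

```lisp
;; Assumptions: `mail` is an alert defined earlier; paths and limits are
;; placeholders. Each sub-list is destructured into PATH and LIMIT.
(loop for (path limit) in '(("/" 90) ("/home" 80) ("/var" 70))
      do
      (=> mail disk-usage (:path path :limit limit)))
```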
+
+
+Conditional
+-----------
+It is also possible to use conditionals. There are two very useful
+groups of conditionals.
+
+
+Dependency
+~~~~~~~~~~
+Sometimes it is a good idea to stop running probes once a probe
+fails, for example when you check a path through a network, from
+the nearest machine to the remote target. If we can't reach our local
+router, probes requiring the router to work will trigger errors, so we
+should skip them.
+
+    (stop-if-error
+      (=> mail ping (:host "192.168.1.1" :desc "My local router"))
+      (=> mail ping (:host "89.89.89.89" :desc "My ISP DNS server"))
+      (=> mail ping (:host "kernel.org" :desc "Remote website")))
+
+Note : stop-if-error is an alias for the **and** macro.
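
To see why an alias for **and** gives the dependency behaviour, here is a small standalone sketch in plain Common Lisp. `fake-probe` is a hypothetical stand-in, not part of reed-alert, and the sketch assumes that a check evaluates to true on success and to NIL on failure.

```lisp
;; Hypothetical stand-in for a probe: prints its name and returns OK
;; (T for success, NIL for failure). Real probes come from reed-alert.
(defun fake-probe (name ok)
  (format t "running ~a~%" name)
  ok)

;; AND evaluates its forms left to right and stops at the first NIL,
;; so "website" is never run once "isp-dns" has failed.
(and (fake-probe "router" t)
     (fake-probe "isp-dns" nil)
     (fake-probe "website" t))
```

Evaluating the last form runs only the first two stand-in probes and returns NIL.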
+
+
+Escalation
+~~~~~~~~~~
+It can be a good idea to use different alerts
+depending on how critical a check is, but sometimes the criticality
+depends on the value of the error and/or the delay between
+detecting a problem and fixing it. You may want to receive a mail when
+things can be fixed in your spare time, but mail someone else if
+things aren't fixed past some level.
+
+    (escalation
+      (=> mail-me disk-usage (:path "/" :limit 70))
+      (=> sms-me disk-usage (:path "/" :limit 90))
+      (=> buzzer disk-usage (:path "/" :limit 98)))
+
+In this example we check the disk usage. I will get a mail through the
+"mail-me" alert if the disk usage exceeds 70%. Once it goes
+that far, it will check whether the disk usage exceeds 90%; if so,
+I'll receive an SMS through the "sms-me" alert. Then, if it exceeds
+98%, the "buzzer" alert will make some bad noises in the room to
+warn me about this.
+
+Note : escalation is an alias for the **or** macro.
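
The escalation behaviour follows from how **or** evaluates its forms. The standalone sketch below uses a hypothetical `fake-disk-check` function, not part of reed-alert, and assumes a check evaluates to true while the value is within its limit and to NIL after firing its alert.

```lisp
;; Hypothetical stand-in for disk-usage: returns T while USAGE is within
;; LIMIT, otherwise reports the alert as fired and returns NIL.
(defun fake-disk-check (alert usage limit)
  (or (<= usage limit)
      (progn (format t "alert ~a fired (limit ~d%)~%" alert limit)
             nil)))

;; OR stops at the first form that returns true. With a usage of 95,
;; the first two checks fail and fire their alerts; the third one passes.
(let ((usage 95))
  (or (fake-disk-check "mail-me" usage 70)
      (fake-disk-check "sms-me" usage 90)
      (fake-disk-check "buzzer" usage 98)))
```

With a usage of 50 the first check already passes, so the stricter checks are never evaluated.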