New blogpost \o/ on CD with GitHub Webhooks

This commit is contained in:
Tito Sacchi 2022-06-20 16:24:56 +02:00
parent 298989eab0
commit 26fe13d440
3 changed files with 599 additions and 0 deletions

---
title: dumb-cd-webhooks
displaytitle: A dumb CD solution with GitHub Webhooks and a shell script
summary: 'Or: how to write a webserver in Bash'
tags: devops, ci, cd, git
---
I've been working on the backend of a school project (it will become public
soon) for the last few weeks. It's a Python 3.10 +
[FastAPI](https://fastapi.tiangolo.com) +
[psycopg3](https://www.psycopg.org/psycopg3/) API backed by a PostgreSQL DB. Our
server is a plain Ubuntu 20.04 box on Hetzner and our deployment is as simple as
a Python venv and a systemd service (with socket activation!). No Docker, no
Kubernetes, no supervisord. We follow the [KISS
philosophy](https://wiki.archlinux.org/title/Arch_terminology#KISS). To be
perfectly honest, I wanted to install Arch on our server, but we agreed that
Ubuntu is a bit more reliable.

We're using GitHub Actions as a CI solution and it works well; it checks our
code, builds a wheel and stores it as a build artifact. And something that I
find *really* boring and time-consuming is manually downloading the wheel on
our server and updating the Python venv. Wait, isn't this problem commonly
solved with ✨ *CD* ✨?

Ok, there are tons of complete CD solutions for containerized and advanced
workloads, using Docker, Kubernetes, AWS... But I'm a dumb idiot with some
decent scripting knowledge and no will to learn any of these technologies. How
can I start a script whenever CI builds a new wheel on master? Enter [*GitHub
Webhooks*][gh-webhooks]!

Basically, GH will send an HTTP POST request to a chosen endpoint whenever certain
events happen. In particular, we are interested in the `workflow_run` event.

If you create a webhook on GitHub, it will ask you for a secret to be used to
secure requests to your endpoint. Just choose a random string (`openssl rand
-hex 8`) and write it down -- it will be used in our script to check that
requests are actually signed by GitHub.
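For reference, the signature GitHub sends can be reproduced with `openssl` alone. A quick sketch, with a made-up secret and payload:

```shell
# Made-up secret and payload, purely for illustration
secret='6fae4b5c0d1e2f3a'
payload='{"action":"completed"}'

# openssl prints something like "SHA2-256(stdin)= <hex>" (older versions print
# "(stdin)= <hex>"); keep only the hex digest after the space, then prepend
# the scheme GitHub uses in the header value
sig=$(printf '%s' "$payload" | openssl dgst -sha256 -hmac "$secret")
sig="sha256=${sig#* }"
echo "$sig"
```

GitHub puts exactly this kind of value in the `X-Hub-Signature-256` request header; our handler will recompute it and compare.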
## Who needs an HTTP library to process requests?
I decided to take a very simple approach: 1 successful CI build = 1 script run.
This means 1 webhook request = 1 script run. The simplest way that I came up
with to do this is with a *systemd socket-activated Bash script*. Every time
systemd receives a connection on the socket, it will start a custom process
that will handle the connection: this is the way most
[`inetd`](https://en.wikipedia.org/wiki/Inetd)-style daemons work.

**UNIX history recap:** (feel free to skip!) Traditional UNIX network daemons
(i.e., network services) would `accept()` connections on managed sockets and
then `fork()` a different process to handle each of these. With socket
activation (as implemented by inetd, xinetd or systemd) a single daemon (or
directly the init system!) listens on the appropriate ports for *all services*
on the machine and does the job of `accept`ing connections. Each connection will
be handled by a different process, launched as a child of the socket manager.
This minimizes load if the service is not always busy, because there won't be
any processes stuck waiting on sockets. Every time a connection is closed, the
corresponding process exits and the system remains in a clean state.

The socket manager is completely protocol-agnostic: it is up to the
service-specific processes to implement the appropriate protocols. In our case,
systemd will start our Bash script and pass the socket as file descriptors. This
means that our Bash script will have to *talk HTTP*![^overengineering]
## How do we do this?
### systemd and socket activation
Let's start by configuring systemd. The [systemd.socket man
page](https://man7.org/linux/man-pages/man5/systemd.socket.5.html) documents all
the relevant options. We have to create a socket unit and a corresponding
service unit template. I'll use `cd-handler` as the unit name. I will set up
systemd to listen on the UNIX domain
socket `/run/myapp-cd.sock` that you can point your reverse proxy (e.g. NGINX)
to. TCP port 8027 is mostly for debugging purposes, but if you don't need HTTPS
you can use systemd's socket directly as the webhook endpoint.[^socat]
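If you do put NGINX in front, a minimal `location` block pointing at the UNIX socket could look like this (the `/gh-webhook` path is just an example):

```nginx
location /gh-webhook {
    # Forward the webhook POST to the socket systemd listens on
    proxy_pass http://unix:/run/myapp-cd.sock:/;
}
```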

**Socket unit:**
```
# /etc/systemd/system/cd-handler.socket
[Unit]
Description=Socket for CD webhooks
[Socket]
ListenStream=/run/myapp-cd.sock
ListenStream=0.0.0.0:8027
Accept=yes
[Install]
WantedBy=sockets.target
```
**Service template unit:**
```
# /etc/systemd/system/cd-handler@.service
[Unit]
Description=GitHub webhook handler for CD
Requires=cd-handler.socket
[Service]
Type=simple
# ExecStart points to our Bash webhook handler
ExecStart=/var/lib/cd/handle-webhook.sh
StandardInput=socket
StandardOutput=socket
StandardError=journal
TimeoutStopSec=5
[Install]
WantedBy=multi-user.target
```
`Accept=yes` will make systemd start a process for each connection. Create a
Bash script in `/var/lib/cd/handle-webhook.sh`; for now we will only answer
`204 No Content` to every possible request. Remember to make the script
executable (`chmod +x`). We will communicate with the socket using standard
streams (stdin/stdout; stderr will be sent to the system logs).
```bash
#!/bin/bash
# /var/lib/cd/handle-webhook.sh
printf "HTTP/1.1 204 No Content\r\n"
printf "Server: TauroServer CI/CD\r\n"
printf "Connection: close\r\n"
printf "\r\n"
```
`systemctl daemon-reload && systemctl enable --now cd-handler.socket` and you
are ready to go. Test our dumb HTTP server with `curl -vv http://127.0.0.1:8027`
or if you're using the awesome
[HTTPie](https://httpie.io/docs/cli/main-features), `http -v :8027`. If you're
successfully receiving a 204, we have just ~~ignored~~ processed an HTTP request
with Bash ^^
### Parsing the HTTP request
The anatomy of HTTP requests and responses is standardised in [RFC
2616][rfc2616], in sections 5 and 6 respectively.
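Concretely, a webhook delivery on the wire looks roughly like this (headers abridged, values made up):

```
POST /webhook HTTP/1.1
Host: ci.example.org
Content-Type: application/json
Content-Length: ...
X-GitHub-Delivery: ...
X-GitHub-Event: workflow_run
X-Hub-Signature-256: sha256=...

{"action": "completed", "workflow_run": {...}}
```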

UNIX systems come with powerful text-processing utilities. Bash itself has
[parameter expansion][bash-expansion] features that will be useful in processing
the HTTP request, and we will use [jq][jq] to extract
the fields we're interested in from the JSON payload.

We will build our script step by step. I'll define a function to log data (in my
case, it will use `systemd-cat` to write directly to the journal; you can
substitute its body to adapt it to your needs) and another function to send a
response. `send_response` takes two parameters: the first one is the status code
followed by its description (e.g. `204 No Content`) and the second one is the
response body (optionally empty). We're using `wc` to count the bytes in the
body, subtracting 1 for the trailing `\n` that the here-string appends.
```bash
#!/bin/bash
set -euo pipefail # Good practice!
function log {
systemd-cat -t 'GitHub webhook handler' "$@"
}
function send_response {
printf "Sending response: %s\n" "$1" | log -p info
printf "HTTP/1.1 %s\r\n" "$1"
printf "Server: TauroServer CI/CD\r\n"
printf "Connection: close\r\n"
printf "Content-Type: application/octet-stream\r\n"
printf "Content-Length: $(($(wc -c <<<"$2") - 1))\r\n"
printf "\r\n"
printf '%s' "$2"
}
```
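The off-by-one in the `Content-Length` computation is easy to check interactively; the here-string always appends a newline:

```shell
body="Hello World!"
# <<< appends a trailing newline, so wc -c reports one byte too many
bytes=$(wc -c <<<"$body")
len=$(( bytes - 1 ))
echo "$len"   # the real length of "Hello World!"
```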
If you add `send_response "200 OK" "Hello World!"` as a last line, you should be
able to get a 200 response with cURL or HTTPie! You can also test webhooks from
the GitHub web UI and see if an OK response is received as expected. We are not
sending the `Date` response header that *should* be set according to RFC 2616.
![socat and HTTPie talking nicely to each
other.](/resources/dumb-cd-webhooks/socat-httpie-success.jpg)

**Parsing the request line.** As easy as `read method path version`. There will
probably be a pending `\r` on `version`, but we don't care much (we will
assume HTTP/1.1 everywhere ^^).
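In isolation, with a sample request line fed on standard input (and the stray `\r` stripped, for completeness):

```shell
# A request line as GitHub would send it, carriage return included
read -r method path version <<<$'POST /webhook HTTP/1.1\r'
version=${version%$'\r'}   # drop the trailing carriage return
echo "$method $path $version"   # POST /webhook HTTP/1.1
```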

**Parsing headers.** Headers immediately follow the request line. Expert Bash
users would probably use an associative array to store request headers; we will
just use a `case` statement to extract the headers we're interested in. We split
each line on `:` by setting the special variable `IFS` (the internal field
separator, which defaults to whitespace). `tr` removes the trailing `\r\n` and
parameter expansion is used to remove the space that follows `:` in each header
line.
```bash
content_length=0
delivery_id=none
ghsig=none
event_type=none
while IFS=: read -r key val; do
# A blank line ends the headers; after splitting, only a lone "\r" remains
[[ "$key" == "$(printf '\r\n')" ]] && break;
val=$(tr -d '\r\n' <<<"$val")
case "$key" in
"Content-Length")
content_length="${val# }"
;;
"X-GitHub-Delivery")
delivery_id="${val# }"
;;
"X-GitHub-Event")
event_type="${val# }"
;;
"X-Hub-Signature-256")
# This will be trimmed later when comparing to OpenSSL's HMAC
ghsig=$val
;;
*)
;;
esac
printf 'Header: %s: %s\n' "$key" "$val" | log -p debug
done
```
**Reading body and checking HMAC signature.** GitHub sends a hex-encoded
HMAC-SHA-256 signature of the JSON body as the [`X-Hub-Signature-256`
header][gh-securing-webhooks], signed with the secret chosen while creating the
webhook. Without this layer of security, anyone could send a POST and trigger CD
scripts, maybe making us download malicious builds. In a shell script the
easiest way to calculate an HMAC is with the `openssl` command-line tool. We are
using `dd` to read an exact amount of bytes from stdin and the body is passed to
`openssl` with a pipe to avoid sending trailing newlines (direct
redirection, i.e., `<<<`, did not work for me). Parameter expansion is used to
split strings. Place your webhook secret in the `GITHUB_WEBHOOK_SECRET` variable
and set `ENFORCE_HMAC` to something different from 0 (disabling signature
checking can be useful for debugging purposes). You can now play with
cURL/HTTPie/*[insert your favourite HTTP client here]* to see if you receive 401
Unauthorized responses as expected.
```bash
printf "Trying to read request content... %s bytes\n" "$content_length" | log -p info
content=$(dd bs=1 count="$content_length" 2> >(log -p debug) )
mysig=$(printf '%s' "$content" | openssl dgst -sha256 -hmac "$GITHUB_WEBHOOK_SECRET")
if [[ "${mysig#* }" == "${ghsig#*=}" ]]; then
log -p notice <<<"HMAC signatures match, proceeding further."
else
log -p warning <<<"HMAC signatures do not match! Request is not authenticated!"
if [[ $ENFORCE_HMAC != 0 ]]; then
send_response "401 Unauthorized" "Provide signature as HTTP header."
log -p err <<<"Exiting now because HMAC signature enforcing is required."
exit 1
fi
fi
```
**Sending the HTTP response.** We will send an appropriate response
to GitHub with the function defined earlier.
```bash
if [[ "$event_type" == "none" ]]; then
send_response "400 Bad Request" "Please provide event type as HTTP header."
log -p err <<<"X-GitHub-Event header was not provided."
exit 1
fi
if [[ "$delivery_id" == "none" ]]; then
send_response "400 Bad Request" "Please provide delivery ID as HTTP header."
log -p err <<<"X-GitHub-Delivery header was not provided."
exit 1
fi
printf "GitHub Delivery ID: %s\n" "$delivery_id" | log -p info
printf "GitHub Event type: %s\n" "$event_type" | log -p info
case "$event_type" in
"workflow_run")
send_response "200 OK" "Acknowledged workflow run!"
;;
*)
send_response "204 No Content" ""
exit 0
;;
esac
```
The "HTTP server" part of the script is complete! You can also test this by
asking GitHub to resend past webhooks from the web UI.
### Parsing the JSON body and downloading artifacts
**JSON body.** The JSON schema of GitHub webhook payloads can be found in the
[official GH docs][gh-webhook-payloads]. We will use [jq][jq] to parse the JSON
and, with its `-r` flag, print the fields we're interested in on standard
output, each on a separate line. Its output can be fed to the `read` builtin
with `IFS` set to `\n`. The `|| true` at the end of the command keeps `set -e`
from aborting the script when `read` reaches end of input before filling every
variable (e.g., in events that signal the start of a workflow, `artifacts_url`
is not present).

I want to run CD workflows only on the main branch (`main`, `master`, ...), so I
added a check against the variable `MAIN_BRANCH` that you can configure at the
top of the script. GitHub sends `workflow_run` events even when CI workflows
start, but we're only interested in running a custom action when a workflow
succeeds.
```bash
IFS=$'\n' read -r -d '' action branch workflow_status \
name conclusion url artifacts \
commit message author < <(
jq -r '.action,
.workflow_run.head_branch,
.workflow_run.status,
.workflow_run.name,
.workflow_run.conclusion,
.workflow_run.html_url,
.workflow_run.artifacts_url,
.workflow_run.head_commit.id,
.workflow_run.head_commit.message,
.workflow_run.head_commit.author.name' <<<"$content") || true
printf 'Workflow run "%s" %s! See %s\n' "$name" "$workflow_status" "$url" | log -p notice
printf 'Head commit SHA: %s\n' "$commit" | log -p info
printf 'Head commit message: %s\n' "$message" | log -p info
printf 'Commit author: %s\n' "$author" | log -p info
if [[ "$action" != "completed" ]] \
|| [[ "$conclusion" != "success" ]] \
|| [[ "$branch" != "$MAIN_BRANCH" ]];
then exit 0
fi
log -p notice <<<"Proceeding with continuous delivery!"
```
**Build artifacts.** Before running the custom CD script that depends on your
specific deployment scenario, we will download all artifacts built on GitHub
Actions during CI. For example, in our Dart/Flutter webapp this could include
the built website, with Dart already compiled to JavaScript. In the case of our
Python backend the artifact is a Python wheel. This webhook script handler is
completely language-agnostic though, meaning that you can use it with whatever
language or build system you want.

We will download and extract all artifacts in a temporary directory and then
pass its path as an argument to the CD script, with a bunch of other useful
information such as the branch name and the commit SHA. The function
`download_artifacts` downloads and extracts the ZIP files stored on GitHub using
the [Artifacts API][gh-artifacts-api]. It iterates on the JSON array and
extracts the appropriate fields using jq's array access syntax. GitHub returns
a 302 temporary redirect when it receives a GET on the `archive_download_url`
advertised in the artifact object, so we use cURL's `-L` flag to make it follow
redirects.
```bash
function download_artifacts {
# $1: URL
# $2: directory to download artifacts to
pushd "$2" &>/dev/null
artifacts_payload=$(curl --no-progress-meter -u "$GITHUB_API_USER:$GITHUB_API_TOKEN" "$1" 2> >(log -p debug))
artifacts_amount=$(jq -r '.total_count' <<<"$artifacts_payload")
for i in $(seq 1 "$artifacts_amount"); do
printf 'Downloading artifact %s/%s...\n' "$i" "$artifacts_amount" | log -p info
name=$(jq -r ".artifacts[$((i - 1))].name" <<<"$artifacts_payload")
url=$(jq -r ".artifacts[$((i - 1))].archive_download_url" <<<"$artifacts_payload")
printf 'Artifact name: "%s" (downloading from %s)\n' "$name" "$url" | log -p info
tmpfile=$(mktemp)
printf 'Downloading ZIP to %s\n' "$tmpfile" | log -p debug
curl --no-progress-meter -L -u "$GITHUB_API_USER:$GITHUB_API_TOKEN" --output "$tmpfile" "$url" 2> >(log -p debug)
mkdir "$name"
printf 'Unzipping into %s...\n' "$2/$name" | log -p debug
unzip "$tmpfile" -d "$2/$name" | log -p debug
rm "$tmpfile"
done
popd &>/dev/null
}
artifacts_dir=$(mktemp -d)
printf 'Downloading artifacts to %s...\n' "$artifacts_dir" | log -p info
download_artifacts "$artifacts" "$artifacts_dir"
```
**Final step: running your CD script!** If you defined `log` via `systemd-cat` as I
did above, the `-t` flag selects a different identifier, making the output
of the custom script stand out from the garbage of our webhook handler in the
system journal. Again, configure `$CD_SCRIPT` appropriately; it will be run from
the directory specified above in the systemd unit file and it will receive the
path to the directory containing the downloaded artifacts as an argument.
**Note**: it will run with root privileges unless specified otherwise in the
service unit file!
```bash
printf 'Running CD script!\n' | log -p notice
$CD_SCRIPT "$artifacts_dir" "$branch" "$commit" 2>&1 | log -t "CD script" -p info
```
For example, our Python backend CD script looks something like this:
```bash
cd "$1"
systemctl stop my-awesome-backend.service
source /var/lib/my-backend/virtualenv/bin/activate
shopt -s globstar  # let ** match wheels in subdirectories
pip3 install --no-deps --force-reinstall **/*.whl
systemctl start my-awesome-backend.service
```
**Bonus points for cleanup** :) Remove the tmpdir created earlier to store
artifacts:
```bash
printf 'Deleting artifacts directory...\n' | log -p info
rm -r "$artifacts_dir"
```
## Conclusion
You can find the complete script
[here](/resources/dumb-cd-webhooks/handle-webhook.sh). It's 158 LoC, not that
much, and it's very flexible. There's room for improvement; e.g., selecting
different scripts on different branches. Let me know if you extend this script
or use a similar approach!

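For instance, per-branch script selection could be sketched as a small helper (the paths here are hypothetical):

```shell
function select_cd_script {
  # Map a branch name to a deploy script; unknown branches get no CD
  case "$1" in
    master)  echo "/var/lib/cd/deploy-production.sh" ;;
    staging) echo "/var/lib/cd/deploy-staging.sh" ;;
    *)       return 1 ;;
  esac
}

# In the handler: CD_SCRIPT=$(select_cd_script "$branch") || exit 0
```
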
[gh-webhooks]: https://docs.github.com/en/developers/webhooks-and-events/webhooks/about-webhooks
[rfc2616]: https://datatracker.ietf.org/doc/html/rfc2616
[bash-expansion]: https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html
[jq]: https://stedolan.github.io/jq/
[gh-securing-webhooks]: https://docs.github.com/en/developers/webhooks-and-events/webhooks/securing-your-webhooks#validating-payloads-from-github
[gh-webhook-payloads]: https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads
[gh-artifacts-api]: https://docs.github.com/en/rest/actions/artifacts

[^overengineering]: At this point, you might be wondering whether it is worth
it. It probably isn't, and the simplest solution could be a 100 LoC Python +
FastAPI script to handle webhooks. But I *really* wanted to do this with basic
UNIX tools.

[^socat]: You can also use `socat` to do what systemd is doing for us if you
want to test your script on your local machine: `socat
TCP-LISTEN:8027,reuseaddr,fork SYSTEM:/path/to/handle-webhook.sh`.

[comment]: # vim: ts=2:sts=2:sw=2:et:nojoinspaces:tw=80

#!/bin/bash
set -euo pipefail
GITHUB_WEBHOOK_SECRET=REDACTED
GITHUB_API_TOKEN=REDACTED
GITHUB_API_USER=REDACTED
MAIN_BRANCH=master
CD_SCRIPT="/var/lib/path-to-your-backend/continuous-delivery.sh"
ENFORCE_HMAC=1
function log {
systemd-cat -t 'GitHub webhook handler' "$@"
}
function send_response {
printf "Sending response: %s\n" "$1" | log -p info
printf "HTTP/1.1 %s\r\n" "$1"
printf "Server: TauroServer CI/CD\r\n"
printf "Connection: close\r\n"
printf "Content-Type: application/octet-stream\r\n"
printf "Content-Length: $(($(wc -c <<<"$2") - 1))\r\n"
printf "\r\n"
printf '%s' "$2"
}
content_length=0
delivery_id=none
ghsig=none
event_type=none
read -r method path version
printf 'Request method: %s\n' "$method" | log -p debug
printf 'Request path: %s\n' "$path" | log -p debug
printf 'HTTP version: %s\n' "$version" | log -p debug
while IFS=: read -r key val; do
# A blank line ends the headers; after splitting, only a lone "\r" remains
[[ "$key" == "$(printf '\r\n')" ]] && break;
val=$(tr -d '\r\n' <<<"$val")
case "$key" in
"Content-Length")
content_length="${val# }"
;;
"X-GitHub-Delivery")
delivery_id="${val# }"
;;
"X-GitHub-Event")
event_type="${val# }"
;;
"X-Hub-Signature-256")
ghsig=$val
;;
*)
;;
esac
printf 'Header: %s: %s\n' "$key" "$val" | log -p debug
done
printf "Trying to read request content... %s bytes\n" "$content_length" | log -p info
content=$(dd bs=1 count="$content_length" 2> >(log -p debug) )
# xxd <<<"$content" >&2
mysig=$(printf '%s' "$content" | openssl dgst -sha256 -hmac "$GITHUB_WEBHOOK_SECRET")
if [[ "${mysig#* }" == "${ghsig#*=}" ]]; then
log -p notice <<<"HMAC signatures match, proceeding further."
else
log -p warning <<<"HMAC signatures do not match! Request is not authenticated!"
if [[ $ENFORCE_HMAC != 0 ]]; then
send_response "401 Unauthorized" "Provide signature as HTTP header."
log -p err <<<"Exiting now because HMAC signature enforcing is required."
exit 1
fi
fi
if [[ "$event_type" == "none" ]]; then
send_response "400 Bad Request" "Please provide event type as HTTP header."
log -p err <<<"X-GitHub-Event header was not provided."
exit 1
fi
if [[ "$delivery_id" == "none" ]]; then
send_response "400 Bad Request" "Please provide delivery ID as HTTP header."
log -p err <<<"X-GitHub-Delivery header was not provided."
exit 1
fi
printf "GitHub Delivery ID: %s\n" "$delivery_id" | log -p info
printf "GitHub Event type: %s\n" "$event_type" | log -p info
case "$event_type" in
"ping")
send_response "204 No Content" ""
exit 0
;;
"workflow_run")
send_response "200 OK" "Acknowledged workflow run!"
;;
*)
send_response "204 No Content" ""
exit 0
;;
esac
IFS=$'\n' read -r -d '' action branch workflow_status name conclusion url artifacts \
commit message author < <(
jq -r '.action,
.workflow_run.head_branch,
.workflow_run.status,
.workflow_run.name,
.workflow_run.conclusion,
.workflow_run.html_url,
.workflow_run.artifacts_url,
.workflow_run.head_commit.id,
.workflow_run.head_commit.message,
.workflow_run.head_commit.author.name' <<<"$content") || true
printf 'Workflow run "%s" %s! See %s\n' "$name" "$workflow_status" "$url" | log -p notice
printf 'Head commit SHA: %s\n' "$commit" | log -p info
printf 'Head commit message: %s\n' "$message" | log -p info
printf 'Commit author: %s\n' "$author" | log -p info
if [[ "$action" != "completed" ]] || \
[[ "$conclusion" != "success" ]] || \
[[ "$branch" != "$MAIN_BRANCH" ]]; then exit 0; fi
log -p notice <<<"Proceeding with continuous delivery!"
function download_artifacts {
# $1: URL
# $2: directory to download artifacts to
pushd "$2" &>/dev/null
artifacts_payload=$(curl --no-progress-meter -u "$GITHUB_API_USER:$GITHUB_API_TOKEN" "$1" 2> >(log -p debug))
artifacts_amount=$(jq -r '.total_count' <<<"$artifacts_payload")
for i in $(seq 1 "$artifacts_amount"); do
printf 'Downloading artifact %s/%s...\n' "$i" "$artifacts_amount" | log -p info
name=$(jq -r ".artifacts[$((i - 1))].name" <<<"$artifacts_payload")
url=$(jq -r ".artifacts[$((i - 1))].archive_download_url" <<<"$artifacts_payload")
printf 'Artifact name: "%s" (downloading from %s)\n' "$name" "$url" | log -p info
tmpfile=$(mktemp)
printf 'Downloading ZIP to %s\n' "$tmpfile" | log -p debug
curl --no-progress-meter -L -u "$GITHUB_API_USER:$GITHUB_API_TOKEN" --output "$tmpfile" "$url" 2> >(log -p debug)
mkdir "$name"
printf 'Unzipping into %s...\n' "$2/$name" | log -p debug
unzip "$tmpfile" -d "$2/$name" | log -p debug
rm "$tmpfile"
done
popd &>/dev/null
}
artifacts_dir=$(mktemp -d)
printf 'Downloading artifacts to %s...\n' "$artifacts_dir" | log -p info
download_artifacts "$artifacts" "$artifacts_dir"
printf 'Running CD script!\n' | log -p notice
$CD_SCRIPT "$artifacts_dir" "$branch" "$commit" 2>&1 | log -t "CD script" -p info
printf 'Deleting artifacts directory...\n' | log -p info
rm -r "$artifacts_dir"
