---
title: dumb-cd-webhooks
displaytitle: A dumb CD solution with GitHub Webhooks and a shell script
summary: 'Or: how to write a webserver in Bash'
tags: devops, ci, cd, git
---

I've been working on the backend of a school project (it will become public soon) for the last few weeks. It's a Python 3.10 + [FastAPI](https://fastapi.tiangolo.com) + [psycopg3](https://www.psycopg.org/psycopg3/) API backed by a PostgreSQL DB. Our server is a plain Ubuntu 20.04 box on Hetzner and our deployment is as simple as a Python venv and a systemd service (with socket activation!). No Docker, no Kubernetes, no supervisord. We follow the [KISS philosophy](https://wiki.archlinux.org/title/Arch_terminology#KISS). To be perfectly honest, I wanted to install Arch on our server, but we agreed that Ubuntu is a bit more reliable.

We're using GitHub Actions as a CI solution and it works well; it checks our code, builds a wheel and stores it as a build artifact. And something that I find *really* boring and time-consuming is manually downloading the wheel on our server and updating the Python venv. Wait, isn't this problem commonly solved with ✨ *CD* ✨?

OK, there are tons of complete CD solutions for containerized and advanced workloads, using Docker, Kubernetes, AWS... But I'm a dumb idiot with some decent scripting knowledge and no will to learn any of these technologies. How can I start a script whenever CI builds a new wheel on master?

Enter [*GitHub Webhooks*][gh-webhooks]! Basically, GH will send an HTTP POST request to a chosen endpoint whenever certain events happen. In particular, we are interested in the `workflow_run` event. When you create a webhook on GitHub it will ask you for a secret to be used to secure requests to your endpoint. Just choose a random string (`openssl rand -hex 8`) and write it down -- it will be used in our script to check that requests are actually signed by GitHub.

## Who needs an HTTP library to process requests?

I decided to take a very simple approach: 1 successful CI build = 1 script run. This means 1 webhook request = 1 script run. The simplest way I came up with to do this is a *systemd socket-activated Bash script*. Every time systemd receives a connection on the socket, it starts a custom process that handles that connection: this is the way most [`inetd`](https://en.wikipedia.org/wiki/Inetd)-style daemons work.

**UNIX history recap:** (feel free to skip!) Traditional UNIX network daemons (i.e., network services) would `accept()` connections on managed sockets and then `fork()` a different process to handle each of these. With socket activation (as implemented by inetd, xinetd or systemd) a single daemon (or directly the init system!) listens on the appropriate ports for *all services* on the machine and does the job of `accept`ing connections. Each connection is handled by a different process, launched as a child of the socket manager. This minimizes load if the service is not always busy, because there won't be any processes stuck waiting on sockets. Every time a connection is closed the corresponding process exits and the system remains in a clean state. The socket manager is completely protocol-agnostic: it is up to the service-specific processes to implement the appropriate protocols.

In our case, systemd will start our Bash script and pass the connection socket as file descriptors. This means that our Bash script will have to *talk HTTP*![^overengineering]
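To get a feel for what the script will have to parse, here is a rough sketch of a webhook delivery on the wire. It is abridged and all values are made up, but the header names are the real ones we will extract later (see the [GitHub webhooks docs][gh-webhooks] for the full details):

```
POST /webhook HTTP/1.1
Host: ci.example.com
Content-Type: application/json
Content-Length: 9742
X-GitHub-Delivery: 72d3162e-cc78-11e3-81ab-4c9367dc0958
X-GitHub-Event: workflow_run
X-Hub-Signature-256: sha256=6803d2a3e4[...]

{"action": "completed", "workflow_run": {...}}
```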
## How do we do this?

### systemd and socket activation

Let's start by configuring systemd (see the [systemd.socket man page](https://man7.org/linux/man-pages/man5/systemd.socket.5.html) for reference). We have to create a socket unit and a corresponding service template unit. I'll use `cd-handler` as the unit name. I will set up systemd to listen on the UNIX domain socket `/run/myapp-cd.sock`, which you can point your reverse proxy (e.g. NGINX) to. TCP port 8027 is mostly for debugging purposes -- but if you don't need HTTPS you can use systemd's socket directly as the webhook endpoint.[^socat]

**Socket unit:**

```
# /etc/systemd/system/cd-handler.socket
[Unit]
Description=Socket for CD webhooks

[Socket]
ListenStream=/run/myapp-cd.sock
ListenStream=0.0.0.0:8027
Accept=yes

[Install]
WantedBy=sockets.target
```

**Service template unit** (note the `.service` suffix; systemd unit files only allow whole-line comments, so the path annotation sits on its own line):

```
# /etc/systemd/system/cd-handler@.service
[Unit]
Description=GitHub webhook handler for CD
Requires=cd-handler.socket

[Service]
Type=simple
# Path to our Bash webhook handler:
ExecStart=/var/lib/cd/handle-webhook.sh
StandardInput=socket
StandardOutput=socket
StandardError=journal
TimeoutStopSec=5

[Install]
WantedBy=multi-user.target
```

`Accept=yes` will make systemd start a process for each connection. Create a Bash script in `/var/lib/cd/handle-webhook.sh`; for now we will only answer `204 No Content` to every possible request. Remember to make the script executable (`chmod +x`). We will communicate with the socket using the standard streams (stdin/stdout; stderr will be sent to the system logs).

```bash
#!/bin/bash
# /var/lib/cd/handle-webhook.sh

printf "HTTP/1.1 204 No Content\r\n"
printf "Server: TauroServer CI/CD\r\n"
printf "Connection: close\r\n"
printf "\r\n"
```

`systemctl daemon-reload && systemctl enable --now cd-handler.socket` and you are ready to go. Test our dumb HTTP server with `curl -vv http://127.0.0.1:8027` or, if you're using the awesome [HTTPie](https://httpie.io/docs/cli/main-features), `http -v :8027`. If you're successfully receiving a 204, we have just ~~ignored~~ processed an HTTP request with Bash ^^

### Parsing the HTTP request

The anatomy of HTTP requests and responses is standardised in [RFC 2616][rfc2616], in sections 5 and 6 respectively. UNIX systems come with powerful text-processing utilities, Bash itself has [parameter expansion][bash-expansion] features that will be useful in processing the HTTP request, and we will use [jq][jq] to extract the fields we're interested in from the JSON payload.

We will build our script step by step. I'll define a function to log data (in my case, it will use `systemd-cat` to write directly to the journal; you can substitute its body to adapt it to your needs) and another function to send a response. `send_response` takes two parameters: the first one is the status code followed by its description (e.g. `204 No Content`) and the second one is the response body (optionally empty). We're using `wc` to count the bytes in the body (subtracting 1 for the extra `\n` that the Bash here-string sends to `wc`).

```bash
#!/bin/bash
set -euo pipefail # Good practice!

function log {
  systemd-cat -t 'GitHub webhook handler' "$@"
}

function send_response {
  printf "Sending response: %s\n" "$1" | log -p info

  printf "HTTP/1.1 %s\r\n" "$1"
  printf "Server: TauroServer CI/CD\r\n"
  printf "Connection: close\r\n"
  printf "Content-Type: application/octet-stream\r\n"
  printf "Content-Length: %s\r\n" "$(($(wc -c <<<"$2") - 1))"
  printf "\r\n"
  printf '%s' "$2"
}
```

If you add `send_response "200 OK" "Hello World!"` as a last line, you should be able to get a 200 response with cURL or HTTPie!
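Why the `- 1` in the `Content-Length` computation? A here-string always appends a trailing newline to what `wc` sees, so it over-counts by one byte. A quick sanity check in an interactive shell:

```bash
body="Hello World!"                 # 12 bytes
wc -c <<<"$body"                    # prints 13: <<< appended a '\n'
echo $(($(wc -c <<<"$body") - 1))   # 12, the length we actually want to send
```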
You can also test webhooks from the GitHub web UI and see if an OK response is received as expected. (We are not sending the `Date` response header that *should* be set according to RFC 2616.)

![socat and HTTPie talking nicely to each other.](/resources/dumb-cd-webhooks/socat-httpie-success.jpg)

**Parsing the request line.** As easy as `read -r method path version`. There will probably be a trailing `\r` on `version` but we don't care that much (we will assume HTTP/1.1 everywhere ^^).

**Parsing headers.** Headers immediately follow the request line. Expert Bash users would probably use an associative array to store request headers; we will just use a `case` statement to extract the headers we're interested in. We split each line on `:` by setting the special variable `IFS` (the *internal field separator*, which defaults to whitespace); `tr` removes the trailing `\r\n` and parameter expansion (`${val# }`) removes the space that follows `:` in each header line.

```bash
content_length=0
delivery_id=none
ghsig=none
event_type=none

while IFS=: read -r key val; do
  # An "empty" line (just a lone '\r') marks the end of the headers.
  [[ "$key" == "$(printf '\r\n')" ]] && break;
  val=$(tr -d '\r\n' <<<"$val")

  case "$key" in
    "Content-Length")
      content_length="${val# }"
      ;;
    "X-GitHub-Delivery")
      delivery_id="${val# }"
      ;;
    "X-GitHub-Event")
      event_type="${val# }"
      ;;
    "X-Hub-Signature-256")
      # This will be trimmed later when comparing to OpenSSL's HMAC
      ghsig=$val
      ;;
    *)
      ;;
  esac

  printf 'Header: %s: %s\n' "$key" "$val" | log -p debug
done
```

**Reading the body and checking the HMAC signature.** GitHub sends a hex-encoded HMAC-SHA-256 signature of the JSON body in the [`X-Hub-Signature-256` header][gh-securing-webhooks], signed with the secret chosen while creating the webhook. Without this layer of security, anyone could send a POST and trigger CD scripts, maybe making us download malicious builds.

In a shell script the easiest way to calculate an HMAC is with the `openssl` command-line tool. We are using `dd` to read an exact amount of bytes from stdin, and the body is passed to `openssl` with a pipe to avoid sending trailing newlines (a here-string, i.e. `<<<`, did not work for me, precisely because it appends such a newline). Parameter expansion is used to trim both signature strings down to the bare hex digest before comparing them.

Place your webhook secret in the `GITHUB_WEBHOOK_SECRET` variable and set `ENFORCE_HMAC` to something different from 0 (I thought disabling signature checking could be useful for debugging purposes). You can now play with cURL/HTTPie/*[insert your favourite HTTP client here]* to see if you receive 401 Unauthorized responses as expected.

```bash
printf "Trying to read request content... %s bytes\n" "$content_length" | log -p info
content=$(dd bs=1 count="$content_length" 2> >(log -p debug) )

mysig=$(printf '%s' "$content" | openssl dgst -sha256 -hmac "$GITHUB_WEBHOOK_SECRET")
if [[ "${mysig#* }" == "${ghsig#*=}" ]]; then
  log -p notice <<<"HMAC signatures match, proceeding further."
else
  log -p warning <<<"HMAC signatures do not match! Request is not authenticated!"
  if [[ $ENFORCE_HMAC != 0 ]]; then
    send_response "401 Unauthorized" "Provide signature as HTTP header."
    log -p err <<<"Exiting now because HMAC signature enforcing is required."
    exit 1
  fi
fi
```
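The two trims deserve a word: `openssl dgst` prefixes its output with the digest name, while the header value starts with `sha256=`, so `${mysig#* }` and `${ghsig#*=}` strip everything up to the first space and the first `=` respectively, leaving just the hex digests. A small illustration with hypothetical values:

```bash
# `openssl dgst` prints "NAME(stdin)= <hex>" (the exact name varies between
# OpenSSL versions); GitHub sends "sha256=<hex>". Digests below are made up.
mysig='SHA2-256(stdin)= deadbeef'
ghsig=' sha256=deadbeef'

echo "${mysig#* }"   # -> deadbeef  (strips the shortest prefix matching '* ')
echo "${ghsig#*=}"   # -> deadbeef  (strips the shortest prefix matching '*=')
```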
log -p err <<<"X-GitHub-Delivery header was not provided." exit 1 fi printf "GitHub Delivery ID: %s\n" "$delivery_id" | log -p info printf "GitHub Event type: %s\n" "$event_type" | log -p info case "$event_type" in "workflow_run") send_response "200 OK" "Acknowledged workflow run!" ;; *) send_response "204 No Content" "" exit 0 ;; esac ``` The "HTTP server" part of the script is complete! You can also test this by asking GitHub to resend past webhooks from the web UI. ### Parsing the JSON body and downloading artifacts **JSON body.** The JSON schema of GitHub webhooks payloads can be found on the [official GH docs][gh-webhook-payloads]. We will use [jq][jq] to parse JSON and with its `-r` flag we will print the fields we're interested on standard output, each on a separate line. Its stream can be passed to the `read` builtin with `IFS` set to `\n`. The `|| true` disjunction at the end of the command makes the script continue with the execution even if jq doesn't find some of the fields we asked it to extract (e.g., in event that signal the start of a workflow, `artifacts_url` is not present). I want to run CD workflows only on the main branch (`main`, `master`, ...), so I added a check against the variable `MAIN_BRANCH` that you can configure at the top of the script. GitHub sends `workflow_run` events even when CI workflows start, but we're only interested in running a custom action when a workflow succeeds. ```bash IFS=$'\n' read -r -d '' action branch workflow_status \ name conclusion url artifacts \ commit message author < <( jq -r '.action, .workflow_run.head_branch, .workflow_run.status, .workflow_run.name, .workflow_run.conclusion, .workflow_run.html_url, .workflow_run.artifacts_url, .workflow_run.head_commit.id, .workflow_run.head_commit.message, .workflow_run.head_commit.author.name' <<<"$content") || true printf 'Workflow run "%s" %s! See %s\n' "$name" "$workflow_status" "$url" | log -p notice printf 'Head commit SHA: %s\n' "$commit" | log -p info printf 'Head commit message: %s\n' "$message" | log -p info printf 'Commit author: %s\n' "$author" | log -p info if [[ "$action" != "completed" ]] \ || [[ "$conclusion" != "success" ]] \ || [[ "$branch" != "$MAIN_BRANCH" ]]; then exit 0 fi log -p notice <<<"Proceeding with continuous delivery!" ``` **Build artifacts.** Before running the custom CD script that depends from your specific deployments scenario, we will download all artifacts build on GitHub Actions during CI. For example, in our Dart/Flutter webapp this could include the built website, with Dart already compiled to JavaScript. In the case of our Python backend the artifact is a Python wheel. This webhook script handler is completely language-agnostic though, meaning that you can use it with whatever language or build system you want. We will download and extract all artifacts in a temporary directory and then pass its path as an argument to the CD script, with a bunch of other useful information such as the branch name and the commit SHA. The function `download_artifacts` downloads and extracts the ZIP files stored on GitHub using the [Artifacts API][gh-artifacts-api]. It iterates on the JSON array and extracts appropriate fields using jq's array access syntax. GitHub returns a 302 temporary redirect when it receives a GET on the `archive_download_url` advised in the artifact body, so we use cURL's `-L` to make it follow redirects. 
```bash
function download_artifacts {
  # $1: URL
  # $2: directory to download artifacts to
  pushd "$2" &>/dev/null

  artifacts_payload=$(curl --no-progress-meter -u "$GITHUB_API_USER:$GITHUB_API_TOKEN" "$1" 2> >(log -p debug))
  artifacts_amount=$(jq -r '.total_count' <<<"$artifacts_payload")

  for i in $(seq 1 "$artifacts_amount"); do
    printf 'Downloading artifact %s/%s...\n' "$i" "$artifacts_amount" | log -p info

    name=$(jq -r ".artifacts[$((i - 1))].name" <<<"$artifacts_payload")
    url=$(jq -r ".artifacts[$((i - 1))].archive_download_url" <<<"$artifacts_payload")
    printf 'Artifact name: "%s" (downloading from %s)\n' "$name" "$url" | log -p info

    tmpfile=$(mktemp)
    printf 'Downloading ZIP to %s\n' "$tmpfile" | log -p debug
    curl --no-progress-meter -L -u "$GITHUB_API_USER:$GITHUB_API_TOKEN" --output "$tmpfile" "$url" 2> >(log -p debug)

    mkdir "$name"
    printf 'Unzipping into %s...\n' "$2/$name" | log -p debug
    unzip "$tmpfile" -d "$2/$name" | log -p debug
    rm "$tmpfile"
  done

  popd &>/dev/null
}

artifacts_dir=$(mktemp -d)
printf 'Downloading artifacts to %s...\n' "$artifacts_dir" | log -p info
download_artifacts "$artifacts" "$artifacts_dir"
```

**Final step: running your CD script!** If you aliased `log` to `systemd-cat` as I did above, the `-t` flag selects a different identifier, making the output of the custom script stand out from the garbage of our webhook handler in the system journal. Again, configure `$CD_SCRIPT` appropriately; it will be run from the directory specified above in the systemd unit file and it will receive the path to the directory containing the downloaded artifacts as an argument. **Note**: it will run with root privileges unless specified otherwise in the service unit file!

```bash
printf 'Running CD script!\n' | log -p notice
$CD_SCRIPT "$artifacts_dir" "$branch" "$commit" 2>&1 | log -t "CD script" -p info
```

For example, our Python backend CD script looks something like this:

```bash
cd "$1"
shopt -s globstar  # make ** match recursively

systemctl stop my-awesome-backend.service
source /var/lib/my-backend/virtualenv/bin/activate
pip3 install --no-deps --force-reinstall **/*.whl
systemctl start my-awesome-backend.service
```

**Bonus points for cleanup** :) Remove the tmpdir created earlier to store artifacts:

```bash
printf 'Deleting artifacts directory...\n' | log -p info
rm -r "$artifacts_dir"
```

## Conclusion

You can find the complete script [here](/resources/dumb-cd-webhooks/handle-webhook.sh). It's 158 LoC, not that much, and it's very flexible. There's room for improvement, e.g., selecting different scripts on different branches. Let me know if you extend this script or use a similar approach!

[gh-webhooks]: https://docs.github.com/en/developers/webhooks-and-events/webhooks/about-webhooks
[rfc2616]: https://datatracker.ietf.org/doc/html/rfc2616
[bash-expansion]: https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html
[jq]: https://stedolan.github.io/jq/
[gh-securing-webhooks]: https://docs.github.com/en/developers/webhooks-and-events/webhooks/securing-your-webhooks#validating-payloads-from-github
[gh-webhook-payloads]: https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads
[gh-artifacts-api]: https://docs.github.com/en/rest/actions/artifacts

[^overengineering]: At this point, you might be wondering whether it is worth it. It probably isn't, and the simplest solution could be a 100 LoC Python + FastAPI script to handle webhooks. But I *really* wanted to do this with basic UNIX tools.
[^socat]: You can also use `socat` to do what systemd is doing for us if you want to test your script on your local machine: `socat TCP-LISTEN:8027,reuseaddr,fork SYSTEM:/path/to/handle-webhook.sh`.

[comment]: # vim: ts=2:sts=2:sw=2:et:nojoinspaces:tw=80