Initial README

commit 7dfb1012b4 · 2020-04-26 19:32:56 +02:00
1 changed files with 196 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,196 @@
+# A KISS CI/CD system for the tildeverse?
+
+So, for a while I've been thinking about how to deploy a simple and permissionless Continuous Integration/Delivery system for tilde servers, with a scriptable CLI. Here's where my thoughts have got me so far:
+
+# General overview
+
+Webhooks are simple HTTP+JSON requests sent by a forge such as Gitea to a remote endpoint, to inform it that a repository was updated. At a high level, the lifetime of a webhook would be:
+
+1. If the updated remote is not in the local database, exit
+2. If the HTTP signature on the webhook doesn't match the local secret, exit
+3. Run the `webhook-run` script (which is privileged), which for each user subscribing to the remote:
+   1. Finds the corresponding `git-build.sh` task(s)
+   2. Runs `git-build.sh` as the user, with the matching task(s) as arguments
+4. Enjoy!
+
+In this document, you will find information about:
+
+- `webhook`: a user-facing wrapper script
+- `webhook-backend`: a user-friendly CLI to manage your remote subscriptions, which cannot be called outside of `webhook`
+- `webhook-run`: a script the HTTP endpoint calls when a remote was found to have a legitimate update
+- `webhook-endpoint`: a script that acts as an HTTP endpoint, and validates incoming webhooks
+
+A simple CLI interface is presented in the next section. How the system works under the hood is explained in the [Architecture](#architecture) section. Current limitations and further ideas are described in the [Future](#future) section. Please note that this system is (at the moment) entirely **hypothetical** and was merely described in order to gather feedback before implementation.
+
+# CLI interface
+
+We need a user interface to subscribe to a remote, unsubscribe from a remote, and update the corresponding secret:
+
+```
+user$ webhook add "https://tildegit.org/tilde-fr/infra"
+[webhook] Your secret for https://tildegit.org/tilde-fr/infra is now:
+XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX # <--- Hex-encoded /dev/urandom
+
+user$ webhook list
+https://tildegit.org/tilde-fr/infra\tXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
+
+user$ webhook secret "https://tildegit.org/tilde-fr/infra" YYYYYYYYYYYYYYYY
+[webhook] Your secret for https://tildegit.org/tilde-fr/infra is now:
+YYYYYYYYYYYYYYYY
+
+user$ webhook unsubscribe "https://tildegit.org/tilde-fr/infra"
+[webhook] Unsubscribed from https://tildegit.org/tilde-fr/infra
+[webhook] Users can still subscribe to this remote. To remove it entirely, run:
+webhook remove "https://tildegit.org/tilde-fr/infra"
+```
+
+The opposite of `add` is `remove`. The opposite of `subscribe` is `unsubscribe`. `subscribe` falls back to `add` when the remote is not registered yet, and `add` falls back to `subscribe` when it already is. Without the `--force` flag, `remove` only unsubscribes; `unsubscribe` never removes the remote.
+
+The commands are presented in greater detail in the [webhook-backend](#webhook-backend) section.
+
+# Architecture
+
+## Introduction
+
+A very naive approach to subscription storage would have users manage their own database in `$HOME`. However, that would require iterating over all home directories on every incoming webhook to figure out which subscriptions are legitimate, which is a vector for DoS attacks, so we need another way.
+
+I propose to introduce a `webhook` unprivileged user, which manages a central database of subscriptions for the server. This user would have a `~webhook/webhooks` folder. For each remote URL `$r` (where `$rhex` is the hex-encoded representation of it), there would be in this folder:
+
+- `.$rhex.owner` names the local user owning the repository, who is therefore responsible for keeping the secret in sync with the remote
+- `.$rhex.secret` contains the secret shared with the repo
+- `$rhex.$u` for each `$u` local user subscribed to the repo
+
+Additionally, for each user `$u` owning one or more repositories, there would be a `.owned-by/$u` folder containing files named `$rhex` for each `$r` remote the user owns.
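
The layout above can be sketched in a few lines. This is only an illustration, not the planned implementation: the function names and the `db` parameter (standing in for `~webhook/webhooks`) are assumptions made for the example.

```python
from pathlib import Path


def hex_encode(remote: str) -> str:
    """Turn a remote URL ($r) into a filesystem-safe name ($rhex)."""
    return remote.encode("utf-8").hex()


def hex_decode(rhex: str) -> str:
    """Recover the remote URL ($r) from its hex-encoded name ($rhex)."""
    return bytes.fromhex(rhex).decode("utf-8")


def db_paths(db: Path, remote: str, user: str) -> dict:
    """Return the database files that concern `remote` and `user`.

    `db` stands in for ~webhook/webhooks (hypothetical parameter).
    """
    rhex = hex_encode(remote)
    return {
        "owner": db / f".{rhex}.owner",
        "secret": db / f".{rhex}.secret",
        "subscription": db / f"{rhex}.{user}",
        "owned_by": db / ".owned-by" / user / rhex,
    }
```

Hex-encoding keeps `/` and `:` out of file names, at the cost of readability (see the petnames idea in the Future section).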
+
+## webhook (wrapper script)
+
+Because of how the system is designed, the `webhook` script needs to run as user `webhook` (as this user owns the database) but the script needs to reliably know which user called it. To run as `webhook` user, we simply set the suid bit on `/usr/local/bin/webhook` (owned by `webhook:webhook`) with `chmod u+s /usr/local/bin/webhook`.
+
+**Note**: We cannot rely on uid/ruid to provide accurate information on the user calling the script, because a simple `setresuid()` (~> `man 2 setresuid`) call would allow anyone to impersonate the `webhook` user.
+
+So, in fact, `webhook` would be a simple wrapper script for `webhook-backend`, which would be the setuid script. This `webhook` script would:
+
+1. Create a file `~/.webhook.LOCK` with the current arguments (`$@`)
+2. Call `webhook-backend` with the current arguments
+3. Remove `~/.webhook.LOCK`
+
+**Note**: Such an architecture makes the system strongly synchronous. As such, it would not support running multiple commands at the same time for a single user, which I do not believe to be a problem.
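
A minimal sketch of the wrapper's three steps, under assumptions: the README does not say how `$@` is serialized into the lock, so JSON is used here because it round-trips arguments exactly. The `home` and `backend` parameters are hypothetical and only exist to keep the example testable.

```python
import json
import subprocess
from pathlib import Path


def run_wrapper(args, home: Path, backend=("webhook-backend",)) -> int:
    """Write the lock, call the backend, and always remove the lock."""
    lock = home / ".webhook.LOCK"
    # Assumption: the lock stores the argument list as JSON.
    lock.write_text(json.dumps(list(args)))
    try:
        # The backend is expected to compare its own arguments to the lock.
        return subprocess.run([*backend, *args]).returncode
    finally:
        # Step 3: the lock must not outlive the call, even on failure.
        lock.unlink()
```

Removing the lock in a `finally` clause means a crashing backend cannot leave a stale lock behind.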
+
+## webhook-backend
+
+The backend script, running as user `webhook`, will check whether the declared ruid (real user id) has a `~/.webhook.LOCK` file containing the same arguments it just received. This will prevent user impersonation.
+
+If the check succeeds, `webhook-backend` processes the arguments. The first argument is the command, and can be one of the following: `list`, `add`, `remove`, `subscribe`, `unsubscribe`, `secret`. If no argument is supplied, the `list` command is implied.
+
+Some commands take a remote argument, and some take secret arguments. The remote is abbreviated `$r`, and the secret `$s`. Additionally, `$rhex` designates `$r` hex-encoded. `$u` is the real user running the script.
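
The lock check and the command parsing described in this section could look like the sketch below. It assumes the lock holds the argument list as JSON (the README does not fix a format), and the function names are made up for the example.

```python
import json
from pathlib import Path

COMMANDS = {"list", "add", "remove", "subscribe", "unsubscribe", "secret"}


def lock_matches(home: Path, args) -> bool:
    """Check that the caller's lock file records exactly these arguments."""
    try:
        recorded = json.loads((home / ".webhook.LOCK").read_text())
    except (OSError, ValueError):
        # Missing, unreadable or malformed lock: treat as impersonation.
        return False
    return recorded == list(args)


def parse_command(args):
    """Return (command, remaining arguments); no argument implies `list`."""
    if not args:
        return "list", []
    command, *rest = args
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    return command, rest
```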
+
+### list
+
+The `list` command takes no further arguments. It lists the user's current subscriptions, as well as remotes the user owns (and holds the secret for) but is not subscribed to.
+
+For each remote found for the user, the `list` command prints the following fields, separated by tabs (`\t`):
+
+- the remote URL
+- (for owners) the remote's secret
+- (for owners who unsubscribed from the remote) a literal `x`
+
+`list` iterates over `~webhook/webhooks/*.$u`; for each matching file, the name with `.$u` stripped is `$rhex`. For each of those:
+
+1. Hex-decode `$rhex` into `$r`
+2. Print `$r`
+3. If `.$rhex.owner` does not contain `$u`, continue
+4. Print `\t` plus the contents of `.$rhex.secret`
+5. If `.owned-by/$u/$rhex` exists, print `\tx`.
+6. Print a newline `\n`
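
One possible reading of `list`, as a sketch. It follows the field description (URL, secret for owners, `x` for owned but unsubscribed remotes), so it also walks `.owned-by/$u` to find remotes the user owns without subscribing to them; that extra walk, and using `.owned-by` as the ownership index, are assumptions rather than part of the steps above.

```python
import os
from pathlib import Path


def list_lines(db: Path, user: str):
    """Return one tab-separated line per remote known for `user`."""
    subscribed = set()
    for name in os.listdir(db):
        # "$rhex.$u": hex never contains a dot, so split on the first one.
        rhex, _, subscriber = name.partition(".")
        if rhex and subscriber == user:  # dotfiles yield an empty rhex
            subscribed.add(rhex)
    owned_dir = db / ".owned-by" / user
    owned = set(os.listdir(owned_dir)) if owned_dir.is_dir() else set()
    lines = []
    for rhex in sorted(subscribed | owned):
        fields = [bytes.fromhex(rhex).decode("utf-8")]
        if rhex in owned:
            fields.append((db / f".{rhex}.secret").read_text().strip())
            if rhex not in subscribed:
                fields.append("x")  # owned, but not subscribed
        lines.append("\t".join(fields))
    return lines
```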
+
+### add
+
+The `add` command takes `$r` and optionally `$s`. As explained before, `$r` is hex-encoded as `$rhex`.
+
+If `~webhook/webhooks/.$rhex.owner` already exists, `add` runs the `subscribe` command instead. Otherwise, it takes the following steps:
+
+1. Clone `$r` to `/tmp/` to ensure the repository can be reached
+2. **TODO**: Here is where we introduce additional authentication steps to prevent accidentally claiming a repository you do not own
+3. Create `.$rhex.owner` with `$u` for content
+4. Create `.owned-by/$u/$rhex`
+5. If a secret `$s` was provided, write it to `.$rhex.secret`. Otherwise, generate a secret `$s` of 32 hex characters and write it to the same file.
+6. Print `[webhook] Your secret for $r is now:\n$s`
+7. Run the `subscribe` command
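
A sketch of `add`, leaving out the clone and the TODO authentication step. The function name and the `db` parameter are hypothetical. Creating the owner file in exclusive mode is a design choice of this example, so that two users adding the same remote at once cannot both become owner.

```python
import secrets
from pathlib import Path


def add_remote(db: Path, remote: str, user: str, secret: str = None) -> None:
    """Claim `remote` for `user` if it is unclaimed, then subscribe."""
    rhex = remote.encode("utf-8").hex()
    try:
        # Mode "x" fails if the file exists, so only one caller can claim.
        with open(db / f".{rhex}.owner", "x") as owner_file:
            owner_file.write(user)
    except FileExistsError:
        pass  # already claimed: fall through to a plain subscription
    else:
        owned = db / ".owned-by" / user
        owned.mkdir(parents=True, exist_ok=True)
        (owned / rhex).touch()
        # 16 random bytes, hex-encoded, give the 32 hex characters.
        secret = secret or secrets.token_hex(16)
        (db / f".{rhex}.secret").write_text(secret)
        print(f"[webhook] Your secret for {remote} is now:\n{secret}")
    (db / f"{rhex}.{user}").touch()  # the `subscribe` step
```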
+
+### remove
+
+`remove` command takes a `$r` and an optional `-f|--force` flag. Unless the force flag is set, the user is warned that forcibly removing the remote they own (and the associated secret) would break stuff for other users, but that they can do it with `-f`. After this warning, the `unsubscribe` command is run instead.
+
+If the force flag is indeed passed to `remove`, the command checks whether `$u` owns the repository (by looking up `.$rhex.owner`) and errors if that is not the case.
+
+If the user is the legitimate owner of the remote, the following files are deleted:
+
+- `.owned-by/$u/$rhex`
+- `.$rhex.owner`
+- `.$rhex.secret`
+- `$rhex.*` (current subscriptions)
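
The forced removal could be sketched as follows. The names are hypothetical, and raising `PermissionError` is just one way to report the ownership error.

```python
from pathlib import Path


def force_remove(db: Path, remote: str, user: str) -> None:
    """Delete every trace of `remote`, provided `user` owns it."""
    rhex = remote.encode("utf-8").hex()
    owner_file = db / f".{rhex}.owner"
    if not owner_file.exists() or owner_file.read_text().strip() != user:
        raise PermissionError(f"{user} does not own {remote}")
    targets = [
        db / ".owned-by" / user / rhex,
        owner_file,
        db / f".{rhex}.secret",
    ]
    # Current subscriptions: every "$rhex.<user>" file.
    targets += [p for p in db.iterdir() if p.name.startswith(f"{rhex}.")]
    for path in targets:
        path.unlink(missing_ok=True)  # missing_ok needs Python 3.8+
```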
+
+### subscribe
+
+The `subscribe` command takes `$r`. If `$rhex.$u` exists, the user is already subscribed and the command does nothing (maybe it should print a warning?).
+
+If the file `.$rhex.secret` exists, then `$rhex.$u` is created and a confirmation message is printed. Otherwise, the `add` command is run.
+
+### unsubscribe
+
+`unsubscribe` command takes `$r`. If `$rhex.$u` doesn't exist, the user isn't subscribed, so a warning is emitted and the command does nothing. Otherwise, the file `$rhex.$u` is removed.
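
Both `subscribe` and `unsubscribe` boil down to creating or deleting one marker file. In this sketch the return values are an assumption, chosen so that a caller can decide which message to print or whether to fall back to `add`.

```python
from pathlib import Path


def subscribe(db: Path, remote: str, user: str) -> str:
    """Return "already", "subscribed", or "unknown" (caller should run add)."""
    rhex = remote.encode("utf-8").hex()
    marker = db / f"{rhex}.{user}"
    if marker.exists():
        return "already"
    if not (db / f".{rhex}.secret").exists():
        return "unknown"  # remote not registered yet
    marker.touch()
    return "subscribed"


def unsubscribe(db: Path, remote: str, user: str) -> bool:
    """Remove the subscription; return False if there was none."""
    marker = db / f"{remote.encode('utf-8').hex()}.{user}"
    if not marker.exists():
        return False
    marker.unlink()
    return True
```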
+
+### secret
+
+`secret` command takes `$r` and an optional `$s`. If `.$rhex.secret` does not exist, the remote was not found, so an error is printed and the command does nothing. If `$u` doesn't match the content of `.$rhex.owner`, then an error is printed and the command does nothing.
+
+If `$s` is specified, it's written to `.$rhex.secret`. Otherwise, the contents of `.$rhex.secret` are printed.
+
+## webhook-run (privileged)
+
+Once an HTTP webhook has been authenticated as legitimate by the local HTTP endpoint, it calls the `webhook-run` script (which executes as root) with the updated remote `$r` as argument.
+
+For each subscribed file in `~webhook/webhooks/$rhex.*`:
+
+1. Extract the username `$u` by removing the `~webhook/webhooks/$rhex.` prefix
+2. For each file in `~$u/.git-build/*.source`:
+   1. If it does not match the updated repository, continue
+   2. Extract the task name `$t` by removing the `.source` suffix
+   3. Run `git-build.sh $t` as user `$u` (using sudo)
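
The loop above can be sketched as a function that only builds the commands instead of running them. Two assumptions are made here: a `.source` file "matches" when its content is the remote URL, and home directories sit under a common `homes` folder (a real script would look them up in the passwd database).

```python
from pathlib import Path


def build_commands(db: Path, homes: Path, remote: str):
    """Return the `sudo` command lines webhook-run would execute."""
    rhex = remote.encode("utf-8").hex()
    commands = []
    for entry in sorted(db.iterdir()):
        prefix, _, user = entry.name.partition(".")
        if prefix != rhex or not user:
            continue  # not a subscription to this remote
        tasks_dir = homes / user / ".git-build"
        if not tasks_dir.is_dir():
            continue
        for source in sorted(tasks_dir.glob("*.source")):
            # Assumption: the file content is the URL the task builds from.
            if source.read_text().strip() != remote:
                continue
            task = source.name[: -len(".source")]
            if task:
                commands.append(["sudo", "-u", user, "git-build.sh", task])
    return commands
```

Passing each command as a list (for instance to `subprocess.run`) avoids any shell interpretation of user or task names.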
+
+## webhook-endpoint (HTTP service)
+
+`webhook-endpoint` is the internet-facing service of this system, and it ensures incoming webhooks are legitimate, by following these steps:
+
+1. Decode the JSON into a native data structure `$w`, and register its `$w.repository.html_url` as `$r`, or exit silently if this fails
+2. Hex-encode `$r` into `$rhex`
+3. Check that `~webhook/webhooks/.$rhex.secret` exists, or exit silently
+4. Verify that `$w.secret` matches `~webhook/webhooks/.$rhex.secret`, or exit silently
+5. Verify that the HTTP signature in header `HTTP_X_GITEA_SIGNATURE` is valid for the request body under that secret, or exit silently
+6. Run `webhook-run $r`
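
The validation steps could be sketched as below. It assumes the signature is the hex-encoded HMAC-SHA256 of the raw request body keyed with the secret, which is how Gitea computes `X-Gitea-Signature`; the function name and the `db` parameter are made up for the example.

```python
import hashlib
import hmac
import json
from pathlib import Path


def validate(db: Path, body: bytes, signature: str):
    """Return the remote URL if the webhook is legitimate, else None."""
    try:
        hook = json.loads(body)
        remote = hook["repository"]["html_url"]
        rhex = remote.encode("utf-8").hex()
        secret = (db / f".{rhex}.secret").read_text().strip()
        claimed = str(hook["secret"])
    except (ValueError, KeyError, TypeError, AttributeError, OSError):
        return None  # malformed payload or unknown remote: exit silently
    # Constant-time comparisons, to avoid leaking the secret through timing.
    if not hmac.compare_digest(claimed.encode(), secret.encode()):
        return None
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return None
    return remote
```

Only when this returns a URL would the endpoint go on to run `webhook-run $r`.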
+
+# Future
+
+Currently, this software is not implemented. I was merely gathering ideas, which turned into a complete manual; that will be great for when I finally implement it. However, there are limitations in the current design that I would like to tackle, and suggestions are more than welcome.
+
+## Validate remote repository ownership through a side-channel
+
+Under the current proposal, it would be possible for a user to register any unclaimed repository and claim ownership for it on the local system. This could prevent the legitimate owner from registering it themselves, and therefore would block subscription to this remote because the local secret would be invalid (as the forge would be unaware of it).
+
+In order to prevent illegitimate ownership claims, we could implement an additional verification by:
+
+1. Checking a repository's website URL `$web` through the forge API
+2. Generating a secret to challenge the user `$u`
+3. Placing the secret in `~$u/public_html/.well-known/webhook-challenge`
+4. Looking up `$web/.well-known/webhook-challenge`
+5. If this fails, inform the user to complete the challenge manually
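
A sketch of the challenge, with the network access kept out of it: `fetch` is a hypothetical callable that takes a URL and returns the response body as text (it could wrap `urllib.request.urlopen`). Looking up `$web` through the forge API is not shown.

```python
import secrets
from pathlib import Path

CHALLENGE = ".well-known/webhook-challenge"


def place_challenge(home: Path) -> str:
    """Write a fresh token into the user's public_html and return it."""
    token = secrets.token_hex(16)
    target = home / "public_html" / CHALLENGE
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(token)
    return token


def challenge_passed(token: str, fetch, web: str) -> bool:
    """Check that `web` serves the token at the well-known path."""
    try:
        served = fetch(web.rstrip("/") + "/" + CHALLENGE)
    except OSError:  # urllib's URLError is a subclass of OSError
        return False
    return served.strip() == token
```

If `challenge_passed` returns `False`, the user would be told to complete the challenge manually, as in step 5.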
+
+## Repository petnames
+
+Currently, repositories are stored on disk as hex-encoded URLs. This is done to keep characters that are unsafe in file names (such as `/`) out of the filesystem. However, this approach is not really admin-friendly because you can't simply `ls` what's in `~webhook/webhooks` to get information about the system.
+
+Maybe a petname system would be more appropriate.
+
+## User ownership
+
+Currently, the whole database is understood to be owned by the `webhook` user. Maybe this can be reconsidered in the future so that users can script their subscriptions without using the `webhook` wrapper script. However, I do not believe this is important.