Edwin, a Gemini server in POSIX AWK (with some other bits)

Foreword
Requirements
References
Architecture
Config
Request
- Gemini spec
- URL parsing
Response
TLS

Foreword

Sort of on a whim, and sort of because the only programming languages I'm really comfortable in are shell and awk, I've decided to write a Gemini server in those languages. I've already written a Gemini browser in bash (or written most of one; I still need to add some bits for quality-of-life improvements) for much the same reasons, so I always knew that there would come a day when I'd need to write something for the other end of the pipe. It turns out, today is that day.

What follows is a literate Org file containing a functioning Gemini server that's as POSIX-compatible as possible. Awk handles the textual parts of the request and response, but since it can't do networking (and even GNU awk can't do TLS), I'm wrapping that core logic in a call to socat in a shell script. A dream of mine is to shoehorn Make in as a multiplexer, but I'm not sure if it's possible or even necessary. Let's find out!!

Requirements

POSIX awk
POSIX sh
socat
probably a Unix environment

References

Gemini Specification (HTTP)
POSIX Awk manual (HTTP)
POSIX Sh manual (HTTP)
Socat manual (HTTP)

Architecture

Edwin is made of a few different layers that all interact with each other. Awk is going to handle the actual input-output bit of the request, since it's good with that. It'll have 2 rules – one to handle gemini links and one for everything else – and due to the nature of Gemini connections, it'll exit after reading one line. Since that's not a great way to run a server, and since awk doesn't handle TLS or networking (GNU awk does, but that's (a) hacky and nonstandard and, I don't know, weird, and (b) it still doesn't do TLS, so I'd be shelling it up anyway), I'm wrapping the awk script in a shell script using socat to pipe between TLS and the awk process.
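
As a rough sketch of that outer layer (not the finished shell layer, which is still a TODO), socat can terminate TLS and fork one awk process per connection. The file names cert.pem, key.pem and edwin.awk, the EDWIN_* variables and the edwin_listen function are all assumptions made up for illustration; 1965 is Gemini's default port.

```shell
#!/bin/sh
# Hypothetical settings; none of these names come from edwin itself.
EDWIN_PORT="${EDWIN_PORT:-1965}"      # default Gemini port
EDWIN_CERT="${EDWIN_CERT:-cert.pem}"  # assumed certificate path
EDWIN_KEY="${EDWIN_KEY:-key.pem}"     # assumed private key path
EDWIN_AWK="${EDWIN_AWK:-edwin.awk}"   # assumed name of the tangled awk script

edwin_listen() {
    # fork:     one child (and so one awk process) per connection
    # verify=0: don't demand a client certificate at the TLS level
    # With EDWIN_DRY_RUN set, the command is printed instead of run.
    ${EDWIN_DRY_RUN:+echo} socat \
        "OPENSSL-LISTEN:${EDWIN_PORT},cert=${EDWIN_CERT},key=${EDWIN_KEY},verify=0,fork,reuseaddr" \
        "EXEC:awk -f ${EDWIN_AWK}"
}
```

Running edwin_listen with EDWIN_DRY_RUN=1 prints the socat command line, which makes it easy to check before anything binds to a port.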

Under the awk layer, we'll have our CGI layer – CGI scripts will respond to requests themselves, so they can do things like ask for input or use client certificates.

Config

TODO Awk layer

`DEFAULT_MIME`

`BASE_DIR`

`HOSTNAME`

TODO Shell layer

TODO CGI layer

Request

Gemini spec

Gemini requests are a single CRLF-terminated line with the following structure:

<URL><CR><LF>

<URL> is a UTF-8 encoded absolute URL, of maximum length 1024 bytes. If the scheme of the URL is not specified, a scheme of gemini:// is implied.

Sending an absolute URL instead of only a path or selector is effectively equivalent to building in a HTTP "Host" header. It permits virtual hosting of multiple Gemini domains on the same IP address. It also allows servers to optionally act as proxies. Including schemes other than gemini:// in requests allows servers to optionally act as protocol-translating gateways to e.g. fetch gopher resources over Gemini. Proxying is optional and the vast majority of servers are expected to only respond to requests for resources at their own domain(s).
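
The two hard rules in that excerpt (a CRLF terminator and a 1024-byte limit) can be checked in a few lines of awk. This is only a sketch: check_request is a hypothetical helper, the error texts are made up, and LC_ALL=C is there on the assumption that length() should count bytes rather than characters.

```shell
# check_request LINE
# LINE is the request as read from the socket, minus the final LF.
# Prints "ok <url>" or a 59 (BAD REQUEST) header body.
check_request() {
    printf '%s\n' "$1" | LC_ALL=C awk '
    NR == 1 {
        # awk strips the LF itself; the CR is still on the end of the line
        if (!sub(/\r$/, ""))
            print "59 Missing CRLF"
        else if (length($0) > 1024)
            print "59 URL too long"
        else
            print "ok " $0
    }'
}
```

For example, passing gemini://example.com/ followed by a carriage return yields "ok gemini://example.com/", while the same URL without the carriage return is rejected.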

URL parsing

#+NAME function_usplit

  function usplit(url, uarr) {
       # scheme - scheme:
       if (match(url, /^[^:\/\?#]+:/)) {
            uarr["scheme"] = substr(url, RSTART, RLENGTH - 1);
            url = substr(url, RSTART + RLENGTH);
       }
       # authority - //authority
       if (match(url, /^\/\/[^\/\?#]*/)) {
            uarr["authority"] = substr(url, RSTART+2, RLENGTH-2);
            url = substr(url, RSTART + RLENGTH);
       }
       # path - path
       if (match(url, /^[^\?#]*/)) {
            uarr["path"] = substr(url, RSTART, RLENGTH);
            url = substr(url, RSTART + RLENGTH);
       }
       # query - ?query
       if (match(url, /^\?[^#]*/)) {
            uarr["query"] = substr(url, RSTART+1, RLENGTH-1);
            url = substr(url, RSTART + RLENGTH);
       }
       # fragment - #fragment
       if (match(url, /^#.*/)) {
            uarr["fragment"] = substr(url, RSTART+1);
            url = substr(url, RSTART + RLENGTH);
       }
       # sanity checks
       if (!uarr["path"]) uarr["path"] = "/";
  }
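
To see what usplit() produces, here is a small harness around a condensed copy of the function. The usplit_demo wrapper and its output format are assumptions for illustration only; the splitting logic follows the block above.

```shell
# usplit_demo URL: print the parts usplit() finds in URL
usplit_demo() {
    printf '%s\n' "$1" | awk '
    function usplit(url, uarr) {
        if (match(url, /^[^:\/?#]+:/)) {        # scheme:
            uarr["scheme"] = substr(url, RSTART, RLENGTH - 1)
            url = substr(url, RSTART + RLENGTH)
        }
        if (match(url, /^\/\/[^\/?#]*/)) {      # //authority
            uarr["authority"] = substr(url, RSTART + 2, RLENGTH - 2)
            url = substr(url, RSTART + RLENGTH)
        }
        if (match(url, /^[^?#]*/)) {            # path
            uarr["path"] = substr(url, RSTART, RLENGTH)
            url = substr(url, RSTART + RLENGTH)
        }
        if (match(url, /^\?[^#]*/)) {           # ?query
            uarr["query"] = substr(url, RSTART + 1, RLENGTH - 1)
            url = substr(url, RSTART + RLENGTH)
        }
        if (match(url, /^#.*/))                 # #fragment
            uarr["fragment"] = substr(url, RSTART + 1)
        if (!uarr["path"]) uarr["path"] = "/"   # empty path means the root
    }
    {
        split("", u)                            # start from an empty array
        usplit($0, u)
        printf "scheme=%s authority=%s path=%s query=%s fragment=%s\n",
            u["scheme"], u["authority"], u["path"], u["query"], u["fragment"]
    }'
}
```

Calling usplit_demo 'gemini://example.com/docs/index.gmi?q=awk#top' prints scheme=gemini authority=example.com path=/docs/index.gmi query=q=awk fragment=top. A bare gemini://example.com comes back with path=/ thanks to the sanity check at the end.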

Response

Gemini spec

Gemini response headers look like this:

<STATUS><SPACE><META><CR><LF>

<STATUS> is a two-digit numeric status code, as described below in 3.2 and in Appendix 1.

<SPACE> is a single space character, i.e. the byte 0x20.

<META> is a UTF-8 encoded string of maximum length 1024 bytes, whose meaning is <STATUS> dependent.

<STATUS> and <META> are separated by a single space character.

If <STATUS> does not belong to the "SUCCESS" range of codes, then the server MUST close the connection after sending the header and MUST NOT send a response body.

If a server sends a <STATUS> which is not a two-digit number or a <META> which exceeds 1024 bytes in length, the client SHOULD close the connection and disregard the response header, informing the user of an error.

Status codes

Edwin is going to be "fancy," meaning it'll use the whole gamut of 2-digit codes. They are as follows:

Code	Meaning	Layer
10	INPUT	cgi
11	SENSITIVE INPUT	cgi
20	SUCCESS	awk
30	REDIRECT - TEMPORARY	file?
31	REDIRECT - PERMANENT	file?
40	TEMPORARY FAILURE	–
41	SERVER UNAVAILABLE	sh
42	CGI ERROR	awk
43	PROXY ERROR	???
44	SLOW DOWN	sh?
50	PERMANENT FAILURE	–
51	NOT FOUND	awk
52	GONE	file?
53	PROXY REQUEST REFUSED	awk
59	BAD REQUEST	awk
60	CLIENT CERTIFICATE REQUIRED	cgi
61	CERTIFICATE NOT AUTHORISED	cgi
62	CERTIFICATE NOT VALID	cgi

The 10 codes really only make sense in the context of CGI scripts, so they can handle those themselves. Ditto for the 60s.

20 is the default, and works as long as the awk script can find the file or CGI script and can read/execute it. So awk can handle that.

I'm thinking the 30 codes can be implemented on a file level, possibly with something as simple as /some/path/redirect.31 with a single line, gemini://example.com/some/other/path/ that edwin could read and send the client over there. Of course, the client would only have to request /some/path/redirect to be redirected. Another option for these is using something like a .molly or .htaccess file.
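
A minimal sketch of the file-extension idea, assuming the redirect target is the first line of a file named after the requested path plus .30 or .31. find_redirect and the redirect_demo wrapper are hypothetical names; nothing like this exists in edwin yet.

```shell
# redirect_demo PATH: look for PATH.30 or PATH.31 and print the redirect header
redirect_demo() {
    awk -v path="$1" '
    function respond(code, meta) { printf "%s %s\r\n", code, meta }

    # Returns the status code sent, or 0 if no redirect file was found.
    function find_redirect(path,    code, file, target) {
        for (code = 30; code <= 31; code++) {
            file = path "." code
            # getline returns 1 on success, 0 on an empty file, -1 if missing
            if ((getline target < file) > 0) {
                close(file)
                respond(code, target)
                return code
            }
            close(file)
        }
        return 0
    }

    BEGIN { if (!find_redirect(path)) print "no redirect" }'
}
```

With a file /some/path/redirect.31 holding the target URL, a request for /some/path/redirect would get a 31 header pointing at that URL.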

41 only really applies if the shellscript can't call the awk script. Likewise, 42 only makes sense in the awk layer, since that's what calls the CGI. 43 doesn't make sense unless we're planning on proxying to other hosts, which I'm not right now, so. 44 needs to be in the sh layer, if it's anywhere at all – I'm not sure that I'll implement it.

51 will be implemented in the awk layer, since that's the layer that tries to find the file. For my purposes, I don't see a meaningful difference between 51 and 52, so I won't implement 52; however, it might be usable at a file level à la the 31-style file extensions – i.e., move the to-be-deleted file to delete.52 and, after waiting an "appropriate" amount of time, fully delete it – but that seems complicated and not overly helpful.

53 will be handled by the awk layer, since only awk will see what the request is. Same with 59.

Generate the status code

#+NAME function_respond

  function respond(code, meta) {
      printf "%s %s\r\n", code, meta
  }
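
To check what actually goes out on the wire, respond() can be run on its own. The respond_demo wrapper below is just a hypothetical test harness around the function above.

```shell
# respond_demo CODE META: print one Gemini response header
respond_demo() {
    awk -v code="$1" -v meta="$2" '
    function respond(code, meta) {
        printf "%s %s\r\n", code, meta
    }
    BEGIN { respond(code, meta) }'
}
```

Piping respond_demo 20 text/gemini through tr '\r' '@' shows 20 text/gemini@, which confirms the carriage return sits right before the line feed.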

Serve things

Check permissions

Edwin can't serve a file that doesn't exist, of course. test() is the function that deals with that and other possibilities. Luckily, awk has a system() function that works like POSIX's system() call, which "shells out" to a shell – meaning we can run Unix commands like test.

A caveat: because of the conventions of Unix, we need to negate our ideas of success and failure in awk. The test command exits with 0 when the check passes and 1 when it fails, and awk treats a non-zero return from system() as true. That's why the failure handling sits on the true branch of the if block below.

#+NAME function_expect

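  function test(file, test_arg, err_code, err_text) {
      # system() returns the exit status of test(1): 0 if the check passes.
      # NOTE: file goes to the shell unquoted, so it must not contain
      # spaces or shell metacharacters.
      if (system("test -" test_arg " " file)) {
          if (err_code && err_text)
              respond(err_code, err_text);
          return 0;
      }
      return 1;
  }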
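
Since the path comes straight from the client, it's worth quoting it before it reaches the shell. Here is one way that might look: shquote is a hypothetical helper (not part of edwin) that wraps a string in single quotes, and \047 is the octal escape for a single quote, used so the awk program can itself live inside a single-quoted shell string.

```shell
# test_demo FILE FLAG: print "yes" if `test -FLAG FILE` succeeds, else "no"
test_demo() {
    awk -v file="$1" -v flag="$2" '
    function respond(code, meta) { printf "%s %s\r\n", code, meta }

    # Wrap s in single quotes, turning each embedded quote into the
    # four characters quote, backslash, quote, quote.
    function shquote(s) {
        gsub(/\047/, "\047\\\047\047", s)
        return "\047" s "\047"
    }

    function test(file, test_arg, err_code, err_text) {
        if (system("test -" test_arg " " shquote(file))) {
            if (err_code && err_text)
                respond(err_code, err_text)
            return 0
        }
        return 1
    }

    BEGIN { print (test(file, flag) ? "yes" : "no") }'
}
```

With the quoting in place, a path such as "/tmp/x; echo pwned" is handed to test as a single (nonexistent) file name instead of being run as two commands.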

TODO Mime types

To serve a file, we need to know its mime-type so we can pass that on to the client. One day, I'll figure out something fancy with /etc/mime.types or something, but for now, we'll assume everything is DEFAULT_MIME, which in edwin's case is text/gemini.

  function get_mime(file) {
      return DEFAULT_MIME;
  }
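
Until the /etc/mime.types idea happens, a small lookup table keyed on the file extension might be enough. This is a sketch under assumptions: the MIME array, its three entries and the mime_demo wrapper are made up for illustration, and anything unknown still falls back to DEFAULT_MIME.

```shell
# mime_demo FILE: print the MIME type guessed from FILE's extension
mime_demo() {
    awk -v file="$1" -v DEFAULT_MIME="text/gemini" '
    function get_mime(file,    ext) {
        # last dot-suffix of the final path component, if there is one
        if (match(file, /\.[^.\/]+$/)) {
            ext = tolower(substr(file, RSTART + 1))
            if (ext in MIME)
                return MIME[ext]
        }
        return DEFAULT_MIME
    }

    BEGIN {
        MIME["gmi"] = "text/gemini"
        MIME["txt"] = "text/plain"
        MIME["png"] = "image/png"
        print get_mime(file)
    }'
}
```

So notes.TXT maps to text/plain, while a file with no extension (README, say) stays text/gemini.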

Serve files

Here is the main "heart" of edwin, the whole reason we're here: we're serving a file. I'm not sure when we'd pass the mime-type in, but hey, it's there in case we do; at any rate, if it's not there we're going to find the mime type through the get_mime() function. After that, it's simple: print out the response header, then read the file line by line and pass it through. Finally, close the file and exit this iteration of the awk bit.

#+NAME function_serve_file

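  function serve_file(path, mime,    line) {
      if (!mime)
          mime = get_mime(path);

      respond(20, mime);

      # getline returns 1 per line read, 0 at EOF and -1 on error;
      # testing for > 0 avoids looping forever on an unreadable file
      while ((getline line < path) > 0) {
          print line;
      }

      close(path);
      exit 0;
  }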
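
Here is a small harness showing what a client would receive. serve_demo is a hypothetical wrapper, and it hard-codes text/gemini rather than calling get_mime() to stay self-contained; the loop compares getline's return value with 0 so that a read error ends the loop instead of spinning.

```shell
# serve_demo FILE: print the response header followed by FILE's contents
serve_demo() {
    awk -v path="$1" '
    function respond(code, meta) { printf "%s %s\r\n", code, meta }

    function serve_file(path, mime,    line) {
        if (!mime)
            mime = "text/gemini"       # stand-in for get_mime(path)
        respond(20, mime)
        # > 0 means "a line was read"; 0 is EOF and -1 is an error
        while ((getline line < path) > 0)
            print line
        close(path)
        exit 0
    }

    BEGIN { serve_file(path) }'
}
```

The header line ends in CRLF, while the body is passed through with whatever line endings the file has.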

TODO Serve CGI

Respond to requests

  {
      # requests end in CRLF; awk only strips the LF
      sub(/\r$/, "");

      # clean out the URL array
      for (part in url)
          delete url[part];

      # and reassign it
      usplit($0, url);

      # sanity checks
      if (url["scheme"] != "gemini") {
          respond(53, "Only gemini supported.");
          exit 53;
      }

      if (url["authority"] != HOSTNAME) {
          respond(53, "No proxying to other hosts!");
          exit 53;
      }

      # figure out the file we're serving
      path = BASE_DIR url["path"];

      # is the file executable? serve cgi
      # (serve_cgi() is still a TODO)
      if (test(path, "x"))
          serve_cgi(path);

      # if not, is the file at least readable? serve the file.
      if (test(path, "r", 51, "Not found."))
          serve_file(path);
      else
          exit 51;

      exit 0;
  }

TODO TLS
