diff --git a/edwin.org b/edwin.org
index b815447..2066628 100644
--- a/edwin.org
+++ b/edwin.org
@@ -8,6 +8,225 @@ Sort of on a whim, and sort of because the only programming languages I'm /reall
 What follows is a literate Org file containing a functioning Gemini server that's as POSIX-compatible as possible. Awk handles the textual parts of the request and response, but since it can't do networking (and even GNU awk can't do TLS), I'm wrapping that core logic in a call to =socat= in a shell script. A dream of mine is to shoehorn Make in as a multiplexer, but I'm not sure if it's possible or even necessary. Let's find out!!

 * Requirements
+- POSIX =awk=
+- POSIX =sh=
+- =socat=
+- probably a Unix environment
-* Basics
+* References
+- [[gemini://gemini.circumlunar.space/docs/specification.gmi][Gemini Specification]] [[https://gemini.circumlunar.space/docs/specification.html][(HTTP)]]
+- POSIX Awk manual [[https://tilde.team/~ben/cgi-bin/man.sh?m=1p+awk][(HTTP)]]
+- POSIX Sh manual [[https://tilde.team/~ben/cgi-bin/man.sh?m=1p+sh][(HTTP)]]
+- Socat manual [[https://tilde.team/~ben/cgi-bin/man.sh?m=socat][(HTTP)]]
+
+* Architecture
+Edwin is made of a few different layers that all interact with each other. Awk is going to handle the actual input-output bit of the request, since it's good at that. It'll have two rules -- one to handle gemini links and one for everything else -- and due to the nature of Gemini connections, it'll exit after reading one line. Since that's not a great way to run a server, and since awk doesn't handle TLS or networking (GNU awk does networking, but (a) that's hacky and nonstandard and, I don't know, /weird/, and (b) it /still/ doesn't do TLS, so I'd be shelling out anyway), I'm wrapping the awk script in a shell script using socat to pipe between TLS and the awk process.
+
+/Under/ the awk layer, we'll have our CGI layer -- CGI scripts will respond to requests themselves, so they can do things like ask for input or use client certificates.
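+
+As a rough illustration of the shape described above, here's a minimal sketch, not the real server. Everything in it is an assumption made for the example: the name =edwin_sketch=, the canned responses, the certificate file names, and the choice of status 53 for non-Gemini schemes. It also ignores the implied-scheme case, so a request without a scheme lands in the second rule. Port 1965 is Gemini's default, and the =socat= line is left commented out so the awk part can be tried on its own.
+
+#+begin_src sh
+  # Hypothetical core: read one request line, answer it, exit.
+  edwin_sketch() {
+    awk '
+      # Rule 1: gemini links get a canned success response.
+      /^gemini:\/\// {
+        printf "20 text/gemini\r\n# Hello from the sketch\n"
+        exit
+      }
+      # Rule 2: everything else is refused.
+      {
+        printf "53 Proxy request refused\r\n"
+        exit
+      }
+    '
+  }
+
+  # Hypothetical wrapper: socat terminates TLS and forks one awk per connection.
+  # socat OPENSSL-LISTEN:1965,reuseaddr,fork,cert=cert.pem,key=key.pem,verify=0 \
+  #   EXEC:'awk -f edwin.awk'
+#+end_src
+
+Piping a line in, e.g. =printf 'gemini://example.org/\r\n' | edwin_sketch=, should print a =20= header followed by the canned body. The =exit= in each rule is what makes awk stop after the first line.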
+
+* Config
+** TODO Awk layer
+*** ~DEFAULT_MIME~
+*** ~BASE_DIR~
+*** ~HOSTNAME~
+** TODO Shell layer
+** TODO CGI layer
+
+* Request
+** Gemini spec
+#+begin_quote
+Gemini requests are a single CRLF-terminated line with the following structure:
+
+=<URL><CR><LF>=
+
+=<URL>= is a UTF-8 encoded absolute URL, of maximum length 1024 bytes. If the scheme of the URL is not specified, a scheme of =gemini://= is implied.
+
+Sending an absolute URL instead of only a path or selector is effectively equivalent to building in a HTTP "Host" header. It permits virtual hosting of multiple Gemini domains on the same IP address. It also allows servers to optionally act as proxies. Including schemes other than =gemini://= in requests allows servers to optionally act as protocol-translating gateways to e.g. fetch gopher resources over Gemini. Proxying is optional and the vast majority of servers are expected to only respond to requests for resources at their own domain(s).
+
+#+end_quote
 ** URL parsing
+#+NAME: function_usplit
+#+begin_src awk :tangle edwin.awk
+  # Split url into uarr["scheme"], ["authority"], ["path"], ["query"]
+  # and ["fragment"], consuming url from left to right.
+  function usplit(url, uarr) {
+    # scheme - scheme:
+    if (match(url, /^[^:\/?#]+:/)) {
+      uarr["scheme"] = substr(url, RSTART, RLENGTH - 1);
+      url = substr(url, RSTART + RLENGTH);
+    }
+    # authority - //authority
+    if (match(url, /^\/\/[^\/?#]*/)) {
+      uarr["authority"] = substr(url, RSTART+2, RLENGTH-2);
+      url = substr(url, RSTART + RLENGTH);
+    }
+    # path - path (may match the empty string)
+    if (match(url, /^[^?#]*/)) {
+      uarr["path"] = substr(url, RSTART, RLENGTH);
+      url = substr(url, RSTART + RLENGTH);
+    }
+    # query - ?query
+    if (match(url, /^\?[^#]*/)) {
+      uarr["query"] = substr(url, RSTART+1, RLENGTH-1);
+      url = substr(url, RSTART + RLENGTH);
+    }
+    # fragment - #fragment
+    if (match(url, /^#.*/)) {
+      uarr["fragment"] = substr(url, RSTART+1);
+      url = substr(url, RSTART + RLENGTH);
+    }
+    # sanity checks: an empty path means the root
+    if (!uarr["path"]) uarr["path"] = "/";
+  }
+#+end_src
+
+* Response
+** Gemini spec
+#+begin_quote
+Gemini response headers look like this:
+
+=<STATUS><SPACE><META><CR><LF>=
+
+=<STATUS>= is a two-digit numeric status code, as described below in 3.2 and in Appendix 1.
+
+=<SPACE>= is a single space character, i.e. the byte 0x20.
+
+=<META>= is a UTF-8 encoded string of maximum length 1024 bytes, whose meaning is =<STATUS>= dependent.
+
+=<STATUS>= and =<META>= are separated by a single space character.
+
+If =<STATUS>= does not belong to the "SUCCESS" range of codes, then the server MUST close the connection after sending the header and MUST NOT send a response body.
+
+If a server sends a =<STATUS>= which is not a two-digit number or a =