Adds Lazarus/omarpolo to the style sample

Merge branch 'dev'
Merge branch 'master' into dev
2023-08-01 22:31:03 +02:00 · 2023-08-01 22:24:46 +02:00 · 2023-08-01 22:21:49 +02:00 · 2023-08-01 22:14:29 +02:00 · 2023-08-01 22:04:12 +02:00 · 2023-08-01 22:02:03 +02:00
78 changed files with 5619 additions and 311 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+.phpunit*
--- a/CHANGELOG.gmi
+++ b/CHANGELOG.gmi
@@ -0,0 +1,74 @@
+# Changelog
+All notable changes to this project will be documented in this file.
+
+=> https://keepachangelog.com/en/1.0.0/ The format is based on Keep a Changelog.
+=> https://semver.org/spec/v2.0.0.html And this project adheres to Semantic Versioning.
+
+## [1.5.1] - 2023-08-01
+* Adds style from Omar Polo's site, thanks to <lazarus at sdfeu.org>
+* Fix getCss() calls, thanks to <lazarus at sdfeu.org>
+* Fixes a bug on CSS when hosted on filesystem root
+* Fix typo in the English documentation about Apache
+
+## [1.5.0] - 2022-08-23
+* Adds the Lagrange style, thanks to Eric <ortie10 at gmx.fr>
+* Adds circumlunar css
+* Removes the page-specific CSS
+* Rewrites the documentation
+
+## [1.4.1] - 2022-08-02
+* Adds link to /htmgem on the icon of the menu
+* Fixes bug about CSS not applied correctly
+* Fixes a bug about null by ref variable
+
+## [1.4.0] - 2021-04-11
+* Adds the breadcrumbs at the top and the bottom of the page.
+* Adds the text icon H͜͡m.
+* Opens the external addresses in a new window/tab.
+* Changes details in the 404 page.
+* Manages UTF-8, UTF-16 and UTF-32 entry format.
+* FIX: adds alt text to preformatted texts.
+* Allows moving and renaming /htmgem.
+* Allows always running without URL rewriting.
+* Many code refactorings.
+
+## [1.3.0] - 2021-03-29
+* Enables browsing without URL Rewriting
+* Unit testing
+* Adds the BNF definition
+* Rewriting of the French documentation
+* Translation to English
+* Adds debug.css
+* Adds index.htm in case PHP is not enabled
+
+## [1.2.0] - 2021-03-19
+* Removes "^" to disable text decoration line-wise.
+* CSS is no longer incorporated in the HTML page.
+* Perform sanity checks against unauthorized file access.
+* Properly close tags when the page exists in a non-null mode.
+* Split HTML generation in two: parsing and translating.
+* Create classes to handle gemtext parsing and translating.
+* Create class to generate back gemtext (for future test cases).
+* Fix: 404 doesn't occur for an empty file.
+* Page 404 fully generated by HtmGem itself.
+
+## [1.1.0] - 2021-03-14
+* File download when using "source" as a style.
+* Improves the regex.
+* Fixes 404 page text decoration, adds reload message.
+* Links to download htmgem-master.zip.
+* Links CHANGELOG and COPYING into index.gmi.
+* Styles improvement, creation of raw.css.
+* Rewording of texts.
+
+## [1.0.0] - 2021-03-10
+* Improves presentation and installation page.
+* Adds stylesheets and download of pages source code.
+* Allows changing the stylesheet in the URL.
+* Tested successfully on a shared host.
+
+## [0.2.0] - 2021-03-06
+Beta version
+
+## [0.1.0] - 2021-03-01
+Alpha version
--- a/COPYING.gmi
+++ b/COPYING.gmi
@@ -1,3 +1,4 @@
+^^^
                    GNU AFFERO GENERAL PUBLIC LICENSE
                       Version 3, 19 November 2007

@@ -51,7 +52,8 @@ code of the modified version.
 published by Affero, was designed to accomplish similar goals.  This is
 a different license, not a version of the Affero GPL, but Affero has
 released a new version of the Affero GPL which permits relicensing under
-this license.
+t//his license.
+

  The precise terms and conditions for copying, distribution and
 modification follow.
--- a/README.md
+++ b/README.md
@@ -1,21 +1,19 @@
 # HtmGem

-This program aims to provide access to Gemini pages through a web server.
+HtmGem makes your **Gemini** pages reachable on the web. It can be used on a shared host.

-It’s in alpha: advanced features available soon.
+You can see a demo on the main page of HtmGem:
+
+=> https://gmi.sbgodin.fr/htmgem

 ## Usage

-Place "htmgem.php" on the root of your webserver.
+* Copy the `htmgem` directory to the root of the website.
+* Access the directory and follow the instructions.

-Your "page.gmi" is reachable using [http://thesite/htmgem.php?directory/page.gmi] with HTML markup:
+## Requirements

-## URL Rewriting
-
-With Nginx, you can use:
-
-```
-rewrite ^(.+\.gmi)$ /htmgem.php?url=$1 last;
-```
-
-So the page is available at [http://thesite/htmgem.php/directory/page.gmi].
+* PHP v7.3
+* `php-mbstring` module to deal with Unicode characters
+* A web server (well tested with Apache and Nginx)
+* `mod-rewrite` to intercept the Gemini files
--- a/10
+++ b/10
@@ -1,10 +0,0 @@
-* manage url encoding: The filename fetched on disk may differ from that was asked by URL.
-* check /etc/passwd not accessible: Perform sanity checks against unauthorized access.
-* manage 404: Display better errors.
-* alt texts on pre and quote?
-* a way to get the source of a page, using urlrewriting
-* HTML caching: Nginx tries the html, if not found use this script to build it
-* any error on one read line logs and goes to the next line, resetting modes
-* configuration: Fetch configuration in current dir, tries parents.
-* css: file location or in-place or in config?
-* Use first h1 as the HTML page title on get in config?
--- a/css/default/base.css
+++ b/css/default/base.css
@@ -0,0 +1,75 @@
+html {
+    font-family: sans-serif;
+}
+
+#gmi {
+    max-width: 1024px;
+    margin: auto;
+    margin-top: 0.5em;
+    margin-bottom: 2em;
+}
+
+h1, h2, h3, blockquote, p, pre, li, ul {
+    margin: 0 0 0.3rem;
+    padding: 0;
+}
+
+h1 { font-size: 2rem; }
+h2 { font-size: 1.6rem; }
+h3 { font-size: 1.2rem; }
+
+blockquote {
+    margin-left: 3rem;
+    padding-left: 3px;
+    margin-right: 3rem;
+}
+
+pre {
+    overflow-x: auto;
+    font-size: 1rem;
+}
+
+a {
+    text-decoration: none;
+}
+
+.menu:nth-of-type(1) .menu-line {
+    text-align: left;
+}
+.menu:nth-of-type(3) .menu-line {
+    text-align: right;
+}
+
+.menu a, .menu a:visited {
+    /* color: #888; */
+}
+.menu a:hover {
+    /* color: #000; */
+}
+.menu hr {
+    border: 1px solid lightgrey;
+}
+
+@media only screen and (max-width: 1024px) {
+    body {
+        margin: 0.5rem 3rem;
+    }
+    h1 {
+        font-size: 4rem;
+    }
+    h2 {
+        font-size: 3.5rem;
+    }
+    h3 {
+        font-size: 3rem;
+    }
+    p, pre, ul, blockquote {
+        font-size: 2.6rem;
+    }
+    .menu {
+        font-size: 2rem;
+    }
+    .menu hr {
+        border: 1px solid gray;
+    }
+}
--- a/css/default/black_wide.css
+++ b/css/default/black_wide.css
@@ -0,0 +1,40 @@
+@import "base.css";
+
+html {
+    background-color:#000;
+}
+
+body {
+    max-width: none;
+    margin: 0.5em 5em 2em 5em;
+}
+
+p, ul {
+    color: #ccc;
+}
+
+h1, h2, h3 {
+    color: #ddd;
+}
+
+blockquote {
+    background-color: #222;
+    border-left: 3px solid #444;
+    margin: 1rem -1rem 1rem calc(-1rem - 3px);
+    padding: 1rem;
+}
+
+a {
+    color:#ddd;
+}
+
+a:visited {
+    color: #888;
+}
+
+pre {
+    color: #ccc;
+    scrollbar-color: #222 #000;
+    background-color: #222;
+}
+
--- a/css/default/circumlunar.css
+++ b/css/default/circumlunar.css
@@ -0,0 +1,144 @@
+/* Copied from https://gemini.circumlunar.space 2022-08-01 */
+
+html {
+	font-family: sans-serif;
+	font-size:16px;
+	line-height:1.6;
+	color:#1E4147;
+	background-color:#AAC789;
+}
+
+body {
+	max-width: 920px;
+	margin: 0 auto;
+	padding: 1rem 2rem;
+}
+
+h1,h2,h3{
+	line-height:1.2;
+}
+
+h1 {
+	text-align: center;
+	margin-bottom: 1em;
+}
+
+blockquote {
+	background-color: #eee;
+	border-left: 3px solid #444;
+	margin: 1rem -1rem 1rem calc(-1rem - 3px);
+	padding: 1rem;
+}
+
+ul {
+	margin-left: 0;
+	padding: 0;
+}
+
+li {
+	padding: 0;
+}
+
+li:not(:last-child) {
+	margin-bottom: 0.5rem;
+}
+
+a {
+	position: relative;
+	color:#AA2E00;
+}
+
+a:visited {
+	color: #802200;
+}
+
+/*
+a:before {
+	content: '⇒';
+	color:#AA2E00;
+	text-decoration: none;
+	font-weight: bold;
+	position: absolute;
+	left: -1.25rem;
+}
+*/
+
+pre {
+	background-color: #eee;
+	margin: 0 -1rem;
+	padding: 1rem;
+	overflow-x: auto;
+}
+
+details:not([open]) summary,
+details:not([open]) summary a {
+	color: gray;
+}
+
+details summary a:before {
+	display: none;
+}
+
+dl dt {
+	font-weight: bold;
+}
+
+dl dt:not(:first-child) {
+	margin-top: 0.5rem;
+}
+
+@media(prefers-color-scheme:dark) {
+	html {
+		background-color: #111;
+		color: #eee;
+	}
+
+	blockquote {
+		background-color: #000;
+	}
+
+	pre {
+		background-color: #222;
+	}
+	
+	a {
+		color: #0087BD;
+	}
+
+	a:visited {
+		color: #802200;
+	}
+}
+
+label {
+	display: block;
+	font-weight: bold;
+	margin-bottom: 0.5rem;
+}
+
+input {
+	display: block;
+	border: 1px solid #888;
+	padding: .375rem;
+	line-height: 1.25rem;
+	transition: border-color .15s ease-in-out,box-shadow .15s ease-in-out;
+	width: 100%;
+}
+
+input:focus {
+	outline: 0;
+	border-color: #80bdff;
+	box-shadow: 0 0 0 0.2rem rgba(0,123,255,.25);
+}
+
+
+/* Additions for HtmGem */
+
+.menu a {
+    text-decoration: none;
+}
+
+.menu hr {
+    color: red;
+}
+
--- a/css/default/debug.css
+++ b/css/default/debug.css
@@ -0,0 +1,5 @@
+@import "base.css";
+
+h1, h2, h3, p, li, pre, blockquote {
+    border: 1px solid lightblue;
+}
--- a/css/default/htmgem.css
+++ b/css/default/htmgem.css
@@ -0,0 +1,90 @@
+@import "base.css";
+
+html {
+    color:#1E4147;
+    background-color:#fafafa;
+}
+
+
+h1, h2, h3 {
+    color: #66f;
+}
+
+blockquote {
+    background-color: #eee;
+    border-left: 3px solid #444;
+    margin: 1rem -1rem 1rem calc(-1rem - 3px);
+    padding: 1rem;
+}
+
+.menu a, .menu a:visited {
+    color: #888;
+}
+
+.menu a:hover {
+    color: #000;
+}
+
+.menu a.logo {
+    color: #000;
+}
+
+.menu a.logo:hover {
+    color: blue;
+}
+
+.menu hr {
+    color: white;
+}
+
+#gmi a {
+    margin: -0.7rem;
+    color:#820;
+}
+
+#gmi a:before {
+    content: "🔗 ";
+}
+
+#gmi a:visited {
+    color: #868;
+}
+
+#gmi a.local:before {
+    content: "🛩️ ";
+    font-weight: bold;
+}
+
+#gmi a.gemini:before {
+    content: "🚀 ";
+}
+
+#gmi a.gopher:before {
+    content: "📜 ";
+}
+
+#gmi a.https:before {
+    content: "🕸️ ";
+    font-weight: bolder;
+}
+
+#gmi a.http:before {
+    content: "🕸️ ";
+    font-weight: lighter;
+}
+
+#gmi a.mumble:before {
+    content: "🎤 ";
+}
+
+#gmi a.mailto:before {
+    content: "✉️ ";
+}
+
+@media only screen and (max-width: 1024px) {
+    
+    #gmi a {
+        margin: -2.9rem;
+    }
+}
+
--- a/css/default/raw.css
+++ b/css/default/raw.css
@@ -0,0 +1,39 @@
+@import "base.css";
+
+body {
+    background-color: white;
+    color: black;
+    font-family: monospace;
+}
+
+p, h1, h2, h3, ul, li, pre, blockquote {
+    color: black;
+    margin: 0;
+    padding: 0;
+    font-weight: normal;
+}
+
+ul { list-style: none; }
+li:before { content: "* ";   }
+h1:before { content: "# ";   }
+h2:before { content: "## ";  }
+h3:before { content: "### "; }
+
+blockquote :before { content: "> "; } 
+
+pre {
+    scrollbar-color: lightgrey white;
+    overflow-x: auto;
+}
+
+a {
+    text-decoration: none;
+}
+
+a, a:visited {
+    color: black;
+}
+
+a:before {
+    content: "=> ";
+}
--- a/css/default/simple.css
+++ b/css/default/simple.css
@@ -0,0 +1,9 @@
+@import "base.css";
+
+p, h1, h2, h3, ul, li, pre, blockquote {
+    margin: 0;
+    padding: 0;
+    font-weight: normal;
+}
+
+
--- a/css/default/terminal.css
+++ b/css/default/terminal.css
@@ -0,0 +1,83 @@
+@import "base.css";
+
+html {
+    font-family: monospace;
+    color: #080;
+    background-color: #000;
+}
+
+body {
+    max-width: 76em;
+}
+
+h1, h2, h3 {
+    color: #0b0;
+}
+
+blockquote {
+    background-color: #010;
+    border-left: 3px solid #444;
+    border-color: #0b0;
+    margin: 1rem -1rem 1rem calc(-1rem - 3px);
+    padding: 1rem;
+}
+
+pre {
+    scrollbar-color: #030 #010;
+    background-color: #010;
+    margin: 0 -1rem;
+    padding: 1rem;
+}
+
+.menu a {
+    margin: 0;
+}
+
+.menu a:before {
+    content: "";
+}
+
+a {
+    margin: -1.35rem;
+    color: #090;
+    font-weight: bold;
+}
+
+a:before {
+    content: "A ";
+}
+
+a:visited {
+    color: #050;
+}
+
+a.local:before {
+    content: "L ";
+    font-weight: bold;
+}
+
+a.gemini:before {
+    content: "G ";
+}
+
+a.gopher:before {
+    content: "g ";
+}
+
+a.https:before {
+    content: "W ";
+    font-weight: bolder;
+}
+
+a.http:before {
+    content: "w ";
+    font-weight: lighter;
+}
+
+a.mumble:before {
+    content: "U ";
+}
+
+a.mailto:before {
+    content: "M ";
+}
--- a/css/index.gmi
+++ b/css/index.gmi
@@ -0,0 +1,49 @@
+# Styles
+
+=> index.gmi|default,htmgem.css default/htmgem.css
+=> index.gmi|default,base.css default/base.css
+=> index.gmi|default,circumlunar.css default/circumlunar.css
+=> index.gmi|lagrange,lagrange.css lagrange/lagrange.css
+=> index.gmi|lazarus,omarpolo.css lazarus/omarpolo.css
+=> index.gmi|default,terminal.css default/terminal.css
+=> index.gmi|default,black_wide.css default/black_wide.css
+=> index.gmi|default,simple.css default/simple.css
+=> index.gmi|default,raw.css default/raw.css
+=> index.gmi|default,debug.css default/debug.css
+=> index.gmi|src Source code
+=> index.gmi|source Download source code
+
+Lorem ipsum dolor sit amet.
+//Lorem// **ipsum** __dolor__ ~~sit amet~~.
+
+* 1 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
+* 2 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
+
+> Citation : Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
+
+```
+ ---------------------------
+< Lorem ipsum dolor sit amet >
+ ---------------------------
+        \   ^__^
+         \  (oo)\_______
+            (__)\       )\/\
+                ||----w |
+                ||     ||
+```
+
+# 1 1 1 1 1 1
+
+## 2 2 2 2 2 2
+
+### 3 3 3 3 3 3
+
+=> https://gmi.sbgodin.fr
+
+=> http://gmi.sbgodin.fr
+
+=> gemini://gmi.sbgodin.fr
+
+=> mailto:adress@foo.invalid
+
+=> mumble:adress.mumble.invalid
--- a/css/lagrange/Mada-Bold.ttf
+++ b/css/lagrange/Mada-Bold.ttf
--- a/css/lagrange/Mada-Regular.ttf
+++ b/css/lagrange/Mada-Regular.ttf
--- a/css/lagrange/Nunito-Bold.ttf
+++ b/css/lagrange/Nunito-Bold.ttf
--- a/css/lagrange/RobotoMono.ttf
+++ b/css/lagrange/RobotoMono.ttf
--- a/css/lagrange/SourceSansPro-Italic.ttf
+++ b/css/lagrange/SourceSansPro-Italic.ttf
--- a/css/lagrange/lagrange.css
+++ b/css/lagrange/lagrange.css
@@ -0,0 +1,433 @@
+/*-----------------------
+
+	----------------
+	 -  Lagrange  -
+	----------------
+	
+	a template for htmGem
+	
+	based on 
+   
+    https://github.com/skyjake/lagrange
+	
+----------------------------*/
+
+
+
+ @font-face {
+     font-family: nunito;
+     font-style: normal;
+     font-weight: 400;
+     src: local('Nunito Bold'), url(Nunito-Bold.ttf) format('truetype');
+}
+ @font-face {
+     font-family: Mada;
+     font-style: normal;
+     font-weight: 400;
+     src: local('Mada'), url(Mada-Regular.ttf) format('truetype');
+}
+ @font-face {
+     font-family: Mada;
+     font-style: italic;
+     font-weight: normal;
+     src: url(SourceSansPro-Italic.ttf) format('truetype');
+}
+ @font-face {
+     font-family: Mada;
+     font-style: normal;
+     font-weight: 800;
+     src: local('Mada Bold'), url(Mada-Bold.ttf) format('truetype');
+}
+
+ @font-face {
+     font-family: mycourier;
+     font-style: normal;
+     font-weight: 400;
+     src: local('Courier'), url(courier.ttf) format('truetype');
+}
+
+
+ @font-face {
+     font-family: Roboto Mono;
+     font-style: normal;
+     font-weight: 400;
+     src: local('Roboto Mono'), url(RobotoMono.ttf) format('truetype');
+}
+
+
+
+ body {
+     font-family: "Mada", sans-serif;
+     font-weight: 400;
+     font-size: 1.5em;
+     margin-top: 0em;
+     margin-left: 0em;
+     margin-right: 0em;
+     padding-right: 0em;
+}
+ .menu-line {
+     font-size: 105%;
+     padding: 1.8em;
+}
+
+
+ #gmi {
+     max-width: 1024px;
+     margin: auto;
+     margin-top: 0.5em;
+     margin-bottom: 1.5em;
+}
+
+
+
+ .menu li {
+     display: inline-block;
+     list-style: none;
+     margin: 0 1rem;
+}
+ .menu li::before {
+     content: "";
+}
+
+ p {
+     margin-left: 2.5em;
+     margin-right: 0.5em;
+     margin-top: 0;
+     margin-bottom: 0.05em;
+}
+
+ p:empty {
+     margin-top: 0.3em;
+     margin-bottom: 0.3em;
+}
+
+
+ li, ul {
+     margin: 0 0 0.3em;
+     padding: 0;
+}
+
+ 
+ h1, h2, h3 {
+     font-family: "Nunito", sans-serif;
+     margin: 0 0 0.0em;
+     padding: 0;
+}
+
+
+
+ h1 {
+     font-size: 2.2em;
+     font-weight: 400;
+     margin-left: 0.5em;
+}
+ h2 {
+     font-size: 1.8em;
+     font-weight: 600;
+     margin-left: 0.5em;
+}
+ h2 span.par-edit {
+     visibility: hidden;
+     font-size: x-small;
+}
+ h2:hover span.par-edit {
+     visibility: visible;
+}
+ h3 {
+     font-size: 1.6em;
+     font-weight: 700;
+     margin-left: 1em;
+}
+ h4 {
+     margin-left: 2em;
+}
+ .link-icon {
+     display: inline-block;
+     width: 1.5em;
+     font-family: Symbola;
+     text-indent: 0;
+}
+ div.link {
+     text-indent: -1.5em;
+     padding-left: 1.5em;
+     margin-top: 0.15em;
+     margin-bottom: 0.15em;
+}
+ a {
+     text-decoration: none;
+     font-weight: 600;
+}
+
+ ul {
+     list-style: none;
+     margin-left: 0;
+     padding-left: 3em;
+}
+ ol {
+     margin-left: 0;
+     padding-left: 3em;
+}
+ ul li, ol li {
+     margin-top: 5pt;
+     margin-bottom: 5pt;
+}
+ ul li::before {
+     content: "•";
+     font-weight: bold;
+     display: inline-block;
+     width: 1.1em;
+     margin-left: -1.1em;
+}
+
+
+ blockquote {
+     font-style: italic;
+     font-weight: 300;
+     padding-left: 0.75em;
+     font-size: 100%;
+     font-size: 1em;
+    /*border-left: 1px solid #c38b16;
+    */
+     margin-left: 1em;
+    /* margin-right: 3em;*/
+    margin-top: -1em;
+    margin-bottom: 0em;
+}
+
+
+
+
+ blockquote:before {
+     content: '“';
+     font-weight: bold;
+     font-size: 2.6em;
+     line-height: 0.1em;
+     /*vertical-align: -0.4em;*/
+     margin-top: -2em;
+     position: relative;
+     top: 0.85em;
+     left: 0.2em;
+}
+
+ pre {
+     font-family: Roboto Mono, monospace;
+     font-size: 0.9em;
+     margin-left: 2.8em;
+     margin-bottom: 0.05em;
+     max-width: 100%;
+     overflow: auto;
+}
+
+ img {
+     max-width: 100%;
+}
+
+
+
+
+
+ .menu:nth-of-type(1) .menu-line {
+     text-align: left;
+}
+ .menu:nth-of-type(3) .menu-line {
+     text-align: left;
+}
+
+ .menu hr {
+     border: 0;
+}
+
+
+
+
+ #gmi a {
+     margin: -1.5em; 
+}
+ #gmi a:before {
+     content: "🔗 ";
+}
+ #gmi a:visited {
+     font-weight: normal;
+    /* doesn't work */
+}
+
+
+ #gmi a.local:before {
+     content: "➤️ ";
+     font-weight: bold;
+     font-size: 1.5em;
+}
+ #gmi a.local:visited {
+     font-weight: normal;
+}
+ #gmi a.gemini:before {
+     content: "➤️ ";
+     font-size: 1.5em;
+}
+
+ #gmi a.gopher:before {
+     content: "📜 ";
+     font-size: 1.5em;
+}
+
+
+
+ #gmi a.https:before {
+     content: "🌐 ";
+     font-weight: bolder;
+     font-size: 1.5em;
+}
+ #gmi a.http:before {
+     content: "🌐 ";
+     font-weight: lighter;
+     font-size: 1.5em;
+}
+ #gmi a.mumble:before {
+     content: "🎤 ";
+     font-size: 1.5em;
+}
+ #gmi a.mailto:before {
+     content: "✉️ ";
+     font-size: 1.5em;
+}
+
+
+/* Responsivity */
+ @media only screen and (max-width: 499px) and (orientation: portrait) {
+     body {
+         font-size: 1.2em;
+         -webkit-text-size-adjust: 100%;
+         padding-left: 0;
+         padding-right: 0;
+         margin-left: 0;
+         margin-right: 0;
+         margin-top: 0;
+    }
+     h1 {
+         font-size: 1.9em;
+         font-weight: bold;
+         -webkit-text-size-adjust: 200%;
+    }
+     h2 {
+         font-size: 1.5em;
+         font-weight: bold;
+         -webkit-text-size-adjust: 160%;
+    }
+     h3 {
+         font-size: 1.3em;
+         font-weight: bold;
+         -webkit-text-size-adjust: 140%;
+    }
+    
+     blockquote {
+      margin-left: 1.5em ;
+    }
+
+ blockquote:before {
+     position: relative;
+     top: 0.85em;
+     left: -0.3em;
+}
+
+    p {
+      margin-left: 0.7em ;
+    }
+
+     #gmi a {
+         margin: 0.3em;
+    }
+    
+}
+
+
+
+ @media (prefers-color-scheme: dark) {
+     body {
+         filter: invert(100%) hue-rotate(180deg);
+    }
+     html {
+         background-color: #111;
+    }
+}
+
+
+/* Orange / sepia Colors (for lagrange.css) */
+
+
+ html, body {
+     background: #f5ebd6;
+     color: #192715;
+}
+ blockquote, pre {
+    /*background: #ede3d0;
+    */
+     color: #d2780a;
+}
+ h1 a {
+     color: #eeeeee;
+     font-weight: 800;
+}
+ h1 {
+     color: #d2780a;
+     font-weight: 800;
+}
+ h2, h3 {
+     color: #693c05;
+}
+ a:hover {
+     color: #0a6e82;
+}
+ a {
+     color: #693c05;    
+}
+
+ ul li::before {
+     color: #503909;
+}
+ 
+ 
+ blockquote:before {
+     color: #e5b77a;
+}
+
+ .menu-line {
+     background-color: #efd9b7;
+}
+ .menu-line h1 a {
+     color: #262626;
+     font-weight: 300;
+}
+
+ .menu a, .menu a:visited {
+     /* color: #888; */
+}
+ .menu a:hover {
+     /* color: #000; */
+}
+
+ #gmi a:before {
+     color: #d2780a;
+}
+
+ #gmi a:visited {
+     color: #a25707;
+}
+
+ #gmi a:hover {
+     color: #0a6e82;
+}
+
+ #gmi a.local:before {
+     color: #0a6e82;
+}
+
+ #gmi a.gemini:before {
+     color: #0a6e82;
+}
+
+ #gmi a.https:before {
+     color: #d2780a;
+}
+ #gmi a.http:before {
+     color: #d2780a;
+}
--- a/css/lagrange/lagrange_gray.css
+++ b/css/lagrange/lagrange_gray.css
@@ -0,0 +1,433 @@
+/*-----------------------
+
+	----------------
+	 -  Lagrange  -
+	----------------
+	
+	a template for htmGem
+	
+	based on 
+   
+    https://github.com/skyjake/lagrange
+	
+----------------------------*/
+
+
+
+ @font-face {
+     font-family: nunito;
+     font-style: normal;
+     font-weight: 400;
+     src: local('Nunito Bold'), url(Nunito-Bold.ttf) format('truetype');
+}
+ @font-face {
+     font-family: Mada;
+     font-style: normal;
+     font-weight: 400;
+     src: local('Mada'), url(Mada-Regular.ttf) format('truetype');
+}
+ @font-face {
+     font-family: Mada;
+     font-style: italic;
+     font-weight: normal;
+     src: url(SourceSansPro-Italic.ttf) format('truetype');
+}
+ @font-face {
+     font-family: Mada;
+     font-style: normal;
+     font-weight: 800;
+     src: local('Mada Bold'), url(Mada-Bold.ttf) format('truetype');
+}
+
+ @font-face {
+     font-family: mycourier;
+     font-style: normal;
+     font-weight: 400;
+     src: local('Courier'), url(courier.ttf) format('truetype');
+}
+
+
+ @font-face {
+     font-family: Roboto Mono;
+     font-style: normal;
+     font-weight: 400;
+     src: local('Roboto Mono'), url(RobotoMono.ttf) format('truetype');
+}
+
+
+
+ body {
+     font-family: "Mada", sans-serif;
+     font-weight: 400;
+     font-size: 1.5em;
+     margin-top: 0em;
+     margin-left: 0em;
+     margin-right: 0em;
+     padding-right: 0em;
+}
+ .menu-line {
+     font-size: 105%;
+     padding: 1.8em;
+}
+
+
+ #gmi {
+     max-width: 1024px;
+     margin: auto;
+     margin-top: 0.5em;
+     margin-bottom: 1.5em;
+}
+
+
+
+ .menu li {
+     display: inline-block;
+     list-style: none;
+     margin: 0 1rem;
+}
+ .menu li::before {
+     content: "";
+}
+
+ p {
+     margin-left: 2.5em;
+     margin-right: 0.5em;
+     margin-top: 0;
+     margin-bottom: 0.05em;
+}
+
+ p:empty {
+     margin-top: 0.3em;
+     margin-bottom: 0.3em;
+}
+
+
+ li, ul {
+     margin: 0 0 0.3em;
+     padding: 0;
+}
+
+ 
+ h1, h2, h3 {
+     font-family: "Nunito", sans-serif;
+     margin: 0 0 0.0em;
+     padding: 0;
+}
+
+
+
+ h1 {
+     font-size: 2.2em;
+     font-weight: 400;
+     margin-left: 0.5em;
+}
+ h2 {
+     font-size: 1.8em;
+     font-weight: 600;
+     margin-left: 0.5em;
+}
+ h2 span.par-edit {
+     visibility: hidden;
+     font-size: x-small;
+}
+ h2:hover span.par-edit {
+     visibility: visible;
+}
+ h3 {
+     font-size: 1.6em;
+     font-weight: 700;
+     margin-left: 1em;
+}
+ h4 {
+     margin-left: 2em;
+}
+ .link-icon {
+     display: inline-block;
+     width: 1.5em;
+     font-family: Symbola;
+     text-indent: 0;
+}
+ div.link {
+     text-indent: -1.5em;
+     padding-left: 1.5em;
+     margin-top: 0.15em;
+     margin-bottom: 0.15em;
+}
+ a {
+     text-decoration: none;
+     font-weight: 600;
+}
+
+ ul {
+     list-style: none;
+     margin-left: 0;
+     padding-left: 3em;
+}
+ ol {
+     margin-left: 0;
+     padding-left: 3em;
+}
+ ul li, ol li {
+     margin-top: 5pt;
+     margin-bottom: 5pt;
+}
+ ul li::before {
+     content: "•";
+     font-weight: bold;
+     display: inline-block;
+     width: 1.1em;
+     margin-left: -1.1em;
+}
+
+
+ blockquote {
+     font-style: italic;
+     font-weight: 300;
+     padding-left: 0.75em;
+     font-size: 100%;
+     font-size: 1em;
+    /*border-left: 1px solid #c38b16;
+    */
+     margin-left: 1em;
+    /* margin-right: 3em;*/
+    margin-top: -1em;
+    margin-bottom: 0em;
+}
+
+
+
+
+ blockquote:before {
+     content: '“';
+     font-weight: bold;
+     font-size: 2.6em;
+     line-height: 0.1em;
+     /*vertical-align: -0.4em;*/
+     margin-top: -2em;
+     position: relative;
+     top: 0.85em;
+     left: 0.2em;
+}
+
+ pre {
+     font-family: Roboto Mono, monospace;
+     font-size: 0.9em;
+     margin-left: 2.8em;
+     margin-bottom: 0.05em;
+     max-width: 100%;
+     overflow: auto;
+}
+
+ img {
+     max-width: 100%;
+}
+
+
+
+
+
+ .menu:nth-of-type(1) .menu-line {
+     text-align: left;
+}
+ .menu:nth-of-type(3) .menu-line {
+     text-align: left;
+}
+
+ .menu hr {
+     border: 0;
+}
+
+
+
+
+ #gmi a {
+     margin: -1.5em; 
+}
+ #gmi a:before {
+     content: "🔗 ";
+}
+ #gmi a:visited {
+     font-weight: normal;
+    /* doesn't work */
+}
+
+
+ #gmi a.local:before {
+     content: "➤️ ";
+     font-weight: bold;
+     font-size: 1.5em;
+}
+ #gmi a.local:visited {
+     font-weight: normal;
+}
+ #gmi a.gemini:before {
+     content: "➤️ ";
+     font-size: 1.5em;
+}
+
+ #gmi a.gopher:before {
+     content: "📜 ";
+     font-size: 1.5em;
+}
+
+
+
+ #gmi a.https:before {
+     content: "🌐 ";
+     font-weight: bolder;
+     font-size: 1.5em;
+}
+ #gmi a.http:before {
+     content: "🌐 ";
+     font-weight: lighter;
+     font-size: 1.5em;
+}
+ #gmi a.mumble:before {
+     content: "🎤 ";
+     font-size: 1.5em;
+}
+ #gmi a.mailto:before {
+     content: "✉️ ";
+     font-size: 1.5em;
+}
+
+
+/* Responsivity */
+ @media only screen and (max-width: 499px) and (orientation: portrait) {
+     body {
+         font-size: 1.2em;
+         -webkit-text-size-adjust: 100%;
+         padding-left: 0;
+         padding-right: 0;
+         margin-left: 0;
+         margin-right: 0;
+         margin-top: 0;
+    }
+     h1 {
+         font-size: 1.9em;
+         font-weight: bold;
+         -webkit-text-size-adjust: 200%;
+    }
+     h2 {
+         font-size: 1.5em;
+         font-weight: bold;
+         -webkit-text-size-adjust: 160%;
+    }
+     h3 {
+         font-size: 1.3em;
+         font-weight: bold;
+         -webkit-text-size-adjust: 140%;
+    }
+    
+     blockquote {
+      margin-left: 1.5em ;
+    }
+
+ blockquote:before {
+     position: relative;
+     top: 0.85em;
+     left: -0.3em;
+}
+
+    p {
+      margin-left: 0.7em ;
+    }
+
+     #gmi a {
+         margin: 0.3em;
+    }
+    
+}
+
+
+
+ @media (prefers-color-scheme: dark) {
+     body {
+         filter: invert(100%) hue-rotate(180deg);
+    }
+     html {
+         background-color: #111;
+    }
+}
+
+
+/* Grey Colors (for lagrange.css) */
+
+
+ html, body {
+     background: #cecece;
+     color: #343434;
+}
+ blockquote, pre {
+    /*background: #ede8de;
+    */
+     color: #6e4900;
+}
+ h1 a {
+     color: #009966;
+     font-weight: 800;
+}
+ h1 {
+     color: #009966;
+     font-weight: 800;
+}
+ h2, h3 {
+     color: #005b3d;
+}
+ a:hover {
+     color: #0a6e82;
+}
+ a {
+     color: #000000;     
+}
+
+ ul li::before {
+     color: #503909;
+}
+ 
+ 
+ blockquote:before {
+     color: #a29271;
+}
+
+ .menu-line {
+     background-color: #e0e0e0;
+}
+ .menu-line h1 a {
+     color: #005035;
+     font-weight: 300;
+}
+
+ .menu a, .menu a:visited {
+     /* color: #888; */
+}
+ .menu a:hover {
+     /* color: #000; */
+}
+
+ #gmi a:before {
+     color: #d2780a;
+}
+
+ #gmi a:visited {
+     color: #565656;
+}
+
+ #gmi a:hover {
+     color: #0a6e82;
+}
+
+ #gmi a.local:before {
+     color: #0a6e82;
+}
+
+ #gmi a.gemini:before {
+     color: #0a6e82;
+}
+
+ #gmi a.https:before {
+     color: #d2780a;
+}
+ #gmi a.http:before {
+     color: #d2780a;
+}
--- a/css/lazarus/omarpolo.css
+++ b/css/lazarus/omarpolo.css
@@ -0,0 +1,123 @@
+@import "base.css";
+
+/* style from Omar Polo: https://gmid.omarpolo.com/style.css */
+body {
+  font-family: monospace;
+  font-size: 14px;
+  max-width: 780px;
+  margin: 0 auto;
+  padding: 10px;
+  padding-bottom: 80px;
+}
+
+h1::before {
+  content: "# ";
+}
+
+h2 {
+  margin-top: 40px;
+}
+
+h2::before {
+  content: "## ";
+}
+
+h3::before {
+  content: "### ";
+}
+
+#gmi a::before {
+  content: "=> ";
+}
+
+blockquote {
+  margin: 0;
+  padding: 0;
+}
+
+blockquote::before {
+  content: "> ";
+}
+
+blockquote p {
+  font-style: italic;
+  display: inline;
+}
+
+p.link::before {
+  content: "→ ";
+}
+
+strong::before { content: "*" }
+strong::after  { content: "*" }
+
+hr {
+  border: 0;
+  height: 1px;
+  background-color: #222;
+  width: 100%;
+  display: block;
+  margin: 2em auto;
+}
+
+ul.link-list {
+  list-style: disclosure-closed;
+}
+
+img {
+  border-radius: 5px;
+}
+
+pre {
+  overflow: auto;
+  padding: 1rem;
+  background-color: #f0f0f0;
+  border-radius: 3px;
+}
+
+pre.banner {
+  display: flex;
+  flex-direction: row;
+  justify-content: center;
+}
+
+code, kbd {
+  color: #9d109d;
+}
+
+img {
+  display: block;
+  margin: 0 auto;
+  max-width: 100%;
+}
+
+@media (prefers-color-scheme: dark) {
+  body {
+    background-color: #222;
+    color: white;
+  }
+
+  a {
+    color: aqua;
+  }
+
+  hr {
+    background-color: #ddd;
+  }
+
+  pre {
+    background-color: #353535;
+  }
+
+  code, kbd {
+    color: #ff4cff;
+  }
+}
+
+@media (max-width: 500px) {
+  pre.banner { font-size: 10px; }
+}
+
+@media (max-width: 400px) {
+  pre.banner { font-size: 9px; }
+}
--- a/docs/BNF.gmi
+++ b/docs/BNF.gmi
@@ -0,0 +1,110 @@
+# BNF
+### aka Backus-Naur Form
+=> https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form Wikipedia / BNF
+
+The purpose of this document is to show the BNF that HtmGem uses, how it determines the type of each line, and what information it extracts from that line. **Curly brackets** explicitly mark the information that is captured. In addition, the **canonical form** shows how each line should be displayed.
+
+———————————————————— ✀ ————————————————————
+
+textgemini = *(link / preformat / heading / ulist / quoted / plain)
+; Preformat toggle starts as 'false'.
+
+plain = **{[WSP] text} [WSP]** end-of-line
+; If preformat toggle is false, wrap text to the
+; width of the display. Otherwise, do no wrapping.
+; canonical form = {[WSP] text}
+
+preformat = "```" **[WSP]** [{alt-text}] **[WSP]** end-of-line
+; Preformat toggle set to opposite state:
+; false goes to true
+; true goes to false
+; While in preformat toggle is true, link, heading,
+; ulist and quoted lines are NOT interpreted, but
+; displayed as is to the user.
+; canonical form = ``` [SP {alt-text}]
+
+link = '=>' **[WSP]** [{URI-reference}] [**[WSP]** {text}] end-of-line
+; canonical form = '=>' SP {uri-reference} [SP {text}]
+
+heading = '#' **[WSP]** [{text}] **[WSP]** end-of-line
+/ '##' **[WSP]** {text} **[WSP]** end-of-line
+/ '###' **[WSP]** {text} **[WSP]** end-of-line
+; canonical form = ('#' / '##' / '###') [SP {text}]
+
+ulist = '*' **[WSP]** {text} **[WSP]** end-of-line
+; canonical form = '*' SP {text}
+
+quoted = '>' **[WSP]** {text} **[WSP]** end-of-line
+; canonical form = '>' SP {text}
+
+alt-text = text
+text = ***UVCHAR**
+end-of-line = [CR] LF
+
+UVCHAR = VCHAR / UTF8-2v / UTF8-3 / UTF8-4
+UTF8-2v = %xC2 %xA0-BF UTF8-tail ; no C1 control set
+/ %xC3-DF UTF8-tail
+
+; CRLF from RFC-5234
+; DIGIT from RFC-5234
+; SP from RFC-5234
+; VCHAR from RFC-5234
+; OCTET from RFC-5234
+; WSP from RFC-5234
+;
+; UTF8-3 from RFC-3629
+; UTF8-4 from RFC-3629
+; UTF8-tail from RFC-3629
+
+———————————————————— ✀ ————————————————————
+
+This BNF was taken from the working group and adapted to HtmGem's implementation.
+=> https://gitlab.com/gemini-specification/gemini-text/-/issues/7 Gitlab / Original ticket
+
+Changes:
+* white space management
+* capture of text {}
+* canonical form
+
+The white spaces that end a line are never used. See the definition of //text// in the BNF, which no longer contains //SP//.
+
+# Examples
+
+## => links
+
+### Normal link
+> link = '=>' [WSP] [{URI-reference}] [[WSP] {text}] end-of-line
+> ; canonical form = '=>' SP {uri-reference} [SP {text}]
+
+Source: => foo.invalid text of the link
+Data: '=>' {'foo.invalid'} {'text of the link'}
+Canonical: => foo.invalid text of the link
+Html: <a href="foo.invalid">text of the link</a>
+
+Source: =>foo.invalid           text of the link
+Data: '=>' {'foo.invalid'} {'text of the link'}
+Canonical: => foo.invalid text of the link
+Html: <a href="foo.invalid">text of the link</a>
+
+Source: => just_a_page
+Data: '=>' {'just_a_page'} {}
+Canonical: => just_a_page
+Html: <a href="just_a_page">just_a_page</a>
+
+Source: =>
+Data: '=>' {''} {''}
+Canonical: =>
+Html: <a href="">&nbsp;</a>
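The link examples above can be sketched in Python. This is only an illustration of the rule, not HtmGem's actual code: the names LINK_RE and parse_link are hypothetical, and the regular expression is one possible reading of the BNF (optional white space, an optional URI without spaces, then optional text with trailing white space dropped).

```python
import re

# Hypothetical helper, not HtmGem's code: '=>' then optional WSP,
# an optional URI-reference, then optional WSP and text.
LINK_RE = re.compile(r"^=>[ \t]*(\S*)(?:[ \t]+(.*?))?[ \t]*$")


def parse_link(line):
    """Return (uri, text, canonical) for a link line, or None."""
    match = LINK_RE.match(line)
    if match is None:
        return None  # Not a link line.
    uri = match.group(1)
    text = match.group(2) or ""
    # Canonical form: '=>' SP uri [SP text], skipping empty parts.
    canonical = " ".join(part for part in ("=>", uri, text) if part)
    return uri, text, canonical
```

Calling parse_link on the second example line yields the same URI, text and canonical form as the first one, because the run of spaces between the URI and the text is reduced to a single separator.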
+
+## # Headings
+> heading = '#' **[WSP]** [{text}] **[WSP]** end-of-line
+> (…)
+> ; canonical form = ('#' / '##' / '###') [SP {text}]
+
+Source: #title
+Data: '#' {"title"}
+Canonical: # title
+
+Source: # title  with two spaces between title and with
+Data: '#' {"title  with two spaces between title and with"}
+Canonical: # title  with two spaces between title and with
--- a/docs/index.gmi
+++ b/docs/index.gmi
@ -0,0 +1,14 @@
+# Documentation
+
+=> https://gmi.sbgodin.fr/htmgem Project main page
+=> https://tildegit.org/Sbgodin/htmgem Source code, comments
+=> https://gemini.circumlunar.space Main Gemini capsule
+
+=> ../CHANGELOG.gmi Change log
+=> ../COPYING.gmi License Gnu Affero General Public License v3 — 19 november 2007
+=> BNF.gmi BNF / Backus-Naur Form
+=> specification_Gemini_v0.16.1.gmi Specification Gemini v0.16.1 (no impact on HtmGem)
+=> specification_Gemini_v0.14.3.gmi Specification Gemini v0.14.3
+
+=> ../css Styles 😎
+=> ../tests Tests 😅
--- a/docs/installation-en.gmi
+++ b/docs/installation-en.gmi
@ -0,0 +1,50 @@
+# To install HtmGem
+
+* Download HtmGem
+=> https://tildegit.org/sbgodin/HtmGem/archive/master.zip
+* Copy the files to the root of the website.
+* Configure the URL rewriting.
+* Write some text in /index.gmi.
+* Just open your website!
+
+* Go to /htmgem to get the documentation.
+
+## Prerequisites
+
+* Php v7.3 minimum
+* Module **Php-mbstring** to handle Unicode
+* A web server (Apache and Nginx supported)
+* Module **mod-rewrite** for the URL rewriting
+
+### Nginx
+```
+index index.gmi index.php index.html;
+rewrite ^(.+\.gmi)$ /htmgem/index.php?rw=1&url=$1&style=default,htmgem.css;
+error_page 403 /htmgem;
+location = /favicon.ico { alias /var/www/dev/htmgem/favicon.ico; }
+```
+
+### Apache
+```
+DirectoryIndex index.gmi index.php index.html
+RewriteEngine on
+RewriteRule ^(.+\.gmi)$ htmgem/index.php?rw=1&url=$1&style=default,htmgem.css
+```
+
+Other available styles:
+* style=lagrange,lagrange.css
+* style=lagrange,lagrange_gray.css
+* style=default,circumlunar.css
+* etc…
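To make the effect of the rewrite rules more concrete, here is a small Python sketch that applies the same pattern to a request path. It is a simplification under stated assumptions: the function name rewrite is hypothetical, the target always starts with /htmgem/ as in the Nginx rule, and the real servers handle existing query strings and per-directory prefixes in their own way.

```python
import re

# Same pattern as in the rewrite rules above.
GMI_PATTERN = re.compile(r"^(.+\.gmi)$")


def rewrite(path, style="default,htmgem.css"):
    """Return the internal target for a .gmi path, or None if no match."""
    match = GMI_PATTERN.match(path)
    if match is None:
        return None  # Not a .gmi request: the rule does not apply.
    return "/htmgem/index.php?rw=1&url=%s&style=%s" % (match.group(1), style)
```

Only the style parameter changes when another style from the list above is selected, for example rewrite("/notes/a.gmi", "lagrange,lagrange.css").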
+
+## Text decoration
+
+The text decoration, which interprets bold markup for instance, is not part of the GemText definition. It applies everywhere except in headings and preformatted text.
+
+It is possible to disable the text decoration with a **^^^** line, or by adding the following to the URL rewriting rule:
+> &textDecoration=0
+
+
+=> ../css Styles
+
+=> tutogemtext-en.gmi How to make GemText pages?
--- a/docs/installation-fr.gmi
+++ b/docs/installation-fr.gmi
@ -0,0 +1,50 @@
+# Pour installer HtmGem
+
+* Télécharger HtmGem
+=> https://tildegit.org/sbgodin/HtmGem/archive/master.zip
+* Copier les fichiers à la racine du site.
+* Configurer la réécriture d’URL (//URL Rewriting//).
+* Écrire du texte dans /index.gmi.
+* Ouvrir la page du site!
+
+* Accéder à /htmgem pour la documentation.
+
+## Prérequis
+
+* Php v7.3 minimum
+* Module **Php-mbstring** pour gérer l’unicode
+* Un serveur web (Apache et Nginx supportés)
+* Module **mod-rewrite** pour la réécriture d’URL
+
+### Nginx
+```
+index index.gmi index.php index.html;
+rewrite ^(.+\.gmi)$ /htmgem/index.php?rw=1&url=$1&style=default,htmgem.css;
+error_page 403 /htmgem;
+location = /favicon.ico { alias /var/www/dev/htmgem/favicon.ico; }
+```
+
+### Apache
+```
+DirectoryIndex index.gmi index.php index.html
+RewriteEngine on
+RewriteRule ^(.+\.gmi)$ htmgem/index.php?rw=1&url=$1&style=default,htmgem.css
+```
+
+Autres styles disponibles :
+* style=lagrange,lagrange.css
+* style=lagrange,lagrange_gray.css
+* style=default,circumlunar.css
+* etc…
+
+## Décoration du texte
+
+La décoration du texte, qui interprète le **gras** par exemple, ne fait pas partie de la définition de GemText. La décoration du texte s’applique partout sauf sur les titres et blocs préformatés.
+
+On peut désactiver la décoration du texte avec une ligne **^^^** ou ajouter à la **réécriture** d’URL :
+> &textDecoration=0
+
+
+=> ../css Styles
+
+=> tutogemtext-fr.gmi Comment faire des pages GemText ?
--- a/docs/specification_Gemini_v0.14.3.gmi
+++ b/docs/specification_Gemini_v0.14.3.gmi
@ -0,0 +1,355 @@
+```
+NOTE: this page was downloaded on 2021-03-04 from the original capsule
+gemini://gemini.circumlunar.space/docs/specification.gmi Gemini
+https://gemini.circumlunar.space/docs/specification.gmi Web
+```
+^^^
+
+# Project Gemini
+
+## Speculative specification
+
+v0.14.3, November 29th 2020
+
+This is an increasingly less rough sketch of an actual spec for Project Gemini.  Although not finalised yet, further changes to the specification are likely to be relatively small.  You can write code to this pseudo-specification and be confident that it probably won't become totally non-functional due to massive changes next week, but you are still urged to keep an eye on ongoing development of the protocol and make changes as required.
+
+This is provided mostly so that people can quickly get up to speed on what I'm thinking without having to read lots and lots of old phlog posts and keep notes.
+
+Feedback on any part of this is extremely welcome, please email solderpunk@posteo.net.
+
+# 1 Overview
+
+Gemini is a client-server protocol featuring request-response transactions, broadly similar to gopher or HTTP.  Connections are closed at the end of a single transaction and cannot be reused.  When Gemini is served over TCP/IP, servers should listen on port 1965 (the first manned Gemini mission, Gemini 3, flew in March '65).  This is an unprivileged port, so it's very easy to run a server as a "nobody" user, even if e.g. the server is written in Go and so can't drop privileges in the traditional fashion.
+
+## 1.1 Gemini transactions
+
+There is one kind of Gemini transaction, roughly equivalent to a gopher request or a HTTP "GET" request.  Transactions happen as follows:
+
+C:   Opens connection
+S:   Accepts connection
+C/S: Complete TLS handshake (see section 4)
+C:   Validates server certificate (see 4.2)
+C:   Sends request (one CRLF terminated line) (see section 2)
+S:   Sends response header (one CRLF terminated line), closes connection
+     under non-success conditions (see 3.1 and 3.2)
+S:   Sends response body (text or binary data) (see 3.3)
+S:   Closes connection
+C:   Handles response (see 3.4)
+
+## 1.2 Gemini URI scheme
+
+Resources hosted via Gemini are identified using URIs with the scheme "gemini".  This scheme is syntactically compatible with the generic URI syntax defined in RFC 3986, but does not support all components of the generic syntax.  In particular, the authority component is allowed and required, but its userinfo subcomponent is NOT allowed.  The host subcomponent is required.  The port subcomponent is optional, with a default value of 1965.  The path, query and fragment components are allowed and have no special meanings beyond those defined by the generic syntax.  Spaces in gemini URIs should be encoded as %20, not +.
+
+# 2 Gemini requests
+
+Gemini requests are a single CRLF-terminated line with the following structure:
+
+<URL><CR><LF>
+
+<URL> is a UTF-8 encoded absolute URL, including a scheme, of maximum length 1024 bytes.
+
+Sending an absolute URL instead of only a path or selector is effectively equivalent to building in a HTTP "Host" header.  It permits virtual hosting of multiple Gemini domains on the same IP address.  It also allows servers to optionally act as proxies.  Including schemes other than "gemini" in requests allows servers to optionally act as protocol-translating gateways to e.g. fetch gopher resources over Gemini.  Proxying is optional and the vast majority of servers are expected to only respond to requests for resources at their own domain(s).
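As an illustration of the request format, a client could build the request line roughly as follows. This is a sketch, not part of the specification: the function name build_request is hypothetical, and the scheme check relies on Python's urlsplit.

```python
from urllib.parse import urlsplit

MAX_URL_BYTES = 1024  # Limit stated in section 2.


def build_request(url):
    """Return the bytes of a Gemini request line for an absolute URL."""
    if not urlsplit(url).scheme:
        raise ValueError("the URL must be absolute and include a scheme")
    encoded = url.encode("utf-8")
    if len(encoded) > MAX_URL_BYTES:
        raise ValueError("the URL exceeds 1024 bytes")
    # A request is the URL followed by CR LF, nothing else.
    return encoded + b"\r\n"
```

The length check is applied to the UTF-8 encoded bytes, not to the number of characters, since the limit is expressed in bytes.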
+
+# 3 Gemini responses
+
+Gemini responses consist of a single CRLF-terminated header line, optionally followed by a response body.
+
+## 3.1 Response headers
+
+Gemini response headers look like this:
+
+<STATUS><SPACE><META><CR><LF>
+
+<STATUS> is a two-digit numeric status code, as described below in 3.2 and in Appendix 1.
+
+<SPACE> is a single space character, i.e. the byte 0x20.
+
+<META> is a UTF-8 encoded string of maximum length 1024 bytes, whose meaning is <STATUS> dependent.
+
+<STATUS> and <META> are separated by a single space character.
+
+If <STATUS> does not belong to the "SUCCESS" range of codes, then the server MUST close the connection after sending the header and MUST NOT send a response body.
+
+If a server sends a <STATUS> which is not a two-digit number or a <META> which exceeds 1024 bytes in length, the client SHOULD close the connection and disregard the response header, informing the user of an error.
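The header rules above could be checked by a client along these lines. This is a hedged sketch: parse_header is a hypothetical name, and raising ValueError stands in for "close the connection and inform the user of an error".

```python
def parse_header(raw):
    """Split b'<STATUS><SPACE><META><CR><LF>' into (status, meta)."""
    if not raw.endswith(b"\r\n"):
        raise ValueError("header is not CRLF-terminated")
    line = raw[:-2]
    # Everything before the first space is the status, the rest is META.
    status, _, meta = line.partition(b" ")
    if len(status) != 2 or not status.isdigit():
        raise ValueError("status is not a two-digit number")
    if len(meta) > 1024:
        raise ValueError("meta exceeds 1024 bytes")
    return int(status), meta.decode("utf-8")
```

For example, parse_header(b"20 text/gemini\r\n") returns the pair (20, "text/gemini").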
+
+## 3.2 Status codes
+
+Gemini uses two-digit numeric status codes.  Related status codes share the same first digit.  Importantly, the first digit of Gemini status codes does not group codes into vague categories like "client error" and "server error" as per HTTP.  Instead, the first digit alone provides enough information for a client to determine how to handle the response.  By design, it is possible to write a simple but feature complete client which only looks at the first digit.  The second digit provides more fine-grained information, for unambiguous server logging, to allow writing comfier interactive clients which provide a slightly more streamlined user interface, and to allow writing more robust and intelligent automated clients like content aggregators, search engine crawlers, etc.
+
+The first digit of a response code unambiguously places the response into one of six categories, which define the semantics of the <META> line.
+
+### 3.2.1 1x (INPUT)
+
+Status codes beginning with 1 are INPUT status codes, meaning:
+
+The requested resource accepts a line of textual user input.  The <META> line is a prompt which should be displayed to the user.  The same resource should then be requested again with the user's input included as a query component.  Queries are included in requests as per the usual generic URL definition in RFC3986, i.e. separated from the path by a ?.  Reserved characters used in the user's input must be "percent-encoded" as per RFC3986, and space characters should also be percent-encoded.
+
+### 3.2.2 2x (SUCCESS)
+
+Status codes beginning with 2 are SUCCESS status codes, meaning:
+
+The request was handled successfully and a response body will follow the response header.  The <META> line is a MIME media type which applies to the response body.
+
+### 3.2.3 3x (REDIRECT)
+
+Status codes beginning with 3 are REDIRECT status codes, meaning:
+
+The server is redirecting the client to a new location for the requested resource.  There is no response body.  <META> is a new URL for the requested resource.  The URL may be absolute or relative.  The redirect should be considered temporary, i.e. clients should continue to request the resource at the original address and should not perform convenience actions like automatically updating bookmarks.
+
+### 3.2.4 4x (TEMPORARY FAILURE)
+
+Status codes beginning with 4 are TEMPORARY FAILURE status codes, meaning:
+
+The request has failed.  There is no response body.  The nature of the failure is temporary, i.e. an identical request MAY succeed in the future.  The contents of <META> may provide additional information on the failure, and should be displayed to human users.
+
+### 3.2.5 5x (PERMANENT FAILURE)
+
+Status codes beginning with 5 are PERMANENT FAILURE status codes, meaning:
+
+The request has failed.  There is no response body.  The nature of the failure is permanent, i.e. identical future requests will reliably fail for the same reason.  The contents of <META> may provide additional information on the failure, and should be displayed to human users.  Automatic clients such as aggregators or indexing crawlers should not repeat this request.
+
+### 3.2.6 6x (CLIENT CERTIFICATE REQUIRED)
+
+Status codes beginning with 6 are CLIENT CERTIFICATE REQUIRED status codes, meaning:
+
+The requested resource requires a client certificate to access.  If the request was made without a certificate, it should be repeated with one.  If the request was made with a certificate, the server did not accept it and the request should be repeated with a different certificate.  The contents of <META> (and/or the specific 6x code) may provide additional information on certificate requirements or the reason a certificate was rejected.
+
+### 3.2.7 Notes
+
+Note that for basic interactive clients for human use, errors 4 and 5 may be effectively handled identically, by simply displaying the contents of <META> under a heading of "ERROR".  The temporary/permanent error distinction is primarily relevant to well-behaving automated clients.  Basic clients may also choose not to support client-certificate authentication, in which case only four distinct status handling routines are required (for statuses beginning with 1, 2, 3 or a combined 4-or-5).
+
+The full two-digit system is detailed in Appendix 1.  Note that for each of the six valid first digits, a code with a second digit of zero corresponds to a generic status of that kind with no special semantics.  This means that basic servers without any advanced functionality need only be able to return codes of 10, 20, 30, 40 or 50.
+
+The Gemini status code system has been carefully designed so that the increased power (and correspondingly increased complexity) of the second digits is entirely "opt-in" on the part of both servers and clients.
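The "first digit only" design can be sketched as a small lookup. The names CATEGORIES and category are hypothetical; the six categories are the ones listed in 3.2.1 to 3.2.6.

```python
CATEGORIES = {
    1: "INPUT",
    2: "SUCCESS",
    3: "REDIRECT",
    4: "TEMPORARY FAILURE",
    5: "PERMANENT FAILURE",
    6: "CLIENT CERTIFICATE REQUIRED",
}


def category(status):
    """Map a two-digit status code to its category via the first digit."""
    if not 10 <= status <= 99:
        raise ValueError("status must be a two-digit number")
    try:
        return CATEGORIES[status // 10]
    except KeyError:
        raise ValueError("unknown status category") from None
```

A basic client would branch on the returned category alone and ignore the second digit, which is the opt-in behaviour described above.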
+
+## 3.3 Response bodies
+
+Response bodies are just raw content, text or binary, à la gopher.  There is no support for compression, chunking or any other kind of content or transfer encoding.  The server closes the connection after the final byte; there is no "end of response" signal like gopher's lonely dot.
+
+Response bodies only accompany responses whose header indicates a SUCCESS status (i.e. a status code whose first digit is 2).  For such responses, <META> is a MIME media type as defined in RFC 2046.
+
+Internet media types are registered with a canonical form.  Content transferred via Gemini MUST be represented in the appropriate canonical form prior to its transmission except for "text" types, as defined in the next paragraph.
+
+When in canonical form, media subtypes of the "text" type use CRLF as the text line break.  Gemini relaxes this requirement and allows the transport of text media with plain LF alone (but NOT a plain CR alone) representing a line break when it is done consistently for an entire response body.  Gemini clients MUST accept CRLF and bare LF as being representative of a line break in text media received via Gemini.
+
+If a MIME type begins with "text/" and no charset is explicitly given, the charset should be assumed to be UTF-8.  Compliant clients MUST support UTF-8-encoded text/* responses.  Clients MAY optionally support other encodings.  Clients receiving a response in a charset they cannot decode SHOULD gracefully inform the user what happened instead of displaying garbage.
+
+If <META> is an empty string, the MIME type MUST default to "text/gemini; charset=utf-8".  The text/gemini media type is defined in section 5.
+
+## 3.4 Response body handling
+
+Response handling by clients should be informed by the provided MIME type information.  Gemini defines one MIME type of its own (text/gemini) whose handling is discussed below in section 5.  In all other cases, clients should do "something sensible" based on the MIME type.  Minimalistic clients might adopt a strategy of printing all other text/* responses to the screen without formatting and saving all non-text responses to the disk.  Clients for unix systems may consult /etc/mailcap to find installed programs for handling non-text types.
+
+# 4 TLS
+
+Use of TLS for Gemini transactions is mandatory.
+
+Use of the Server Name Indication (SNI) extension to TLS is also mandatory, to facilitate name-based virtual hosting.
+
+## 4.1 Version requirements
+
+Servers MUST use TLS version 1.2 or higher and SHOULD use TLS version 1.3 or higher.  TLS 1.2 is reluctantly permitted for now to avoid drastically reducing the range of available implementation libraries.  Hopefully TLS 1.3 or higher can be specced in the near future.  Clients who wish to be "ahead of the curve" MAY refuse to connect to servers using TLS version 1.2 or lower.
+
+## 4.2 Server certificate validation
+
+Clients can validate TLS connections however they like (including not at all) but the strongly RECOMMENDED approach is to implement a lightweight "TOFU" certificate-pinning system which treats self-signed certificates as first-class citizens.  This greatly reduces TLS overhead on the network (only one cert needs to be sent, not a whole chain) and lowers the barrier to entry for setting up a Gemini site (no need to pay a CA or set up a Let's Encrypt cron job, just make a cert and go).
+
+TOFU stands for "Trust On First Use" and is a public-key security model similar to that used by OpenSSH.  The first time a Gemini client connects to a server, it accepts whatever certificate it is presented.  That certificate's fingerprint and expiry date are saved in a persistent database (like the .known_hosts file for SSH), associated with the server's hostname.  On all subsequent connections to that hostname, the received certificate's fingerprint is computed and compared to the one in the database.  If the certificate is not the one previously received, but the previous certificate's expiry date has not passed, the user is shown a warning, analogous to the one web browser users are shown when receiving a certificate without a signature chain leading to a trusted CA.
+
+This model is by no means perfect, but it is not awful and is vastly superior to just accepting self-signed certificates unconditionally.
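The TOFU procedure described in this section could look roughly like the sketch below. Several points are assumptions and not part of the specification: the use of SHA-256 for the fingerprint, the in-memory dictionary in place of a persistent database, the function names, and the choice to pin the new certificate silently once the previous one has expired.

```python
import hashlib


def fingerprint(der_cert):
    """SHA-256 fingerprint of a DER-encoded certificate (assumed hash)."""
    return hashlib.sha256(der_cert).hexdigest()


def tofu_check(known_hosts, hostname, der_cert, cert_expiry, now):
    """Return 'trusted' or 'warn' and update the in-memory store.

    known_hosts maps hostname -> (fingerprint, expiry); expiry and now
    are timestamps in the same unit, e.g. seconds since the epoch.
    """
    seen = fingerprint(der_cert)
    pinned = known_hosts.get(hostname)
    if pinned is None:
        # First use: accept whatever certificate is presented.
        known_hosts[hostname] = (seen, cert_expiry)
        return "trusted"
    pinned_fingerprint, pinned_expiry = pinned
    if seen == pinned_fingerprint:
        return "trusted"
    if now < pinned_expiry:
        # Changed certificate while the old one is still valid.
        return "warn"
    # Assumption: once the pinned certificate has expired, pin the new one.
    known_hosts[hostname] = (seen, cert_expiry)
    return "trusted"
```

A real client would obtain the DER bytes and the expiry date from its TLS library and would store the table on disk.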
+
+## 4.3 Client certificates
+
+Although rarely seen on the web, TLS permits clients to identify themselves to servers using certificates, in exactly the same way that servers traditionally identify themselves to the client.  Gemini includes the ability for servers to request in-band that a client repeats a request with a client certificate.  This is a very flexible, highly secure but also very simple notion of client identity with several applications:
+
+* Short-lived client certificates which are generated on demand and deleted immediately after use can be used as "session identifiers" to maintain server-side state for applications.  In this role, client certificates act as a substitute for HTTP cookies, but unlike cookies they are generated voluntarily by the client, and once the client deletes a certificate and its matching key, the server cannot possibly "resurrect" the same value later (unlike so-called "super cookies").
+* Long-lived client certificates can reliably identify a user to a multi-user application without the need for passwords which may be brute-forced.  Even a stolen database table mapping certificate hashes to user identities is not a security risk, as rainbow tables for certificates are not feasible.
+* Self-hosted, single-user applications can be easily and reliably secured in a manner familiar from OpenSSH: the user generates a self-signed certificate and adds its hash to a server-side list of permitted certificates (analogous to the .authorized_keys file for SSH).
+
+Gemini requests will typically be made without a client certificate.  If a requested resource requires a client certificate and one is not included in a request, the server can respond with a status code of 60, 61 or 62 (see Appendix 1 below for a description of all status codes related to client certificates).  A client certificate which is generated or loaded in response to such a status code has its scope bound to the same hostname as the request URL and to all paths below the path of the request URL.  E.g. if a request for gemini://example.com/foo returns status 60 and the user chooses to generate a new client certificate in response to this, that same certificate should be used for subsequent requests to gemini://example.com/foo, gemini://example.com/foo/bar/, gemini://example.com/foo/bar/baz, etc., until such time as the user decides to delete the certificate or to temporarily deactivate it.  Interactive clients for human users are strongly recommended to make such actions easy and to generally give users full control over the use of client certificates.
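The scope rule for client certificates can be illustrated with a short check. This is a sketch with a hypothetical function name, and it assumes that "below the path" is evaluated on whole path segments, so that /foobar is not considered to be below /foo.

```python
from urllib.parse import urlsplit


def in_scope(cert_url, request_url):
    """True if request_url is within the scope of a certificate
    that was activated for cert_url."""
    cert, request = urlsplit(cert_url), urlsplit(request_url)
    if cert.hostname != request.hostname:
        return False
    base = cert.path.rstrip("/")
    # Same path, or a path below it.
    return request.path == base or request.path.startswith(base + "/")
```

With the example from the text, a certificate created for gemini://example.com/foo would also be sent for gemini://example.com/foo/bar/baz, but not for another host.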
+
+# 5 The text/gemini media type
+
+## 5.1 Overview
+
+In the same sense that HTML is the "native" response format of HTTP and plain text is the native response format of gopher, Gemini defines its own native response format - though of course, thanks to the inclusion of a MIME type in the response header Gemini can be used to serve plain text, rich text, HTML, Markdown, LaTeX, etc.
+
+Response bodies of type "text/gemini" are a kind of lightweight hypertext format, which takes inspiration from gophermaps and from Markdown.  The format permits richer typographic possibilities than the plain text of Gopher, but remains extremely easy to parse.  The format is line-oriented, and a satisfactory rendering can be achieved with a single pass of a document, processing each line independently.  As per gopher, links can only be displayed one per line, encouraging neat, list-like structure.
+
+Similar to how the two-digit Gemini status codes were designed so that simple clients can function correctly while ignoring the second digit, the text/gemini format has been designed so that simple clients can ignore the more advanced features and still remain very usable.
+
+## 5.2 Parameters
+
+As a subtype of the top-level media type "text", "text/gemini" inherits the "charset" parameter defined in RFC 2046.  However, as noted in 3.3, the default value of "charset" is "UTF-8" for "text" content transferred via Gemini.
+
+A single additional parameter specific to the "text/gemini" subtype is defined: the "lang" parameter.  The value of "lang" denotes the natural language or language(s) in which the textual content of a "text/gemini" document is written.  The presence of the "lang" parameter is optional.  When the "lang" parameter is present, its interpretation is defined entirely by the client.  For example, clients which use text-to-speech technology to make Gemini content accessible to visually impaired users may use the value of "lang" to improve pronunciation of content.  Clients which render text to a screen may use the value of "lang" to determine whether text should be displayed left-to-right or right-to-left.  Simple clients for users who only read languages written left-to-right may simply ignore the value of "lang".  When the "lang" parameter is not present, no default value should be assumed and clients which require some notion of a language in order to process the content (such as text-to-speech screen readers) should rely on user-input to determine how to proceed in the absence of a "lang" parameter.
+
+Valid values for the "lang" parameter are comma-separated lists of one or more language tags as defined in RFC4646.  For example:
+
+* "text/gemini; lang=en" Denotes a text/gemini document written in English
+* "text/gemini; lang=fr" Denotes a text/gemini document written in French
+* "text/gemini; lang=en,fr" Denotes a text/gemini document written in a mixture of English and French
+* "text/gemini; lang=de-CH" Denotes a text/gemini document written in Swiss German
+* "text/gemini; lang=sr-Cyrl" Denotes a text/gemini document written in Serbian using the Cyrillic script
+* "text/gemini; lang=zh-Hans-CN" Denotes a text/gemini document written in Chinese using the Simplified script as used in mainland China
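A client might extract the media type, "charset" and "lang" from <META> as sketched below. The function name parse_meta is hypothetical, and the parsing is deliberately simplified: it splits on ";" and "," and does not handle quoted parameter values that contain those characters.

```python
def parse_meta(meta):
    """Return (mime_type, charset, langs) for a SUCCESS <META> value."""
    if not meta:
        meta = "text/gemini; charset=utf-8"  # Default from section 3.3.
    mime_type, *params = [part.strip() for part in meta.split(";")]
    mime_type = mime_type.lower()
    values = {}
    for param in params:
        name, _, value = param.partition("=")
        values[name.strip().lower()] = value.strip().strip('"')
    charset = values.get("charset")
    if charset is None and mime_type.startswith("text/"):
        charset = "utf-8"  # Assumed for text/* when not given (3.3).
    # "lang" is a comma-separated list of language tags; it may be absent.
    langs = [tag.strip() for tag in values.get("lang", "").split(",") if tag.strip()]
    return mime_type, charset, langs
```

When "lang" is absent the list is empty, which matches the rule that no default language should be assumed.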
+
+## 5.3 Line-orientation
+
+As mentioned, the text/gemini format is line-oriented.  Each line of a text/gemini document has a single "line type".  It is possible to unambiguously determine a line's type purely by inspecting its first three characters.  A line's type determines the manner in which it should be presented to the user.  Any details of presentation or rendering associated with a particular line type are strictly limited in scope to that individual line.
+
+There are 7 different line types in total.  However, a fully functional and specification compliant Gemini client need only recognise and handle 4 of them - these are the "core line types", (see 5.4).  Advanced clients can also handle the additional "advanced line types" (see 5.5).  Simple clients can treat all advanced line types as equivalent to one of the core line types and still offer an adequate user experience.
+
+## 5.4 Core line types
+
+The four core line types are:
+
+### 5.4.1 Text lines
+
+Text lines are the most fundamental line type - any line which does not match the definition of another line type defined below defaults to being a text line.  The majority of lines in a typical text/gemini document will be text lines.
+
+Text lines should be presented to the user, after being wrapped to the appropriate width for the client's viewport (see below).  Text lines may be presented to the user in a visually pleasing manner for general reading, the precise meaning of which is at the client's discretion.  For example, variable width fonts may be used, spacing may be normalised, with spaces between sentences being made wider than spacing between words, and other such typographical niceties may be applied.  Clients may permit users to customise the appearance of text lines by altering the font, font size, text and background colour, etc.  Authors should not expect to exercise any control over the precise rendering of their text lines, only of their actual textual content.  Content such as ASCII art, computer source code, etc. which may appear incorrectly when treated as such should be enclosed between preformatting toggle lines (see 5.4.3).
+
+Blank lines are instances of text lines and have no special meaning.  They should be rendered individually as vertical blank space each time they occur.  In this way they are analogous to <br/> tags in HTML.  Consecutive blank lines should NOT be collapsed into fewer blank lines.  Note also that consecutive non-blank text lines do not form any kind of coherent unit or block such as a "paragraph": all text lines are independent entities.
+
+Text lines which are longer than can fit on a client's display device SHOULD be "wrapped" to fit, i.e. long lines should be split (ideally at whitespace or at hyphens) into multiple consecutive lines of a device-appropriate width.  This wrapping is applied to each line of text independently.  Multiple consecutive lines which are shorter than the client's display device MUST NOT be combined into fewer, longer lines.
+
+In order to take full advantage of this method of text formatting, authors of text/gemini content SHOULD avoid hard-wrapping to a specific fixed width, in contrast to the convention in Gopherspace where text is typically wrapped at 80 characters or fewer.  Instead, text which should be displayed as a contiguous block should be written as a single long line.  Most text editors can be configured to "soft-wrap", i.e. to write this kind of file while displaying the long lines wrapped at word boundaries to fit the author's display device.
+
+Authors who insist on hard-wrapping their content MUST be aware that the content will display neatly on clients whose display device is as wide as the hard-wrapped length or wider, but will appear with irregular line widths on narrower clients.
+
+### 5.4.2 Link lines
+
+Lines beginning with the two characters "=>" are link lines, which have the following syntax:
+
+```
+=>[<whitespace>]<URL>[<whitespace><USER-FRIENDLY LINK NAME>]
+```
+
+where:
+
+* <whitespace> is any non-zero number of consecutive spaces or tabs
+* Square brackets indicate that the enclosed content is optional.
+* <URL> is a URL, which may be absolute or relative.
+
+All the following examples are valid link lines:
+
+```
+=> gemini://example.org/
+=> gemini://example.org/ An example link
+=> gemini://example.org/foo	Another example link at the same host
+=> foo/bar/baz.txt	A relative link
+=> 	gopher://example.org:70/1 A gopher link
+```
+
+URLs in link lines must have reserved characters and spaces percent-encoded as per RFC 3986.
+
+Note that link URLs may have schemes other than gemini.  This means that Gemini documents can simply and elegantly link to documents hosted via other protocols, unlike gophermaps which can only link to non-gopher content via a non-standard adaptation of the `h` item-type.
+
+Clients can present links to users in whatever fashion the client author wishes, however clients MUST NOT automatically make any network connections as part of displaying links whose scheme corresponds to a network protocol (e.g. links beginning with gemini://, gopher://, https://, ftp:// , etc.).
+
+### 5.4.3 Preformatting toggle lines
+
+Any line whose first three characters are "```" (i.e. three consecutive back ticks with no leading whitespace) is a preformatting toggle line.  These lines should NOT be included in the rendered output shown to the user.  Instead, these lines toggle the parser between preformatted mode being "on" or "off".  Preformatted mode should be "off" at the beginning of a document.  The current status of preformatted mode is the only internal state a parser is required to maintain.  When preformatted mode is "on", the usual rules for identifying line types are suspended, and all lines should be identified as preformatted text lines (see 5.4.4).
+
+Preformatting toggle lines can be thought of as analogous to <pre> and </pre> tags in HTML.
+
+Any text following the leading "```" of a preformat toggle line which toggles preformatted mode on MAY be interpreted by the client as "alt text" pertaining to the preformatted text lines which follow the toggle line.  Use of alt text is at the client's discretion, and simple clients may ignore it.  Alt text is recommended for ASCII art or similar non-textual content which, for example, cannot be meaningfully understood when rendered through a screen reader or usefully indexed by a search engine.  Alt text may also be used for computer source code to identify the programming language which advanced clients may use for syntax highlighting.
+
+Any text following the leading "```" of a preformat toggle line which toggles preformatted mode off MUST be ignored by clients.
+
+### 5.4.4 Preformatted text lines
+
+Preformatted text lines should be presented to the user in a "neutral", monowidth font without any alteration to whitespace or stylistic enhancements.  Graphical clients should use scrolling mechanisms to present preformatted text lines which are longer than the client viewport, in preference to wrapping.  In displaying preformatted text lines, clients should keep in mind applications like ASCII art and computer source code: in particular, source code in languages with significant whitespace (e.g. Python) should be able to be copied and pasted from the client into a file and interpreted/compiled without any problems arising from the client's manner of displaying them.
+
+## 5.5 Advanced line types
+
+The following advanced line types MAY be recognised by advanced clients.  Simple clients may treat them all as text lines as per 5.4.1 without any loss of essential function.
+
+### 5.5.1 Heading lines
+
+Lines beginning with "#" are heading lines.  Heading lines consist of one, two or three consecutive "#" characters, followed by optional whitespace, followed by heading text.  The number of # characters indicates the "level" of header;  #, ## and ### can be thought of as analogous to <h1>, <h2> and <h3> in HTML.
+
+Heading text should be presented to the user, and clients MAY use special formatting, e.g. a larger or bold font, to indicate its status as a header (simple clients may simply print the line, including its leading #s, without any styling at all).  However, the main motivation for the definition of heading lines is not stylistic but to provide a machine-readable representation of the internal structure of the document.  Advanced clients can use this information to, e.g. display an automatically generated and hierarchically formatted "table of contents" for a long document in a side-pane, allowing users to easily jump to specific sections without excessive scrolling.  CMS-style tools automatically generating menus or Atom/RSS feeds for a directory of text/gemini files can use the first heading in the file as a human-friendly title.
+
+### 5.5.2 Unordered list items
+
+Lines beginning with "* " are unordered list items.  This line type exists purely for stylistic reasons.  The * may be replaced in advanced clients by a bullet symbol.  Any text after the "* " should be presented to the user as if it were a text line, i.e.  wrapped to fit the viewport and formatted "nicely".  Advanced clients can take the space of the bullet symbol into account when wrapping long list items to ensure that all lines of text corresponding to the item are offset an equal distance from the left of the screen.
+
+### 5.5.3 Quote lines
+
+Lines beginning with ">" are quote lines.  This line type exists so that advanced clients may use distinct styling to convey to readers the important semantic information that certain text is being quoted from an external source.  For example, when wrapping long lines to fit the viewport, each resultant line may have a ">" symbol placed at the front.
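
The line-type rules of 5.4 and 5.5 can be sketched as a single-pass classifier.  This is only an illustration, not part of the specification: the function name `classify_lines` and the type labels are assumptions chosen for the example, and the preformatted flag is the only state it keeps.

```python
FENCE = "`" * 3  # the three back ticks that mark a preformatting toggle line


def classify_lines(text):
    """Yield (line_type, line) pairs for the lines of a text/gemini document."""
    preformatted = False  # the only internal state a parser has to maintain
    # Accept both CRLF and bare LF line breaks.
    for line in text.replace("\r\n", "\n").split("\n"):
        if line.startswith(FENCE):
            preformatted = not preformatted
            yield ("toggle", line)  # toggle lines are never shown to the user
        elif preformatted:
            yield ("preformatted", line)  # usual line-type rules are suspended
        elif line.startswith("=>"):
            yield ("link", line)
        elif line.startswith("#"):
            yield ("heading", line)
        elif line.startswith("* "):
            yield ("list", line)
        elif line.startswith(">"):
            yield ("quote", line)
        else:
            yield ("text", line)  # default, including blank lines
```

A simple client could map "heading", "list" and "quote" back to "text" and still behave correctly.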
+
+# Appendix 1. Full two digit status codes
+
+## 10 INPUT
+
+As per definition of single-digit code 1 in 3.2.
+
+## 11 SENSITIVE INPUT
+
+As per status code 10, but for use with sensitive input such as passwords.  Clients should present the prompt as per status code 10, but the user's input should not be echoed to the screen to prevent it being read by "shoulder surfers".
+
+## 20 SUCCESS
+
+As per definition of single-digit code 2 in 3.2.
+
+## 30 REDIRECT - TEMPORARY
+
+As per definition of single-digit code 3 in 3.2.
+
+## 31 REDIRECT - PERMANENT
+
+The requested resource should be consistently requested from the new URL provided in future.  Tools like search engine indexers or content aggregators should update their configurations to avoid requesting the old URL, and end-user clients may automatically update bookmarks, etc.  Note that clients which only pay attention to the initial digit of status codes will treat this as a temporary redirect.  They will still end up at the right place, they just won't be able to make use of the knowledge that this redirect is permanent, so they'll pay a small performance penalty by having to follow the redirect each time.
+
+## 40 TEMPORARY FAILURE
+
+As per definition of single-digit code 4 in 3.2.
+
+## 41 SERVER UNAVAILABLE
+
+The server is unavailable due to overload or maintenance.  (cf HTTP 503)
+
+## 42 CGI ERROR
+
+A CGI process, or similar system for generating dynamic content, died unexpectedly or timed out.
+
+## 43 PROXY ERROR
+
+A proxy request failed because the server was unable to successfully complete a transaction with the remote host.  (cf HTTP 502, 504)
+
+## 44 SLOW DOWN
+
+Rate limiting is in effect.  <META> is an integer number of seconds which the client must wait before another request is made to this server.  (cf HTTP 429)
+
+## 50 PERMANENT FAILURE
+
+As per definition of single-digit code 5 in 3.2.
+
+## 51 NOT FOUND
+
+The requested resource could not be found but may be available in the future.  (cf HTTP 404) (struggling to remember this important status code?  Easy: you can't find things hidden at Area 51!)
+
+## 52 GONE
+
+The resource requested is no longer available and will not be available again.  Search engines and similar tools should remove this resource from their indices.  Content aggregators should stop requesting the resource and convey to their human users that the subscribed resource is gone.  (cf HTTP 410)
+
+## 53 PROXY REQUEST REFUSED
+
+The request was for a resource at a domain not served by the server and the server does not accept proxy requests.
+
+## 59 BAD REQUEST
+
+The server was unable to parse the client's request, presumably due to a malformed request.  (cf HTTP 400)
+
+## 60 CLIENT CERTIFICATE REQUIRED
+
+As per definition of single-digit code 6 in 3.2.
+
+## 61 CERTIFICATE NOT AUTHORISED
+
+The supplied client certificate is not authorised for accessing the particular requested resource.  The problem is not with the certificate itself, which may be authorised for other resources.
+
+## 62 CERTIFICATE NOT VALID
+
+The supplied client certificate was not accepted because it is not valid.  This indicates a problem with the certificate in and of itself, with no consideration of the particular requested resource.  The most likely cause is that the certificate's validity start date is in the future or its expiry date has passed, but this code may also indicate an invalid signature, or a violation of X.509 standard requirements.  The <META> should provide more information about the exact error.
--- a/docs/specification_Gemini_v0.16.1.gmi
+++ b/docs/specification_Gemini_v0.16.1.gmi
@ -0,0 +1,365 @@
+```
+NOTE: this page was downloaded on 2022-08-01 from the original capsule
+gemini://gemini.circumlunar.space/docs/specification.gmi Gemini
+https://gemini.circumlunar.space/docs/specification.gmi Web
+```
+^^^
+
+# Project Gemini
+
+## Speculative specification
+
+v0.16.1, January 30th 2022
+
+This is an increasingly less rough sketch of an actual spec for Project Gemini.  Although not finalised yet, further changes to the specification are likely to be relatively small.  You can write code to this pseudo-specification and be confident that it probably won't become totally non-functional due to massive changes next week, but you are still urged to keep an eye on ongoing development of the protocol and make changes as required.
+
+This is provided mostly so that people can quickly get up to speed on what I'm thinking without having to read lots and lots of old phlog posts and keep notes.
+
+Feedback on any part of this is extremely welcome, please email solderpunk@posteo.net.
+
+# Conventions used in this document
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP14.
+
+# 1 Overview
+
+Gemini is a client-server protocol featuring request-response transactions, broadly similar to gopher or HTTP.  Connections are closed at the end of a single transaction and cannot be reused.  When Gemini is served over TCP/IP, servers should listen on port 1965 (the first manned Gemini mission, Gemini 3, flew in March '65).  This is an unprivileged port, so it's very easy to run a server as a "nobody" user, even if e.g. the server is written in Go and so can't drop privileges in the traditional fashion.
+
+## 1.1 Gemini transactions
+
+There is one kind of Gemini transaction, roughly equivalent to a gopher request or a HTTP "GET" request.  Transactions happen as follows:
+
+C:   Opens connection
+S:   Accepts connection
+C/S: Complete TLS handshake (see section 4)
+C:   Validates server certificate (see 4.2)
+C:   Sends request (one CRLF terminated line) (see section 2)
+S:   Sends response header (one CRLF terminated line), closes connection under non-success conditions (see 3.1 and 3.2)
+S:   Sends response body (text or binary data) (see 3.3)
+S:   Closes connection (including TLS close_notify, see section 4)
+C:   Handles response (see 3.4)
+
+Note that clients are not obligated to wait until the server closes the connection to begin handling the response.  This is shown above only for simplicity/clarity, to emphasise that responsibility for closing the connection under typical conditions lies with the server and that the connection should be closed immediately after the completion of the response body.
+
+## 1.2 Gemini URI scheme
+
+Resources hosted via Gemini are identified using URIs with the scheme "gemini".  This scheme is syntactically compatible with the generic URI syntax defined in RFC 3986, but does not support all components of the generic syntax.  In particular, the authority component is allowed and required, but its userinfo subcomponent is NOT allowed.  The host subcomponent is required.  The port subcomponent is optional, with a default value of 1965.  The path, query and fragment components are allowed and have no special meanings beyond those defined by the generic syntax.  An empty path is equivalent to a path consisting only of "/".  Spaces in paths should be encoded as %20, not as +.
+
+Clients SHOULD normalise URIs (as per section 6.2.3 of RFC 3986) before sending requests (see section 2) and servers SHOULD normalise received URIs before processing a request.
+
+# 2 Gemini requests
+
+Gemini requests are a single CRLF-terminated line with the following structure:
+
+<URL><CR><LF>
+
+<URL> is a UTF-8 encoded absolute URL, including a scheme, of maximum length 1024 bytes.  The request MUST NOT begin with a U+FEFF byte order mark.
+
+Sending an absolute URL instead of only a path or selector is effectively equivalent to building in a HTTP "Host" header.  It permits virtual hosting of multiple Gemini domains on the same IP address.  It also allows servers to optionally act as proxies.  Including schemes other than "gemini" in requests allows servers to optionally act as protocol-translating gateways to e.g. fetch gopher resources over Gemini.  Proxying is optional and the vast majority of servers are expected to only respond to requests for resources at their own domain(s).
+
+Clients MUST NOT send anything after the first occurrence of <CR><LF> in a request, and servers MUST ignore anything sent after the first occurrence of a <CR><LF>.
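
As a rough sketch of the request rules above, a client might build the request line like this.  The helper name `build_request` is hypothetical, and rejecting CR or LF inside the URL is an extra precaution assumed here rather than a rule stated in this section.

```python
from urllib.parse import urlsplit

BOM = "\ufeff".encode("utf-8")  # U+FEFF byte order mark, UTF-8 encoded


def build_request(url):
    """Return the bytes of a Gemini request for an absolute URL."""
    encoded = url.encode("utf-8")
    if encoded.startswith(BOM):
        raise ValueError("request must not begin with a byte order mark")
    if len(encoded) > 1024:
        raise ValueError("URL is longer than 1024 bytes")
    if b"\r" in encoded or b"\n" in encoded:
        raise ValueError("URL must not contain CR or LF")  # assumed precaution
    if not urlsplit(url).scheme:
        raise ValueError("URL must be absolute and include a scheme")
    # Nothing may follow the first CRLF.
    return encoded + b"\r\n"
```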
+
+# 3 Gemini responses
+
+Gemini responses consist of a single CRLF-terminated header line, optionally followed by a response body.
+
+## 3.1 Response headers
+
+Gemini response headers look like this:
+
+<STATUS><SPACE><META><CR><LF>
+
+<STATUS> is a two-digit numeric status code, as described below in 3.2 and in Appendix 1.
+
+<SPACE> is a single space character, i.e. the byte 0x20.
+
+<META> is a UTF-8 encoded string of maximum length 1024 bytes, whose meaning is <STATUS> dependent.
+
+The response header as a whole and <META> as a sub-string both MUST NOT begin with a U+FEFF byte order mark.
+
+If <STATUS> does not belong to the "SUCCESS" range of codes, then the server MUST close the connection after sending the header and MUST NOT send a response body.
+
+If a server sends a <STATUS> which is not a two-digit number or a <META> which exceeds 1024 bytes in length, the client SHOULD close the connection and disregard the response header, informing the user of an error.
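
The header rules above could be checked along these lines.  This is a minimal sketch: `parse_header` is a hypothetical name, and accepting a header with no space when <META> is empty is a lenient choice made for the example, not something the text mandates.

```python
BOM = "\ufeff".encode("utf-8")  # U+FEFF byte order mark, UTF-8 encoded


def parse_header(raw):
    """Split a raw response header (bytes, CRLF included) into (status, meta)."""
    if not raw.endswith(b"\r\n"):
        raise ValueError("header is not CRLF-terminated")
    line = raw[:-2]
    if line.startswith(BOM):
        raise ValueError("header must not begin with a byte order mark")
    status, _sep, meta = line.partition(b" ")
    if len(status) != 2 or not status.isdigit():
        raise ValueError("status is not a two-digit number")
    if len(meta) > 1024:
        raise ValueError("<META> is longer than 1024 bytes")
    if meta.startswith(BOM):
        raise ValueError("<META> must not begin with a byte order mark")
    return int(status), meta.decode("utf-8")
```

A client that gets a ValueError here would close the connection and report an error to the user.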
+
+## 3.2 Status codes
+
+Gemini uses two-digit numeric status codes.  Related status codes share the same first digit.  Importantly, the first digit of Gemini status codes does not group codes into vague categories like "client error" and "server error" as per HTTP.  Instead, the first digit alone provides enough information for a client to determine how to handle the response.  By design, it is possible to write a simple but feature complete client which only looks at the first digit.  The second digit provides more fine-grained information, for unambiguous server logging, to allow writing comfier interactive clients which provide a slightly more streamlined user interface, and to allow writing more robust and intelligent automated clients like content aggregators, search engine crawlers, etc.
+
+The first digit of a response code unambiguously places the response into one of six categories, which define the semantics of the <META> line.
+
+### 3.2.1 1x (INPUT)
+
+Status codes beginning with 1 are INPUT status codes, meaning:
+
+The requested resource accepts a line of textual user input.  The <META> line is a prompt which should be displayed to the user.  The same resource should then be requested again with the user's input included as a query component.  Queries are included in requests as per the usual generic URL definition in RFC3986, i.e. separated from the path by a ?.  Reserved characters used in the user's input must be "percent-encoded" as per RFC3986, and space characters should also be percent-encoded.
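
One way a client might re-issue the request with the user's input, sketched with Python's standard library.  The name `with_user_input` is an assumption, and replacing any existing query and dropping any fragment is a choice made for this example.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_user_input(url, user_input):
    """Return url with user_input attached as its percent-encoded query."""
    parts = urlsplit(url)
    # safe="" encodes every reserved character; spaces become %20, not "+".
    query = quote(user_input, safe="")
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```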
+
+### 3.2.2 2x (SUCCESS)
+
+Status codes beginning with 2 are SUCCESS status codes, meaning:
+
+The request was handled successfully and a response body will follow the response header.  The <META> line is a MIME media type which applies to the response body.
+
+### 3.2.3 3x (REDIRECT)
+
+Status codes beginning with 3 are REDIRECT status codes, meaning:
+
+The server is redirecting the client to a new location for the requested resource.  There is no response body.  <META> is a new URL for the requested resource.  The URL may be absolute or relative.  If relative, it should be resolved against the URL used in the original request.  If the URL used in the original request contained a query string, the client MUST NOT apply this string to the redirect URL, instead using the redirect URL "as is".  The redirect should be considered temporary, i.e. clients should continue to request the resource at the original address and should not perform convenience actions like automatically updating bookmarks.
+
+### 3.2.4 4x (TEMPORARY FAILURE)
+
+Status codes beginning with 4 are TEMPORARY FAILURE status codes, meaning:
+
+The request has failed.  There is no response body.  The nature of the failure is temporary, i.e. an identical request MAY succeed in the future.  The contents of <META> may provide additional information on the failure, and should be displayed to human users.
+
+### 3.2.5 5x (PERMANENT FAILURE)
+
+Status codes beginning with 5 are PERMANENT FAILURE status codes, meaning:
+
+The request has failed.  There is no response body.  The nature of the failure is permanent, i.e. identical future requests will reliably fail for the same reason.  The contents of <META> may provide additional information on the failure, and should be displayed to human users.  Automatic clients such as aggregators or indexing crawlers should not repeat this request.
+
+### 3.2.6 6x (CLIENT CERTIFICATE REQUIRED)
+
+Status codes beginning with 6 are CLIENT CERTIFICATE REQUIRED status codes, meaning:
+
+The requested resource requires a client certificate to access.  If the request was made without a certificate, it should be repeated with one.  If the request was made with a certificate, the server did not accept it and the request should be repeated with a different certificate.  The contents of <META> (and/or the specific 6x code) may provide additional information on certificate requirements or the reason a certificate was rejected.
+
+### 3.2.7 Notes
+
+Note that for basic interactive clients for human use, errors 4 and 5 may be effectively handled identically, by simply displaying the contents of <META> under a heading of "ERROR".  The temporary/permanent error distinction is primarily relevant to well-behaving automated clients.  Basic clients may also choose not to support client-certificate authentication, in which case only four distinct status handling routines are required (for statuses beginning with 1, 2, 3 or a combined 4-or-5).
+
+The full two-digit system is detailed in Appendix 1.  Note that for each of the six valid first digits, a code with a second digit of zero corresponds to a generic status of that kind with no special semantics.  This means that basic servers without any advanced functionality need only be able to return codes of 10, 20, 30, 40 or 50.
+
+The Gemini status code system has been carefully designed so that the increased power (and correspondingly increased complexity) of the second digits is entirely "opt-in" on the part of both servers and clients.
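
A client that only looks at the first digit could dispatch roughly as follows.  This is an illustrative sketch: the `CATEGORIES` table and the `category` function are names assumed for the example.

```python
# First digit of the status code mapped to the category defined in 3.2.
CATEGORIES = {
    1: "INPUT",
    2: "SUCCESS",
    3: "REDIRECT",
    4: "TEMPORARY FAILURE",
    5: "PERMANENT FAILURE",
    6: "CLIENT CERTIFICATE REQUIRED",
}


def category(status):
    """Return the category of a two-digit status code from its first digit."""
    if not 10 <= status <= 69:
        raise ValueError("status code outside the six defined categories")
    return CATEGORIES[status // 10]
```

Because the second digit is ignored, code 31 is handled like 30 and code 44 like 40, which is the "opt-in" behaviour the section describes.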
+
+## 3.3 Response bodies
+
+Response bodies are just raw content, text or binary, à la gopher.  There is no support for compression, chunking or any other kind of content or transfer encoding.  The server closes the connection after the final byte; there is no "end of response" signal like gopher's lonely dot.
+
+Response bodies only accompany responses whose header indicates a SUCCESS status (i.e. a status code whose first digit is 2).  For such responses, <META> is a MIME media type as defined in RFC 2046.
+
+Internet media types are registered with a canonical form.  Content transferred via Gemini MUST be represented in the appropriate canonical form prior to its transmission except for "text" types, as defined in the next paragraph.
+
+When in canonical form, media subtypes of the "text" type use CRLF as the text line break.  Gemini relaxes this requirement and allows the transport of text media with plain LF alone (but NOT a plain CR alone) representing a line break when it is done consistently for an entire response body.  Gemini clients MUST accept CRLF and bare LF as being representative of a line break in text media received via Gemini.
+
+If a MIME type begins with "text/" and no charset is explicitly given, the charset should be assumed to be UTF-8.  Compliant clients MUST support UTF-8-encoded text/* responses.  Clients MAY optionally support other encodings.  Clients receiving a response in a charset they cannot decode SHOULD gracefully inform the user what happened instead of displaying garbage.
+
+If <META> is an empty string, the MIME type MUST default to "text/gemini; charset=utf-8".  The text/gemini media type is defined in section 5.
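
The defaulting rules for <META> on a SUCCESS response might be applied as in this sketch.  `parse_meta` is a hypothetical helper, and its parameter parsing is deliberately naive: it does not handle quoted values that contain semicolons.

```python
def parse_meta(meta):
    """Return (mime_type, params) for a SUCCESS <META>, applying the defaults."""
    if meta.strip() == "":
        meta = "text/gemini; charset=utf-8"  # default for an empty <META>
    mime, *raw_params = [part.strip() for part in meta.split(";")]
    params = {}
    for item in raw_params:
        key, sep, value = item.partition("=")
        if sep:
            params[key.strip().lower()] = value.strip().strip('"')
    mime = mime.lower()
    # text/* without an explicit charset is assumed to be UTF-8.
    if mime.startswith("text/") and "charset" not in params:
        params["charset"] = "utf-8"
    return mime, params
```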
+
+## 3.4 Response body handling
+
+Response handling by clients should be informed by the provided MIME type information.  Gemini defines one MIME type of its own (text/gemini) whose handling is discussed below in section 5.  In all other cases, clients should do "something sensible" based on the MIME type.  Minimalistic clients might adopt a strategy of printing all other text/* responses to the screen without formatting and saving all non-text responses to the disk.  Clients for unix systems may consult /etc/mailcap to find installed programs for handling non-text types.
+
+# 4 TLS
+
+Use of TLS for Gemini transactions is mandatory.
+
+Use of the Server Name Indication (SNI) extension to TLS is also mandatory, to facilitate name-based virtual hosting.
+
+As per RFCs 5246 and 8446, Gemini servers MUST send a TLS `close_notify` prior to closing the connection after sending a complete response.  This is essential to disambiguate completed responses from responses closed prematurely due to network error or attack.
+
+## 4.1 Version requirements
+
+Servers MUST use TLS version 1.2 or higher and SHOULD use TLS version 1.3 or higher.  TLS 1.2 is reluctantly permitted for now to avoid drastically reducing the range of available implementation libraries.  Hopefully TLS 1.3 or higher can be specced in the near future.  Clients who wish to be "ahead of the curve" MAY refuse to connect to servers using TLS version 1.2 or lower.
+
+## 4.2 Server certificate validation
+
+Clients can validate TLS connections however they like (including not at all) but the strongly RECOMMENDED approach is to implement a lightweight "TOFU" certificate-pinning system which treats self-signed certificates as first-class citizens.  This greatly reduces TLS overhead on the network (only one cert needs to be sent, not a whole chain) and lowers the barrier to entry for setting up a Gemini site (no need to pay a CA or set up a Let's Encrypt cron job, just make a cert and go).
+
+TOFU stands for "Trust On First Use" and is a public-key security model similar to that used by OpenSSH.  The first time a Gemini client connects to a server, it accepts whatever certificate it is presented.  That certificate's fingerprint and expiry date are saved in a persistent database (like the .known_hosts file for SSH), associated with the server's hostname.  On all subsequent connections to that hostname, the received certificate's fingerprint is computed and compared to the one in the database.  If the certificate is not the one previously received, but the previous certificate's expiry date has not passed, the user is shown a warning, analogous to the one web browser users are shown when receiving a certificate without a signature chain leading to a trusted CA.
+
+This model is by no means perfect, but it is not awful and is vastly superior to just accepting self-signed certificates unconditionally.
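
The TOFU decision described above can be sketched with an in-memory store.  Several things here are assumptions made for illustration: the function name `tofu_check`, the use of SHA-256 over the DER-encoded certificate as the fingerprint, the dict standing in for the persistent database, and silently pinning a new certificate once the old one has expired (the text only says when a warning is shown).

```python
import hashlib
from datetime import datetime, timezone


def tofu_check(store, hostname, cert_der, not_after, now=None):
    """Return "first-use", "match", "warn" or "replaced" for a server cert.

    store maps hostname -> (fingerprint, expiry); a real client would persist it.
    """
    now = now or datetime.now(timezone.utc)
    fingerprint = hashlib.sha256(cert_der).hexdigest()
    known = store.get(hostname)
    if known is None:
        store[hostname] = (fingerprint, not_after)  # trust on first use
        return "first-use"
    old_fingerprint, old_expiry = known
    if fingerprint == old_fingerprint:
        return "match"
    if old_expiry > now:
        return "warn"  # certificate changed before the pinned one expired
    store[hostname] = (fingerprint, not_after)  # assumed: accept after expiry
    return "replaced"
```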
+
+## 4.3 Client certificates
+
+Although rarely seen on the web, TLS permits clients to identify themselves to servers using certificates, in exactly the same way that servers traditionally identify themselves to the client.  Gemini includes the ability for servers to request in-band that a client repeats a request with a client certificate.  This is a very flexible, highly secure but also very simple notion of client identity with several applications:
+
+* Short-lived client certificates which are generated on demand and deleted immediately after use can be used as "session identifiers" to maintain server-side state for applications.  In this role, client certificates act as a substitute for HTTP cookies, but unlike cookies they are generated voluntarily by the client, and once the client deletes a certificate and its matching key, the server cannot possibly "resurrect" the same value later (unlike so-called "super cookies").
+* Long-lived client certificates can reliably identify a user to a multi-user application without the need for passwords which may be brute-forced.  Even a stolen database table mapping certificate hashes to user identities is not a security risk, as rainbow tables for certificates are not feasible.
+* Self-hosted, single-user applications can be easily and reliably secured in a manner familiar from OpenSSH: the user generates a self-signed certificate and adds its hash to a server-side list of permitted certificates, analogous to the .authorized_keys file for SSH.
+
+Gemini requests will typically be made without a client certificate.  If a requested resource requires a client certificate and one is not included in a request, the server can respond with a status code of 60, 61 or 62 (see Appendix 1 below for a description of all status codes related to client certificates).  A client certificate which is generated or loaded in response to such a status code has its scope bound to the same hostname as the request URL and to all paths below the path of the request URL.  E.g. if a request for gemini://example.com/foo returns status 60 and the user chooses to generate a new client certificate in response to this, that same certificate should be used for subsequent requests to gemini://example.com/foo, gemini://example.com/foo/bar/, gemini://example.com/foo/bar/baz, etc., until such time as the user decides to delete the certificate or to temporarily deactivate it.  Interactive clients for human users are strongly recommended to make such actions easy and to generally give users full control over the use of client certificates.
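
The scope rule in the paragraph above might be checked like this.  `in_scope` is a hypothetical helper, and treating "below the path" as a match on whole path segments (so /foobar is not below /foo) is an interpretation assumed for the sketch.

```python
from urllib.parse import urlsplit


def in_scope(cert_url, request_url):
    """Return True if a certificate bound to cert_url applies to request_url."""
    cert, request = urlsplit(cert_url), urlsplit(request_url)
    if cert.hostname != request.hostname:
        return False
    base = cert.path.rstrip("/")
    path = request.path or "/"  # an empty path is equivalent to "/"
    # Same path, or any path underneath it on a segment boundary.
    return path == base or path.startswith(base + "/")
```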
+
+# 5 The text/gemini media type
+
+## 5.1 Overview
+
+In the same sense that HTML is the "native" response format of HTTP and plain text is the native response format of gopher, Gemini defines its own native response format - though of course, thanks to the inclusion of a MIME type in the response header Gemini can be used to serve plain text, rich text, HTML, Markdown, LaTeX, etc.
+
+Response bodies of type "text/gemini" are a kind of lightweight hypertext format, which takes inspiration from gophermaps and from Markdown.  The format permits richer typographic possibilities than the plain text of Gopher, but remains extremely easy to parse.  The format is line-oriented, and a satisfactory rendering can be achieved with a single pass of a document, processing each line independently.  As per gopher, links can only be displayed one per line, encouraging neat, list-like structure.
+
+Similar to how the two-digit Gemini status codes were designed so that simple clients can function correctly while ignoring the second digit, the text/gemini format has been designed so that simple clients can ignore the more advanced features and still remain very usable.
+
+## 5.2 Parameters
+
+As a subtype of the top-level media type "text", "text/gemini" inherits the "charset" parameter defined in RFC 2046.  However, as noted in 3.3, the default value of "charset" is "UTF-8" for "text" content transferred via Gemini.
+
+A single additional parameter specific to the "text/gemini" subtype is defined: the "lang" parameter.  The value of "lang" denotes the natural language or language(s) in which the textual content of a "text/gemini" document is written.  The presence of the "lang" parameter is optional.  When the "lang" parameter is present, its interpretation is defined entirely by the client.  For example, clients which use text-to-speech technology to make Gemini content accessible to visually impaired users may use the value of "lang" to improve pronunciation of content.  Clients which render text to a screen may use the value of "lang" to determine whether text should be displayed left-to-right or right-to-left.  Simple clients for users who only read languages written left-to-right may simply ignore the value of "lang".  When the "lang" parameter is not present, no default value should be assumed and clients which require some notion of a language in order to process the content (such as text-to-speech screen readers) should rely on user-input to determine how to proceed in the absence of a "lang" parameter.
+
+Valid values for the "lang" parameter are comma-separated lists of one or more language tags as defined in BCP47.  For example:
+
+* "text/gemini; lang=en" Denotes a text/gemini document written in English
+* "text/gemini; lang=fr" Denotes a text/gemini document written in French
+* "text/gemini; lang=en,fr" Denotes a text/gemini document written in a mixture of English and French
+* "text/gemini; lang=de-CH" Denotes a text/gemini document written in Swiss German
+* "text/gemini; lang=sr-Cyrl" Denotes a text/gemini document written in Serbian using the Cyrillic script
+* "text/gemini; lang=zh-Hans-CN" Denotes a text/gemini document written in Chinese using the Simplified script as used in mainland China
+
+## 5.3 Line-orientation
+
+As mentioned, the text/gemini format is line-oriented.  Each line of a text/gemini document has a single "line type".  It is possible to unambiguously determine a line's type purely by inspecting its first three characters.  A line's type determines the manner in which it should be presented to the user.  Any details of presentation or rendering associated with a particular line type are strictly limited in scope to that individual line.
+
+There are 7 different line types in total.  However, a fully functional and specification compliant Gemini client need only recognise and handle 4 of them - these are the "core line types", (see 5.4).  Advanced clients can also handle the additional "advanced line types" (see 5.5).  Simple clients can treat all advanced line types as equivalent to one of the core line types and still offer an adequate user experience.
+
+## 5.4 Core line types
+
+The four core line types are:
+
+### 5.4.1 Text lines
+
+Text lines are the most fundamental line type - any line which does not match the definition of another line type defined below defaults to being a text line.  The majority of lines in a typical text/gemini document will be text lines.
+
+Text lines should be presented to the user, after being wrapped to the appropriate width for the client's viewport (see below).  Text lines may be presented to the user in a visually pleasing manner for general reading, the precise meaning of which is at the client's discretion.  For example, variable width fonts may be used, spacing may be normalised, with spaces between sentences being made wider than spacing between words, and other such typographical niceties may be applied.  Clients may permit users to customise the appearance of text lines by altering the font, font size, text and background colour, etc.  Authors should not expect to exercise any control over the precise rendering of their text lines, only of their actual textual content.  Content such as ASCII art, computer source code, etc. which may appear incorrectly when treated as such should be enclosed between preformatting toggle lines (see 5.4.3).
+
+Blank lines are instances of text lines and have no special meaning.  They should be rendered individually as vertical blank space each time they occur.  In this way they are analogous to <br/> tags in HTML.  Consecutive blank lines should NOT be collapsed into fewer blank lines.  Note also that consecutive non-blank text lines do not form any kind of coherent unit or block such as a "paragraph": all text lines are independent entities.
+
+Text lines which are longer than can fit on a client's display device SHOULD be "wrapped" to fit, i.e. long lines should be split (ideally at whitespace or at hyphens) into multiple consecutive lines of a device-appropriate width.  This wrapping is applied to each line of text independently.  Multiple consecutive lines which are shorter than the client's display device MUST NOT be combined into fewer, longer lines.
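
Wrapping each text line independently, without merging short lines or collapsing blank ones, could look like the sketch below.  It uses Python's `textwrap` module, which also normalises whitespace inside a line; the function name `wrap_text_lines` is an assumption.

```python
import textwrap


def wrap_text_lines(lines, width):
    """Wrap each text line on its own; never merge lines or drop blank ones."""
    wrapped = []
    for line in lines:
        # textwrap.wrap returns [] for a blank line, so keep one empty line.
        wrapped.extend(textwrap.wrap(line, width) or [""])
    return wrapped
```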
+
+In order to take full advantage of this method of text formatting, authors of text/gemini content SHOULD avoid hard-wrapping to a specific fixed width, in contrast to the convention in Gopherspace where text is typically wrapped at 80 characters or fewer.  Instead, text which should be displayed as a contiguous block should be written as a single long line.  Most text editors can be configured to "soft-wrap", i.e. to write this kind of file while displaying the long lines wrapped at word boundaries to fit the author's display device.
+
+Authors who insist on hard-wrapping their content MUST be aware that the content will display neatly on clients whose display device is as wide as the hard-wrapped length or wider, but will appear with irregular line widths on narrower clients.
+
+### 5.4.2 Link lines
+
+Lines beginning with the two characters "=>" are link lines, which have the following syntax:
+
+```
+=>[<whitespace>]<URL>[<whitespace><USER-FRIENDLY LINK NAME>]
+```
+
+where:
+
+* <whitespace> is any non-zero number of consecutive spaces or tabs
+* Square brackets indicate that the enclosed content is optional.
+* <URL> is a URL, which may be absolute or relative.
+
+All the following examples are valid link lines:
+
+```
+=> gemini://example.org/
+=> gemini://example.org/ An example link
+=> gemini://example.org/foo	Another example link at the same host
+=> foo/bar/baz.txt	A relative link
+=> 	gopher://example.org:70/1 A gopher link
+```
+
+URLs in link lines must have reserved characters and spaces percent-encoded as per RFC 3986.
+
+Note that link URLs may have schemes other than gemini.  This means that Gemini documents can simply and elegantly link to documents hosted via other protocols, unlike gophermaps which can only link to non-gopher content via a non-standard adaptation of the `h` item-type.
+
+Clients can present links to users in whatever fashion the client author wishes, however clients MUST NOT automatically make any network connections as part of displaying links whose scheme corresponds to a network protocol (e.g. links beginning with gemini://, gopher://, https://, ftp:// , etc.).
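The link line syntax above can be sketched with a single regular expression. `parseLinkLine` is a hypothetical helper written for illustration and is not HtmGem's actual parser; it assumes the line has already been stripped of its line ending and does not validate or percent-decode the URL.

```php
<?php
/**
 * Parses one "=>" link line into its URL and optional label.
 * Returns null when the line is not a well-formed link line.
 * Hypothetical helper for illustration; not HtmGem's actual parser.
 */
function parseLinkLine(string $line): ?array
{
    // "=>", optional whitespace, a URL without whitespace,
    // then optionally whitespace followed by the user-friendly name.
    if (!preg_match('/^=>[ \t]*([^ \t]+)(?:[ \t]+(.*))?$/', $line, $m)) {
        return null;
    }
    $label = $m[2] ?? '';
    return ['url' => $m[1], 'label' => $label !== '' ? $label : null];
}
```

A client that receives a `null` label would typically display the URL itself, and no network connection is made at parse time.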
+
+### 5.4.3 Preformatting toggle lines
+
+Any line whose first three characters are "```" (i.e. three consecutive back ticks with no leading whitespace) is a preformatting toggle line.  Such lines should NOT be included in the rendered output shown to the user.  Instead, they toggle the parser between preformatted mode being "on" or "off".  Preformatted mode should be "off" at the beginning of a document.  The current status of preformatted mode is the only internal state a parser is required to maintain.  When preformatted mode is "on", the usual rules for identifying line types are suspended, and all lines should be identified as preformatted text lines (see 5.4.4).
+
+Preformatting toggle lines can be thought of as analogous to <pre> and </pre> tags in HTML.
+
+Any text following the leading "```" of a preformat toggle line which toggles preformatted mode on MAY be interpreted by the client as "alt text" pertaining to the preformatted text lines which follow the toggle line.  Use of alt text is at the client's discretion, and simple clients may ignore it.  Alt text is recommended for ASCII art or similar non-textual content which, for example, cannot be meaningfully understood when rendered through a screen reader or usefully indexed by a search engine.  Alt text may also be used for computer source code to identify the programming language which advanced clients may use for syntax highlighting.
+
+Any text following the leading "```" of a preformat toggle line which toggles preformatted mode off MUST be ignored by clients.
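The toggle behaviour described in this section amounts to a one-bit state machine. The sketch below is one possible reading of it in PHP; `tagPreformatted` and the shape of the returned arrays are assumptions made for illustration, not HtmGem's actual parser.

```php
<?php
/**
 * Tags each line as preformatted or not, dropping the toggle lines themselves.
 * Hypothetical helper for illustration; not HtmGem's actual parser.
 */
function tagPreformatted(array $lines): array
{
    $pre = false; // preformatted mode is "off" at the start of a document
    $alt = '';
    $out = [];
    foreach ($lines as $line) {
        if (substr($line, 0, 3) === '```') {
            // Alt text only counts on the toggle that switches the mode on;
            // anything after the closing toggle is ignored.
            $alt = $pre ? '' : trim(substr($line, 3));
            $pre = !$pre;
            continue; // toggle lines are never rendered
        }
        $out[] = $pre
            ? ['type' => 'pre', 'alt' => $alt, 'text' => $line]
            : ['type' => 'other', 'text' => $line];
    }
    return $out;
}
```

Lines tagged `other` would then go through the usual line-type detection, while lines tagged `pre` are kept verbatim, leading whitespace included.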
+
+### 5.4.4 Preformatted text lines
+
+Preformatted text lines should be presented to the user in a "neutral", monowidth font without any alteration to whitespace or stylistic enhancements.  Graphical clients should use scrolling mechanisms to present preformatted text lines which are longer than the client viewport, in preference to wrapping.  In displaying preformatted text lines, clients should keep in mind applications like ASCII art and computer source code: in particular, source code in languages with significant whitespace (e.g. Python) should be able to be copied and pasted from the client into a file and interpreted/compiled without any problems arising from the client's manner of displaying them.
+
+## 5.5 Advanced line types
+
+The following advanced line types MAY be recognised by advanced clients.  Simple clients may treat them all as text lines as per 5.4.1 without any loss of essential function.
+
+### 5.5.1 Heading lines
+
+Lines beginning with "#" are heading lines.  Heading lines consist of one, two or three consecutive "#" characters, followed by optional whitespace, followed by heading text.  The number of # characters indicates the "level" of header;  #, ## and ### can be thought of as analogous to <h1>, <h2> and <h3> in HTML.
+
+Heading text should be presented to the user, and clients MAY use special formatting, e.g. a larger or bold font, to indicate its status as a header (simple clients may simply print the line, including its leading #s, without any styling at all).  However, the main motivation for the definition of heading lines is not stylistic but to provide a machine-readable representation of the internal structure of the document.  Advanced clients can use this information to, e.g. display an automatically generated and hierarchically formatted "table of contents" for a long document in a side-pane, allowing users to easily jump to specific sections without excessive scrolling.  CMS-style tools automatically generating menus or Atom/RSS feeds for a directory of text/gemini files can use the first heading in the file as a human-friendly title.
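A minimal PHP sketch of heading detection, and of the "first heading as title" use mentioned above, might look as follows. Both function names are hypothetical, and for brevity the sketch ignores preformatted mode, so a "#" line inside a preformatted block would wrongly count as a heading.

```php
<?php
/**
 * Splits a heading line into its level (1-3) and text, or returns null.
 * Hypothetical helper for illustration; not HtmGem's actual parser.
 */
function parseHeading(string $line): ?array
{
    // One to three "#", optional whitespace, then the heading text.
    if (!preg_match('/^(#{1,3})[ \t]*(.*)$/', $line, $m)) {
        return null;
    }
    return ['level' => strlen($m[1]), 'text' => rtrim($m[2])];
}

/** Returns the text of the first heading, usable as a page title, or null. */
function firstHeading(array $lines): ?string
{
    foreach ($lines as $line) {
        $heading = parseHeading($line);
        if ($heading !== null) {
            return $heading['text'];
        }
    }
    return null;
}
```

The list of `level`/`text` pairs for a whole document is also what a table of contents would be built from.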
+
+### 5.5.2 Unordered list items
+
+Lines beginning with "* " are unordered list items.  This line type exists purely for stylistic reasons.  The * may be replaced in advanced clients by a bullet symbol.  Any text after the "* " should be presented to the user as if it were a text line, i.e.  wrapped to fit the viewport and formatted "nicely".  Advanced clients can take the space of the bullet symbol into account when wrapping long list items to ensure that all lines of text corresponding to the item are offset an equal distance from the left of the screen.
+
+### 5.5.3 Quote lines
+
+Lines beginning with ">" are quote lines.  This line type exists so that advanced clients may use distinct styling to convey to readers the important semantic information that certain text is being quoted from an external source.  For example, when wrapping long lines to the viewport, each resultant line may have a ">" symbol placed at the front.
+
+# Appendix 1. Full two digit status codes
+
+## 10 INPUT
+
+As per definition of single-digit code 1 in 3.2.
+
+## 11 SENSITIVE INPUT
+
+As per status code 10, but for use with sensitive input such as passwords.  Clients should present the prompt as per status code 10, but the user's input should not be echoed to the screen to prevent it being read by "shoulder surfers".
+
+## 20 SUCCESS
+
+As per definition of single-digit code 2 in 3.2.
+
+## 30 REDIRECT - TEMPORARY
+
+As per definition of single-digit code 3 in 3.2.
+
+## 31 REDIRECT - PERMANENT
+
+The requested resource should be consistently requested from the new URL provided in future.  Tools like search engine indexers or content aggregators should update their configurations to avoid requesting the old URL, and end-user clients may automatically update bookmarks, etc.  Note that clients which only pay attention to the initial digit of status codes will treat this as a temporary redirect.  They will still end up at the right place, they just won't be able to make use of the knowledge that this redirect is permanent, so they'll pay a small performance penalty by having to follow the redirect each time.
+
+## 40 TEMPORARY FAILURE
+
+As per definition of single-digit code 4 in 3.2.
+
+## 41 SERVER UNAVAILABLE
+
+The server is unavailable due to overload or maintenance.  (cf HTTP 503)
+
+## 42 CGI ERROR
+
+A CGI process, or similar system for generating dynamic content, died unexpectedly or timed out.
+
+## 43 PROXY ERROR
+
+A proxy request failed because the server was unable to successfully complete a transaction with the remote host.  (cf HTTP 502, 504)
+
+## 44 SLOW DOWN
+
+Rate limiting is in effect.  <META> is an integer number of seconds which the client must wait before another request is made to this server.  (cf HTTP 429)
+
+## 50 PERMANENT FAILURE
+
+As per definition of single-digit code 5 in 3.2.
+
+## 51 NOT FOUND
+
+The requested resource could not be found but may be available in the future.  (cf HTTP 404) (struggling to remember this important status code?  Easy: you can't find things hidden at Area 51!)
+
+## 52 GONE
+
+The resource requested is no longer available and will not be available again.  Search engines and similar tools should remove this resource from their indices.  Content aggregators should stop requesting the resource and convey to their human users that the subscribed resource is gone.  (cf HTTP 410)
+
+## 53 PROXY REQUEST REFUSED
+
+The request was for a resource at a domain not served by the server and the server does not accept proxy requests.
+
+## 59 BAD REQUEST
+
+The server was unable to parse the client's request, presumably due to a malformed request.  (cf HTTP 400)
+
+## 60 CLIENT CERTIFICATE REQUIRED
+
+As per definition of single-digit code 6 in 3.2.
+
+## 61 CERTIFICATE NOT AUTHORISED
+
+The supplied client certificate is not authorised for accessing the particular requested resource.  The problem is not with the certificate itself, which may be authorised for other resources.
+
+## 62 CERTIFICATE NOT VALID
+
+The supplied client certificate was not accepted because it is not valid.  This indicates a problem with the certificate in and of itself, with no consideration of the particular requested resource.  The most likely cause is that the certificate's validity start date is in the future or its expiry date has passed, but this code may also indicate an invalid signature, or a violation of X509 standard requirements.  The <META> should provide more information about the exact error.
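The appendix notes that a simple client may look only at the first digit of a status code. That fallback could be sketched in PHP as below; `statusCategory` is a hypothetical helper, and the labels are taken from the x0 headings of this appendix.

```php
<?php
/**
 * Maps a two-digit status code to the category named by its first digit.
 * Hypothetical helper for illustration; labels come from the appendix headings.
 */
function statusCategory(int $code): ?string
{
    $categories = [
        1 => 'INPUT',
        2 => 'SUCCESS',
        3 => 'REDIRECT',
        4 => 'TEMPORARY FAILURE',
        5 => 'PERMANENT FAILURE',
        6 => 'CLIENT CERTIFICATE REQUIRED',
    ];
    // Anything outside 10..69 is not covered by the appendix.
    if ($code < 10 || $code > 69) {
        return null;
    }
    return $categories[intdiv($code, 10)];
}
```

With this fallback, code 31 is handled like 30 and code 44 like 40, which matches the behaviour the appendix describes for clients that ignore the second digit.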
--- a/docs/tutogemtext-en.gmi
+++ b/docs/tutogemtext-en.gmi
@ -0,0 +1,74 @@
+# GemText tutorial
+
+Gemini is a protocol, a syntax, servers and clients. Its syntax is called GemText. The principle is that the user's Gemini browser decides how the page is displayed: the font, the size, the page background, everything. One of the goals is to focus on the text, along with a few other details explained below.
+
+The paragraph you just read is a single physical line, which the program you are using splits into several logical lines to fit your screen. Between this text line and the preceding one, there's an empty line to mark the paragraph change.
+Here, I just inserted a line break. The spacing should be smaller, but it depends on the program you are using.
+
+The titles of level one, two and three are shown first as they are physically written in this page, then on the following line as the Gemini browser displays them:
+
+# Title level 1
+# Title level 1
+
+## Title level 2
+## Title level 2
+
+### Title level 3
+### Title level 3
+
+# Citations
+
+> Text quotation. The line can be as long as needed; it's your program that must split the line to display it. A physical line can be long, and there's no need to wrap it.
+> Text quotation. The line can be as long as needed; it's your program that must split the line to display it. A physical line can be long, and there's no need to wrap it.
+
+# Preformatted blocks
+
+```
+The preformatted blocks are lines enclosed by a ``` line before the block and a ``` line after the block.
+```
+```
+The preformatted blocks are lines enclosed by a ``` line before the block and a ``` line after the block.
+```
+
+# Unordered lists
+
+* Unordered lists are lines beginning with *, one after another.
+* They can be used to enumerate items.
+* However, GemText doesn't support ordered lists.
+* Unordered lists are lines beginning with *, one after another.
+* They can be used to enumerate items.
+* However, GemText doesn't support ordered lists.
+
+# The links
+
+There can be only one link per line, and the whole line is dedicated to it! Here is, for instance, a link to the site that currently centralizes the information about Gemini:
+
+=> gemini://gemini.circumlunar.space/docs/specification.gmi Gemini specifications
+=> gemini://gemini.circumlunar.space/docs/specification.gmi Gemini specifications
+
+=> https://gemini.circumlunar.space/docs/specification.gmi
+=> https://gemini.circumlunar.space/docs/specification.gmi
+
+# Text decoration
+
+Text decoration is not part of the Gemini specification.
+
+This **line** uses the //text decoration// which can ~~strike through~~ or __underline__ words.
+This **line** uses the //text decoration// which can ~~strike through~~ or __underline__ words.
+
+Text decoration can be disabled and re-enabled with **^^^** on a line.
+
+And voilà! You know everything 🥳
+
+———————————————————— ————————————————————
+
+## HtmGem
+
+HtmGem lets you host Gemini pages and publish them on a web server running **Php**. When a ***.gmi** page is opened, HtmGem translates it for the web browser. This very page is displayed that way. It makes the Gemini syntax (GemText) usable on the web.
+=> https://gmi.sbgodin.fr/htmgem
+
+
+### License of this page
+This page is under the free license **CC BY-SA 2.0**.
+=> https://creativecommons.org/licenses/by-sa/2.0/en/ Text under license CC BY-SA 2.0
+=> https://gmi.sbgodin.fr/ https://gmi.sbgodin.fr/ ⸻ Christophe HENRY
--- a/docs/tutogemtext-fr.gmi
+++ b/docs/tutogemtext-fr.gmi
@ -0,0 +1,74 @@
+# Tutoriel GemText
+
+Gemini est un protocole, une syntaxe, des serveurs et des clients. Sa syntaxe est le GemText. Son principe est que c’est le navigateur Gemini de l’utilisateur qui décide de l’affichage. La police de caractère, la taille, le fond d’écran, tout. L’un des buts recherchés est de se concentrer sur le texte et quelques autres détails abordés ci-après.
+
+Le paragraphe que vous venez de lire existe sur une seule ligne physique, que le programme que vous utilisez a dû découper en plusieurs lignes logiques afin que cela tienne sur votre écran. Entre cette présente ligne de texte et celle d’avant, il y a une ligne vide pour marquer le changement de paragraphe.
+Ici, j’ai simplement passé à la ligne. L’espacement devrait être moindre mais cela dépend du programme que vous utilisez.
+
+Les titres de niveau un, deux puis trois sont d’abord écrits ci-après tels qu’ils sont écrits physiquement dans cette page, puis sur la ligne d’après ils sont affichés normalement par le navigateur Gemini :
+
+# Titre de niveau 1
+# Titre de niveau 1
+
+## Titre de niveau 2
+## Titre de niveau 2
+
+### Titre de niveau 3
+### Titre de niveau 3
+
+# Citations
+
+> Citation de texte. La ligne peut être aussi longue que voulue, c’est votre programme qui doit découper la ligne pour l’afficher. Une seule ligne physique peut être longue, mais il n’y a pas besoin de placer de retour à la ligne.
+> Citation de texte. La ligne peut être aussi longue que voulue, c’est votre programme qui doit découper la ligne pour l’afficher. Une seule ligne physique peut être longue, mais il n’y a pas besoin de placer de retour à la ligne.
+
+# Blocs préformatés
+
+```
+Les blocs préformatés sont des lignes encadrées par un ``` sur une ligne avant le bloc et un ``` après le bloc.
+```
+```
+Les blocs préformatés sont des lignes encadrées par un ``` sur une ligne avant le bloc et un ``` après le bloc.
+```
+
+# Listes non-ordonnées
+
+* Les listes non-ordonnées sont des lignes commençant par * les unes après les autres.
+* Elles servent à énumérer.
+* Par contre, GemText ne reconnaît pas les listes numérotées.
+* Les listes non-ordonnées sont des lignes commençant par * les unes après les autres.
+* Elles servent à énumérer.
+* Par contre, GemText ne reconnaît pas les listes numérotées.
+
+# Les liens
+
+Il ne peut exister qu’un lien par ligne. Et la ligne est dédiée à ça ! Voici par exemple un lien vers le site centralisant pour le moment les informations sur Gemini :
+
+=> gemini://gemini.circumlunar.space/docs/specification.gmi Spécifications de Gemini
+=> gemini://gemini.circumlunar.space/docs/specification.gmi Spécifications de Gemini
+
+=> https://gemini.circumlunar.space/docs/specification.gmi
+=> https://gemini.circumlunar.space/docs/specification.gmi
+
+# Décoration du texte
+
+La décoration du texte ne fait pas partie des spécifications de Gemini.
+
+Cette **ligne** utilise la //décoration du texte// qui peut ~~barrer~~ ou __souligner__ des mots.
+Cette **ligne** utilise la //décoration du texte// qui peut ~~barrer~~ ou __souligner__ des mots.
+
+On peut désactiver et activer la décoration du texte avec **^^^** sur une ligne.
+
+Et voilà ! Vous savez tout 🥳
+
+———————————————————— ————————————————————
+
+## HtmGem
+
+HtmGem permet d’héberger des pages Gemini et de les publier sur un serveur web muni de **Php**. À l’ouverture d’une page ***.gmi**, il la traduit pour le navigateur web. Cette présente page est affichée de cette façon. Il permet d’utiliser la syntaxe Gemini (GemText) via le web.
+=> https://gmi.sbgodin.fr/htmgem
+
+
+### Licence de cette page
+Cette page est sous licence libre **CC BY-SA 2.0**.
+=> https://creativecommons.org/licenses/by-sa/2.0/fr/ Texte sous licence CC BY-SA 2.0
+=> https://gmi.sbgodin.fr/ https://gmi.sbgodin.fr/ ⸻ Christophe HENRY
--- a/favicon.ico
+++ b/favicon.ico
--- a/htmgem.css
+++ b/htmgem.css
@ -1,88 +0,0 @@
-html {
-	font-family: sans-serif;
-	font-size:16px;
-	color:#1E4147;
-	background-color:#fafafa;
-}
-
-body {
-	max-width: 920px;
-	margin: 0 auto;
-	padding: 1rem 2rem;
-}
-
-p {
-    margin-bottom: 0;
-    padding-bottom: 0;
-}
-
-h1,h2,h3{
-	line-height:1.2;
-	color: #66f;
-}
-
-h1 {
-	text-align: center;
-	margin-bottom: 1em;
-}
-
-blockquote {
-	background-color: #eee;
-	border-left: 3px solid #444;
-	margin: 1rem -1rem 1rem calc(-1rem - 3px);
-	padding: 1rem;
-}
-
-ul {
-	margin-left: 0;
-	padding-left: 0;
-	padding-bottom: 1em;
-}
-
-li {
-	padding: 0;
-}
-
-li:not(:last-child) {
-	margin-bottom: 0.5rem;
-}
-
-a {
-	color:#66f;
-	text-decoration: none;
-}
-
-a:visited {
-	color: #802200;
-}
-
-pre {
-	background-color: #eee;
-	margin: 0 -1rem;
-	padding: 1rem;
-	overflow-x: auto;
-}
-
-@media(prefers-color-scheme:dark) {
-	html {
-		background-color: #111;
-		color: #eee;
-	}
-
-	blockquote {
-		background-color: #000;
-	}
-
-	pre {
-		background-color: #222;
-	}
-	
-	a {
-		color: #0087BD;
-	}
-
-	a:visited {
-		color: #802200;
-	}
-}
-
--- a/htmgem.php
+++ b/htmgem.php
@ -1,125 +0,0 @@
-<?php
-
-if (isset($_REQUEST["url"]))
-    $url = $_REQUEST["url"];
-elseif (isset($_SERVER["QUERY_STRING"]))
-    $url = "/".$_SERVER["QUERY_STRING"];
-else
-    $url = "/index.gmi";
-
-$GMI_DIR = $_SERVER['DOCUMENT_ROOT'];
-
-$filePath = $GMI_DIR.$url;
-$fileContent = @file_get_contents($filePath);
-if (!$fileContent) {
-    http_response_code(404);
-    die("404: $filePath $GMI_DIR $url");
-}
-
-$fileLines = preg_split("/\n/", $fileContent);
-
-ob_start();
-
-
-echo(<<<EOL
-<!DOCTYPE html>
-<html lang="fr">
-<head>
-    <title>HTM Gem</title>
-    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
-    <!-- link type="text/css" rel="StyleSheet" href="/htmgem.css" -->
-    <style>
-
-EOL
-);
-include("htmgem.css");
-echo(<<<EOL
-</style>
-</head>
-<body>
-EOL);
-
-$mode = null;
-foreach ($fileLines as $line) {
-    $reDo = true;
-    $line1 = substr($line, 0, 1);
-    $line2 = substr($line, 0, 2);
-    $line3 = substr($line, 0, 3);
-    while ($reDo) {
-        $reDo = false; # Change in modes need to redo one loop as they can’t handle the case
-        if (is_null($mode)) {
-            if (empty($line)) {
-                print("<p>&nbsp;</p>\n");
-            } elseif ("#" == $line1) {
-                preg_match("/^(#{1,3})\s*(.*)/", $line, $sharps);
-                $h_level = strlen($sharps[1]);
-                $text = $sharps[2];
-                switch ($h_level) {
-                    case 1: print("<h1>".$text."</h1>\n"); break;
-                    case 2: print("<h2>".$text."</h2>\n"); break;
-                    case 3: print("<h3>".$text."</h3>\n"); break;
-                }
-            } elseif ("=>" == $line2) {
-                preg_match("/^=>\s*([^\s]+)\s*(.*)$/", $line, $linkParts);
-                $url_link = $linkParts[1];
-                $url_label = $linkParts[2];
-                if (empty($url_label)) $url_label = $url_link;
-                print("<p><a href='".$url_link."'>".$url_label."</a></p>\n");
-            } elseif ("```" == $line3) {
-                $mode="pre";
-                print("<pre>\n");
-            } elseif (">" == $line1) {
-                $mode = "quote";
-                preg_match("/^>\s*(.*)$/", $line, $quoteParts);
-                $quote = $quoteParts[1];
-                print("<blockquote>\n");
-                if (empty($quote))
-                    print("<p>&nbsp;</p>\n");
-                else
-                    print("<p>".$quoteParts[1]."</p>\n");
-            } elseif ("*" == $line1) {
-                $mode = "ul";
-                $reDo = true;
-                print("<ul>\n");
-            } else {
-                print("<p>".$line."</p>\n");
-            }
-        } elseif ("pre"==$mode) {
-            if ("```" == $line3) {
-                $mode=null;
-                print("</pre>\n");
-            } else {
-                print($line."\n");
-            }
-        } elseif ("quote"==$mode) {
-            if (">" == $line1) {
-                preg_match("/^>\s*(.*)$/", $line, $quoteParts);
-                $quote = $quoteParts[1];
-                if (empty($quote))
-                    print("<p>&nbsp;</p>\n");
-                else
-                    print("<p>".$quote."</p>\n");
-            } else {
-                print("</blockquote>\n");
-                $mode=null;
-                $reDo=true;
-            }
-        } elseif ("ul"==$mode) {
-            if ("*" == $line1) {
-                preg_match("/^\*\s*(.*)$/", $line, $ulParts);
-                $li = $ulParts[1];
-                if (empty($li))
-                    print("<li>&nbsp;\n");
-                else
-                    print("<li>".$ulParts[1]."\n");
-            } else {
-                $mode = null;
-                print("</ul>\n");
-            }
-        }
-    }
-}
-
-ob_end_flush();
-
-?>
--- a/index.gmi
+++ b/index.gmi
@ -0,0 +1,19 @@
+```
+ __    __   __                   ______
+|\ \  |\ \ |\ \                 /\     \        v1.5.1
+| ▓▓  | ▓▓_| ▓▓_   ______ ____ |  ▓▓▓▓▓▓\
+| ▓▓__| ▓▓   ▓▓ \ |\     \    \| ▓▓ __\▓▓/\     \|      \    \
+| ▓▓    ▓▓\▓▓▓▓▓▓ | ▓▓▓▓▓▓\▓▓▓▓\ ▓▓|    \  ▓▓▓▓▓▓\ ▓▓▓▓▓▓\▓▓▓▓\
+| ▓▓▓▓▓▓▓▓ | ▓▓ __| ▓▓ | ▓▓ | ▓▓ ▓▓ \▓▓▓▓ ▓▓    ▓▓ ▓▓ | ▓▓ | ▓▓
+| ▓▓  | ▓▓ | ▓▓|  \ ▓▓ | ▓▓ | ▓▓ ▓▓__| ▓▓ ▓▓▓▓▓▓▓▓ ▓▓ | ▓▓ | ▓▓
+| ▓▓  | ▓▓  \▓▓  ▓▓ ▓▓ | ▓▓ | ▓▓\▓▓    ▓▓\▓▓     \ ▓▓ | ▓▓ | ▓▓
+ \▓▓   \▓▓   \▓▓▓▓ \▓▓  \▓▓  \▓▓ \▓▓▓▓▓▓  \▓▓▓▓▓▓▓\▓▓  \▓▓  \▓▓
+
+```
+
+=> docs/installation-fr.gmi 🇫🇷 HtmGem rend vos pages **Gemini** accessibles sur le web.
+
+=> docs/installation-en.gmi 🇺🇸 HtmGem makes your **Gemini** pages reachable on the web.
+
+
+=> docs/index.gmi 🇺🇸 **Documentation**
--- a/index.html
+++ b/index.html
@ -0,0 +1,19 @@
+<!DOCTYPE html>
+<html>
+<head>
+<title>HtmGem</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<link type='text/css' rel='StyleSheet' href='css/default/htmgem.css'>
+</head>
+<body>
+<h1>HtmGem</h1>
+<p>&nbsp;</p>
+<h3>Php required</h3>
+<p><a class='http' href='index.php'>🔄 index.php</a></p>
+<p>&nbsp;</p>
+<p>&nbsp;</p>
+<p>&nbsp;</p>
+<h3>Help</h3>
+<p><a class='https' href='https://gmi.sbgodin.fr/htmgem'>https://gmi.sbgodin.fr/htmgem</a></p>
+</body>
+</html>
--- a/index.php
+++ b/index.php
@ -0,0 +1,134 @@
+<?php declare(strict_types=1);
+
+require_once "lib-htmgem.inc.php";
+require_once "lib-html.inc.php";
+require_once "lib-io.inc.php";
+
+define("DEFAULT_CSS", "/css/default/htmgem.css");
+
+$documentRoot = $_SERVER['DOCUMENT_ROOT'];
+$scheme = (@$_SERVER['REQUEST_SCHEME']??"http")."://";
+$domain = $_SERVER['HTTP_HOST'];
+$php_self = $_SERVER['PHP_SELF']; // by default: /htmgem/index.php
+$php_self_dir = dirname($php_self);
+$url = @$_REQUEST["url"];
+$urlRewriting = @$_REQUEST["rw"]=="1";
+
+/**
+ * Installation page
+ *
+ * Accessing /htmgem directly displays the self-hosted documentation
+ * contained in "index.gmi". If that file has been removed, an empty page
+ * is returned with a 403 status.
+ */
+if (empty($url)) {
+    if (!file_exists("index.gmi")) {
+        http_response_code(403);
+    } else {
+        $gt_html = new \htmgem\GemTextTranslate_html(file_get_contents("index.gmi"), true, "$php_self?url=", $php_self_dir);
+        if (empty($gt_html->getCss())) $gt_html->addCss($php_self_dir.DEFAULT_CSS);
+
+        // No URL rewriting assumed
+        echo \htmgem\html\getHtmlWithMenu($gt_html, $scheme, $domain, $php_self, "$php_self?url=");
+    }
+    exit();
+}
+
+$url = \htmgem\resolve_path(
+    // Some webservers (Apache) don't add the slash
+    // while others (Nginx) do…
+    ( $url[0] == "/" ? "" : "/" ) . $url
+);
+if (!preg_match("/\.gmi$/", $url)) {
+    if ($url[-1] == "/")
+        $url = $url."index.gmi";
+    else
+        $url = $url."/index.gmi";
+}
+
+# Strips the trailing slash of the document root and the leading slashes of the URL, so that they are joined by exactly one slash.
+$filePath = rtrim($_SERVER['DOCUMENT_ROOT'], "/")."/".ltrim($url, "/");
+
+switch(true) {
+    case !realPath($filePath):
+    case strpos($filePath, $documentRoot)!==0: # not in web directory
+        $go404 = true;
+        // Reports 404 even if the file exists, so as not to leak any information.
+        break;
+    default:
+        $go404 = false;
+}
+
+/* 404 page
+ */
+if ($go404) {
+    error_log("HtmGem: 404 $url $filePath");
+    http_response_code(404);
+    $page404 = \htmgem\html\get404GmiPage($url);
+    $gt_html = new \htmgem\GemTextTranslate_html($page404);
+    if (empty($gt_html->getCss())) $gt_html->addCss($php_self_dir."/css/htmgem.css");
+    if ($urlRewriting)
+        echo \htmgem\html\getHtmlWithMenu($gt_html, $scheme, $domain, $url);
+    else
+        echo \htmgem\html\getHtmlWithMenu($gt_html, $scheme, $domain, $url, "$php_self?url=");
+    exit();
+}
+
+# false only if textDecoration=0 is given in the URL
+$gt_htmlextDecoration = "0" != @$_REQUEST['textDecoration'];
+
+$fileContents = @file_get_contents($filePath);
+\htmgem\io\convertToUTF8($fileContents);
+
+/* CSS and special style management
+ */
+
+$style = @$_REQUEST['style'];
+if ("source" == $style) {
+    $basename = basename($filePath);
+    header("Cache-Control: public");
+    header("Content-Disposition: attachment; filename=$basename");
+    header("Content-Type: text/plain");
+    header("Content-Transfer-Encoding: binary");
+    header('Content-Length: ' . filesize($filePath));
+    echo $fileContents;
+    exit();
+} elseif ("src" == $style) {
+    # Gets the page title: the first occurrence with # at the line start
+    mb_ereg("#\s*([^\n]+)\n", $fileContents, $matches);
+    $page_title = @$matches[1];
+    $fileContents = htmlspecialchars($fileContents, ENT_HTML5|ENT_QUOTES, "UTF-8", true);
+    echo <<<EOL
+<!DOCTYPE html>
+<html>
+<head>
+<title>$page_title</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+</head>
+<body>
+<pre>
+$fileContents</pre>
+</body>
+</html>
+EOL;
+    exit();
+}
+
+if ($urlRewriting)
+    $gt_html = new \htmgem\GemTextTranslate_html($fileContents, $gt_htmlextDecoration);
+else
+    $gt_html = new \htmgem\GemTextTranslate_html($fileContents, $gt_htmlextDecoration, "$php_self?url=", dirname($url));
+
+if (empty($style)) {
+    $gt_html->addCss($php_self_dir.DEFAULT_CSS);
+} else {
+    $style = preg_replace("/,/", "/", $style);
+    if ("/" == $php_self_dir) $php_self_dir = ""; # dirname() never returns a trailing slash, except for the root
+    $gt_html->addCss("$php_self_dir/css/$style");
+}
+
+if ($urlRewriting)
+    echo \htmgem\html\getHtmlWithMenu($gt_html, $scheme, $domain, $url);
+else
+    echo \htmgem\html\getHtmlWithMenu($gt_html, $scheme, $domain, $url, "$php_self?url=");
+
+?>
--- a/lib-htmgem.inc.php
+++ b/lib-htmgem.inc.php
@ -0,0 +1,424 @@
+<?php declare(strict_types=1);
+
+namespace htmgem;
+
+mb_internal_encoding("UTF-8");
+mb_regex_encoding("UTF-8");
+
+/**
+ * Resolves $path, interpreting "/", "." and "..".
+ * @param string $path
+ * @return string "/" if ".." goes above the root
+ */
+function resolve_path($path) {
+    if (empty($path)) return "";
+    $absolute = "/"==$path[0];
+    $parts = array_filter(explode("/", $path), 'strlen');
+    $chuncks = array();
+    foreach ($parts as $part) {
+        if ('.' == $part) continue;
+        if ('..' == $part) {
+            if (is_null(array_pop($chuncks))) return "/";
+        } else {
+            $chuncks[] = $part;
+        }
+    }
+    $output = implode("/", $chuncks);
+    if ($absolute) $output = "/".$output;
+    return $output;
+}
+
+/**
+ * Splits a path (without .. or .) into its parts, each mapped to its cumulative URL.
+ * @param string $path
+ * @param string $prefix prepended to each generated URL
+ *
+ * Ex. /dir1/dir2/page.gmi
+ * --> "dir1" --> "/dir1"
+ * --> "dir2" --> "/dir1/dir2"
+ * --> "page.gmi" --> "/dir1/dir2/page.gmi"
+ */
+function split_path_links($path, $prefix="") {
+    $parts = array_filter(explode("/", $path), 'strlen');
+    if (empty($parts)) return array();
+    if ("/"==$path[0])
+        $stack = "/";
+    else
+        $stack = "";
+    $output = array();
+    $slash = "";
+    foreach ($parts as $part) {
+        $stack .= $slash.$part;
+        $output[$part] = $prefix.$stack;
+        $slash = "/";
+    }
+    return $output;
+}
+
+/**
+ * Parses the gemtext and generates the internal format version
+ * @param str $fileContents the gemtext to parse
+ */
+function gemtextParser($fileContents) {
+    if (empty($fileContents)) return array();
+    $fileContents = rtrim($fileContents); // removes last empty line
+    $fileLines = mb_split("\n|\r\n?", $fileContents); // Unix, Mac, Windows line feeds
+    $mode = null;
+    $current = array();
+    foreach ($fileLines as $line) {
+        $reDoCount = 0;
+        $mode_textAttributes_temp = false;
+        while (true) {
+            /* The continue instruction is used to make another turn when there is a transition
+             * between two modes. */
+            if ($reDoCount>1) {
+                die("HtmGem: Too many loops, mode == '$mode'");
+            }
+            $reDoCount += 1;
+            $line1 = substr($line, 0, 1); // $line can be modified
+            $line2 = substr($line, 0, 2); // in the meantime.
+            $line3 = substr($line, 0, 3);
+            if (is_null($mode)) {
+                if ('^^^' == $line3) {
+                    yield array("mode" => "^^^");
+                } elseif ("#" == $line1) {
+                    preg_match("/^(#{1,3})\s*(.+)?/", $line, $matches);
+                    yield array("mode" => $matches[1], "title" => trim($matches[2]??""));
+                } elseif ("=>" == $line2) {
+                    preg_match("/^=>\s*([^\s]+)(?:\s+(.*))?$/", $line, $matches);
+                    yield array("mode" => "=>", "link" => trim($matches[1]??""), "text" => trim($matches[2]??""));
+                } elseif ("```" == $line3) {
+                    preg_match("/^```\s*(.*)$/", $line, $matches);
+                    $current = array("mode" => "```", "alt" => trim($matches[1]), "texts" => array());
+                    $mode="```";
+                } elseif (">" == $line1) {
+                    preg_match("/^>\s*(.*)$/", $line, $matches);
+                    $current = array("mode" => ">", "texts" => array(trim($matches[1])));
+                    $mode = ">";
+                } elseif ("*" == $line1) {
+                    preg_match("/^\*\s*(.*)$/", $line, $matches);
+                    $current = array("mode" => "*", "texts" => array(trim($matches[1])));
+                    $mode = "*";
+                } else {
+                    // text_line
+                    yield array("mode"=>"", "text" => rtrim($line));
+                }
+            } else {
+                if ("```"==$mode) {
+                    if ("```" == $line3) {
+                        yield $current;
+                        $current = array();
+                        $mode = null;
+                    } else {
+                        $current["texts"] []= rtrim($line); // No ltrim() as it’s a preformatted text!
+                    }
+                } elseif (">"==$mode) {
+                    if (">" == $line1) {
+                        preg_match("/^>\s*(.*)$/", $line, $matches);
+                        $current["texts"] []= trim($matches[1]);
+                    } else {
+                        yield $current;
+                        $current = array();
+                        $mode = null;
+                        continue;
+                    }
+                } elseif ("*"==$mode) {
+                    if ("*" == $line1) {
+                        preg_match("/^\*\s*(.*)$/", $line, $matches);
+                        $current["texts"] []= trim($matches[1]);
+                    } else {
+                        yield $current;
+                        $current = array();
+                        $mode = null;
+                        continue;
+                    }
+                } else {
+                    die("Unexpected mode: $mode!");
+                }
+            }
+            break; // exits the while(true) as no continue occurred
+        } // while(true)
+    }// foreach
+    if ($current) yield $current; # The file ended before the current block was closed.
+} // gemtextParser
+
+
+/**
+ * Translates the internal format back into gemtext.
+ * Use cases:
+ *
+ * - test suites
+ * - easier serialisation, as the content is plain text
+ * - normalization (trimming spaces for instance)
+ */
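+class GemtextTranslate_gemtext {
+
+    // Declared explicitly: dynamic properties are deprecated as of PHP 8.2.
+    public $parsedGemtext;
+    public $translatedGemtext;
+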
+    function __construct($parsedGemtext) {
+        if (empty($parsedGemtext)) $parsedGemtext = "";
+        // to delete the last empty lines
+        $parsedGemtext = rtrim($parsedGemtext);
+        // The text must be parsed
+        $this->parsedGemtext = gemtextParser($parsedGemtext);
+        $this->translate();
+    }
+
+    protected function translate() {
+        $output = "";
+        foreach ($this->parsedGemtext as $node) {
+            $mode = $node["mode"];
+            switch($mode) {
+                case "":
+                    $output .= $node["text"]."\n";
+                    break;
+                case "*":
+                    foreach ($node["texts"] as $text) {
+                        $output .= "* $text\n";
+                    }
+                    break;
+                case "```":
+                    $alt = $node["alt"];
+                    if ("" === $alt) // empty() would also drop an alt text of "0"
+                        $output .= "```\n";
+                    else
+                        $output .= "``` $alt\n";
+                    foreach ($node["texts"] as $text) {
+                        $output .= "$text\n";
+                    }
+                    $output .= "```\n";
+                    break;
+                case ">":
+                    foreach ($node["texts"] as $text) {
+                        if ("" === $text) // empty() would also drop a quote of "0"
+                            $output .= ">\n";
+                        else
+                            $output .= "> $text\n";
+                    }
+                    break;
+                case "=>":
+                    $linkText = $node["text"];
+                    $link = $node["link"];
+                    if ("" !== $linkText) $linkText = " $linkText";
+                    if ("" !== $link) $link = " $link";
+                    $output .= "=>".$link.$linkText."\n";
+                    break;
+                case "#":
+                case "##":
+                case "###":
+                    $output .= "$mode ".$node["title"]."\n";
+                    break;
+                case "^^^":
+                    $output .= "^^^\n";
+                    break;
+                default:
+                    die("Unknown mode: '{$node["mode"]}'\n");
+            }
+        }
+
+        $this->translatedGemtext = $output;
+    }
+
+    public function __toString() {
+        return $this->translatedGemtext;
+    }
+} // GemtextTranslate_gemtext
+
+
+/**
+ * Translates the internal format to HTML
+ */
+class GemtextTranslate_html {
+
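+    protected $cssList = array();
+    protected $pageTitle = "";
+    public $translatedGemtext;
+    // Declared explicitly: dynamic properties are deprecated as of PHP 8.2.
+    public $urlPrefix;
+    public $currentPageDir;
+    public $parsedGemtext;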
+
+    /**
+     * @param $parsedGemtext the gemtext source; the constructor parses it into the internal format
+     * @param $textDecoration bool, whether to interpret the text decoration
+     * @param $urlPrefix the prefix to prepend if the URL rewriting is not on
+     * @param $currentPageDir the current directory, to be used without URL rewriting
+     */
+    function __construct($parsedGemtext, $textDecoration=true, $urlPrefix=null, $currentPageDir=null) {
+        $this->urlPrefix = $urlPrefix;
+        $this->currentPageDir = $currentPageDir;
+        if (empty($parsedGemtext)) $parsedGemtext = "";
+        // to delete the last empty lines
+        $parsedGemtext = rtrim($parsedGemtext);
+        // The text must be parsed
+        $parsedGemtext = gemtextParser($parsedGemtext);
+        $this->parsedGemtext = $parsedGemtext;
+        $this->translate($textDecoration);
+    }
+
+    function addCss($css) {
+        $this->cssList []= $css;
+    }
+
+    function getCss() { return $this->cssList; }
+    function getTitle() { return $this->pageTitle; }
+
+    const NARROW_NO_BREAK_SPACE = "&#8239;";
+    const DASHES
+        ="‒" # U+2012 Figure Dash
+        ."–" # U+2013 En Dash
+        ."—" # U+2014 Em Dash
+        ."⸺" # U+2E3A Two-Em Dash
+        ."⸻" # U+2E3B Three-Em Dash (Three times larger than a single char)
+    ;
+
+    /**
+     * Replaces markup such as __underlined__ with <u>underlined</u>.
+     * @param $instruction the characters to replace, ex. _
+     * @param $markup the markup to replace to, ex. "u" to get <u>…</u>
+     * @param &$text where to replace.
+     */
+    protected static function markupPreg($instruction, $markup, &$text) {
+        $output = $text;
+
+        # Replaces pairs such as "__word__" with "<u>word</u>".
+        # Note: "{$var}" is used because the "${var}" form is deprecated as of PHP 8.2.
+        $output = mb_ereg_replace("{$instruction}(.+?){$instruction}", "<{$markup}>\\1</{$markup}>", $output);
+
+        # Replaces a remaining "__" with "<u>…</u>" up to the end of the line.
+        $output = mb_ereg_replace("{$instruction}(.+)?", "<{$markup}>\\1</{$markup}>", $output);
+
+        $text = $output;
+    }
+
+    /**
+     * Adds text attributes such as underline, bold, … to $line
+     * @param $line the line to process
+     */
+    protected static function addTextDecoration(&$line) {
+        self::markupPreg("__",   "u",      $line);
+        self::markupPreg("\*\*", "strong", $line);
+        self::markupPreg("//",   "em",     $line);
+        self::markupPreg("~~",   "del",    $line);
+    }
+
+    /**
+     * Prepares the raw text to be displayed in an HTML environment:
+     * * Escapes the HTML special characters contained in the Gemtext.
+     * * Puts narrow no-break spaces before some characters.
+     * @param &$text the text to process
+     */
+    protected static function htmlPrepare(&$text) {
+        if (is_null($text) || "" === $text) { // empty() would also replace the text "0"
+            $text = "&nbsp;";
+        } else {
+            $text = htmlspecialchars($text, ENT_HTML5|ENT_QUOTES, "UTF-8", true);
+            $text = mb_ereg_replace("\ ([?!:;»€$])", self::NARROW_NO_BREAK_SPACE."\\1", $text);
+            $text = mb_ereg_replace("([«])\ ", "\\1".self::NARROW_NO_BREAK_SPACE, $text); # Narrow no-break space
+
+            # Warning: using a monospace font editor may not display dashes as they should be!
+            # Adds no-break spaces to stick the (EM/EN dashes) to words : aaaaaa – bb – ccccc ==> aaaaaa –$bb$– ccccc
+            $text = mb_ereg_replace("([".self::DASHES."]) ([^".self::DASHES.".]+) ([".self::DASHES."])", "\\1".self::NARROW_NO_BREAK_SPACE."\\2".self::NARROW_NO_BREAK_SPACE."\\3", $text);
+
+            # Adds no-break space to stick the (EM/EN dashes) to words : aaaaaa – bb. ==> aaaaaa –$bb.
+            $text = mb_ereg_replace("([—–]) ([^.]+)\.", "\\1".self::NARROW_NO_BREAK_SPACE."\\2.", $text);
+        }
+    }
+
+    protected static function spacesCompress(&$text) {
+        # Replaces several spaces (0x20) by only one
+        if (is_null($text)) $text = ""; // empty() would also erase the text "0"
+        $text = preg_replace("/  +/", " ", $text);
+    }
+
+    public function translate($textDecoration=true) {
+        $output = "";
+        foreach ($this->parsedGemtext as $node) {
+            $mode = $node["mode"];
+            switch($mode) {
+                case "":
+                    $text = $node["text"];
+                    self::spacesCompress($text);
+                    self::htmlPrepare($text);
+                    if ($textDecoration) self::addTextDecoration($text);
+                    $output .= "<p>$text</p>\n";
+                    break;
+                case "*":
+                    $output .= "<ul>\n";
+                    foreach ($node["texts"] as $text) {
+                        self::spacesCompress($text);
+                        self::htmlPrepare($text);
+                        if ($textDecoration) self::addTextDecoration($text);
+                        $output .= "<li>$text\n";
+                    }
+                    $output .= "</ul>\n";
+                    break;
+                case "```":
+                    $text = implode("\n", $node["texts"]);
+                    self::htmlPrepare($text);
+                    $alt = htmlspecialchars($node["alt"], ENT_HTML5|ENT_QUOTES, "UTF-8"); // the alt text lands in a quoted attribute
+                    $output .= "<pre alt='$alt'>\n$text\n</pre>\n";
+                    break;
+                case ">":
+                    $output .= "<blockquote>\n";
+                    foreach ($node["texts"] as $text) {
+                        self::spacesCompress($text);
+                        self::htmlPrepare($text);
+                        if ($textDecoration) self::addTextDecoration($text);
+                        $output .= "<p>$text</p>\n";
+                    }
+                    $output .= "</blockquote>\n";
+                    break;
+                case "=>":
+                    $link = $node["link"];
+                    $linkText = $node["text"];
+                    if ("" === $linkText) {
+                        $linkText = $link;
+                        self::htmlPrepare($linkText);
+                    } else {
+                        self::spacesCompress($linkText);
+                        self::htmlPrepare($linkText);
+                        if ($textDecoration) self::addTextDecoration($linkText);
+                    }
+                    // Escapes the href in both cases, without double encoding existing entities.
+                    // So "I'm&gt;" becomes "I&apos;m&gt;". The existing entity remains untouched.
+                    $link = htmlspecialchars($link, ENT_HTML5|ENT_QUOTES, "UTF-8", false);
+                    preg_match("/^([^:]+):/", $link, $matches);
+                    $protocol = @$matches[1]??"local";
+                    if ("local"==$protocol) {
+                        if (!is_null($this->urlPrefix)) { // No URL rewriting
+                            $link = $this->currentPageDir."/".$link;
+                            $link = resolve_path($link);
+                            $link = $this->urlPrefix.$link;
+                        }
+                        $newWindow = "";
+                    } else {
+                        $newWindow = "target='_blank' ";
+                    }
+                    $output .= "<p><a {$newWindow}class='$protocol' href='$link'>$linkText</a></p>\n";
+                    break;
+                case "#":
+                    $title = $node["title"];
+                    self::spacesCompress($title);
+                    self::htmlPrepare($title);
+                    if (empty($this->pageTitle)) $this->pageTitle = $title;
+                    $output .= "<h1>$title</h1>\n";
+                    break;
+                case "##":
+                    $title = $node["title"];
+                    self::spacesCompress($title);
+                    self::htmlPrepare($title);
+                    $output .= "<h2>$title</h2>\n";
+                    break;
+                case "###":
+                    $title = $node["title"];
+                    self::spacesCompress($title);
+                    self::htmlPrepare($title);
+                    $output .= "<h3>$title</h3>\n";
+                    break;
+                case "^^^":
+                    $textDecoration = !$textDecoration;
+                    break;
+                default:
+                    die("Unknown mode: '{$node["mode"]}'\n");
+            }
+        }
+
+        $this->translatedGemtext = $output;
+    }
+
+} // GemtextTranslate_html
+
+?>
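
Review note on markupPreg() above: the two-pass replacement can be tried in isolation with the minimal sketch below. It is not the library's code. decorate() is a hypothetical helper, and it swaps in preg_replace() with preg_quote() where the library uses mb_ereg_replace() with hand-escaped markers.

```php
<?php
// Hypothetical stand-alone helper, for illustration only (not part of htmgem).
function decorate(string $marker, string $tag, string $text): string {
    // Escape the marker so that "**" or "//" are taken literally.
    $m = preg_quote($marker, "/");
    // First pass: closed pairs such as "__word__".
    $text = preg_replace("/{$m}(.+?){$m}/u", "<{$tag}>\\1</{$tag}>", $text);
    // Second pass: a leftover opening marker applies up to the end of the line.
    return preg_replace("/{$m}(.*)/u", "<{$tag}>\\1</{$tag}>", $text);
}

echo decorate("__", "u", "a __b__ c"), "\n";
echo decorate("**", "strong", "**bold"), "\n";
```

The non-greedy first pass keeps two decorated words on the same line separate; the second pass only fires when an odd marker is left over.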
--- a/lib-html.inc.php
+++ b/lib-html.inc.php
@ -0,0 +1,96 @@
+<?php declare(strict_types=1);
+
+namespace htmgem\html;
+
+mb_internal_encoding("UTF-8");
+mb_regex_encoding("UTF-8");
+
+define("TXT_ICON", "H͜͡m ");
+
+function getHeader(\htmgem\GemtextTranslate_html $gt_html) {
+    $css = $gt_html->getCss();
+    $output = <<<EOL
+<!DOCTYPE html>
+<html lang="">
+<head>
+<title>{$gt_html->getTitle()}</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+EOL;
+    foreach ($css as $c) {
+        $output .= "\n<link type='text/css' rel='StyleSheet' href='$c'>\n";
+    }
+    $output .= <<<EOL
+</head>
+
+EOL;
+    return $output;
+}
+
+function array_key_last_slice($array) {
+    // array_key_last() only available as of php v7.3.0
+    return key(array_slice($array, -1));
+}
+
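+/**
+ * @param $scheme the scheme, displayed after the logo
+ * @param $domain the domain, used as the label of the first link
+ * @param $path the path of the page, split into one link per part
+ * @param $prefix if not null, means no URL rewriting
+ */
+function getMenu(string $scheme, string $domain, string $path, ?string $prefix=null) {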
+    $links = \htmgem\split_path_links($path, $prefix);
+
+    // Removes the last part, as it won't hold a link
+    $lastLink = array_key_last_slice($links);
+    if ("index.gmi"==$lastLink) {
+        // removes the index page
+        array_pop($links);
+        $lastLink = array_key_last_slice($links);
+    }
+    array_pop($links);
+
+    $links = array($domain => "$prefix/") + $links;
+    $linkList = array();
+    foreach ($links as $label=>$link) {
+        $linkList []= "<a href='$link'>$label</a>\n";
+    }
+    $linkList [] = $lastLink."\n"; // The last part holds no link
+    $output = "<div class='menu-line'>\n";
+    $output .= "<strong><a class='logo' href='/htmgem'>".TXT_ICON."</a></strong>$scheme\n";
+    $output .= implode(" / ", $linkList);
+    $output .= "</div>\n";
+    return $output;
+}
+
+function getFooter(\htmgem\GemtextTranslate_html $gt_html) {
+    return "</body>\n</html>\n";
+}
+
+function getHtmlWithMenu($gt_html, $scheme, $domain, $path, $prefix=null) {
+    $menu = getMenu($scheme, $domain, $path, $prefix);
+    echo getHeader($gt_html);
+    echo "<body>\n";
+    echo "<div class='menu'>\n";
+    echo $menu;
+    echo "<hr>\n";
+    echo "</div>\n";
+    echo "<div id='gmi'>\n";
+    echo $gt_html->translatedGemtext;
+    echo "</div>\n";
+    echo "<div class='menu'>\n";
+    echo "<hr>\n";
+    echo $menu;
+    echo "</div>\n";
+    echo getFooter($gt_html);
+}
+
+function get404Gmipage($url) {
+    return <<<EOF
+# ⚠ $url ⚠
+
+* **Page non trouvée**
+* **Page not found**
+
+
+EOF;
+}
+
+?>
--- a/lib-io.inc.php
+++ b/lib-io.inc.php
@ -0,0 +1,35 @@
+<?php declare(strict_types=1);
+
+namespace htmgem\io;
+
+define("_BOMS", array( // Byte Order Mark
+// https://www.unicode.org/faq/utf_bom.html
+    "UTF-32LE" => "\xFF\xFE\x00\x00",
+    "UTF-16LE" => "\xFF\xFE",
+    "UTF-16BE" => "\xFE\xFF",
+    "UTF-8"     => "\xEF\xBB\xBF",
+    "UTF-32BE" => "\x00\x00\xFE\xFF"
+));
+
+/**
+ * Returns the encoding among the Unicode ones, using the BOM
+ * @param string $text
+ * @return string the encoding, or UTF-8 if no BOM is found
+ */
+function _detectUnicodeEncoding(&$text) {
+    /* The PHP built-in function mb_detect_encoding()
+     * doesn't detect UTF-16.
+     */
+    foreach (_BOMS as $bomName => $bomBytes)
+        if (strpos($text, $bomBytes) === 0) return $bomName;
+    return "UTF-8";
+}
+
+/** Converts a Unicode text to UTF-8 and removes the BOM
+ */
+function convertToUTF8(&$text) {
+    $encoding = _detectUnicodeEncoding($text);
+    $text = mb_convert_encoding($text, "UTF-8", $encoding);
+    $text = preg_replace("/^"._BOMS['UTF-8']."/", "", $text);
+    return $encoding;
+}
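
Review note on the BOM detection above: the order of the table matters, because the UTF-32LE mark starts with the same two bytes as the UTF-16LE one. The sketch below shows the idea in a self-contained form, assuming the mbstring extension is loaded. SKETCH_BOMS and sketchToUtf8() are hypothetical names, and unlike convertToUTF8() the sketch strips the BOM before converting rather than after.

```php
<?php
// Hypothetical stand-alone sketch, for illustration only (not part of htmgem).
const SKETCH_BOMS = [
    "UTF-32LE" => "\xFF\xFE\x00\x00", // must be tested before UTF-16LE
    "UTF-32BE" => "\x00\x00\xFE\xFF",
    "UTF-16LE" => "\xFF\xFE",
    "UTF-16BE" => "\xFE\xFF",
    "UTF-8"    => "\xEF\xBB\xBF",
];

function sketchToUtf8(string $bytes): string {
    $encoding = "UTF-8"; // default when no BOM is found
    foreach (SKETCH_BOMS as $name => $bom) {
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            $encoding = $name;
            $bytes = substr($bytes, strlen($bom)); // drop the BOM itself
            break;
        }
    }
    return mb_convert_encoding($bytes, "UTF-8", $encoding);
}

echo sketchToUtf8("\xFF\xFEh\x00i\x00"), "\n";
```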
--- a/tests/cli/parse_gemtext.php
+++ b/tests/cli/parse_gemtext.php
@ -0,0 +1,9 @@
+<?php if (empty(@$_SERVER['SHELL']) or count($argv)<2) die();
+
+$fileName = $argv[1];
+
+require_once dirname(__FILE__)."/../../lib-htmgem.inc.php";
+
+$text = file_get_contents($fileName);
+$parsedGemtext = \htmgem\gemtextParser($text);
+print_r(iterator_to_array($parsedGemtext));
--- a/tests/cli/pre-commit.git
+++ b/tests/cli/pre-commit.git
@ -0,0 +1,27 @@
+#!/usr/bin/env php
+<?php
+
+/**
+ * This file is to be placed in ./.git/hooks
+ * and set as executable.
+ *
+ * It will perform tests before any commit.
+ * To override: git commit -n
+ */
+
+echo "Running tests…";
+exec('phpunit tests', $output, $returnCode);
+
+// Removes the first line
+array_shift($output);
+
+if ($returnCode !== 0) {
+  echo PHP_EOL . implode(PHP_EOL, $output) . PHP_EOL;
+  echo "Tests failed…" . PHP_EOL;
+  exit(1);
+}
+
+// Show summary (last line)
+echo array_pop($output) . PHP_EOL;
+
+exit(0);
--- a/tests/cli/translate_to_gemtext.php
+++ b/tests/cli/translate_to_gemtext.php
@ -0,0 +1,11 @@
+<?php if (empty(@$_SERVER['SHELL']) or count($argv)<2) die();
+
+$fileName = $argv[1];
+
+require_once dirname(__FILE__)."/../../lib-htmgem.inc.php";
+require_once dirname(__FILE__)."/../../lib-io.inc.php";
+
+$text = file_get_contents($fileName);
+\htmgem\io\convertToUTF8($text);
+$gt_gemtext = new \htmgem\GemtextTranslate_gemtext($text);
+echo strval($gt_gemtext);
--- a/tests/cli/translate_to_html.php
+++ b/tests/cli/translate_to_html.php
@ -0,0 +1,11 @@
+<?php if (empty(@$_SERVER['SHELL']) or count($argv)<2) die();
+
+$fileName = $argv[1];
+
+require_once dirname(__FILE__)."/../../lib-htmgem.inc.php";
+require_once dirname(__FILE__)."/../../lib-io.inc.php";
+
+$text = file_get_contents($fileName);
+\htmgem\io\convertToUTF8($text);
+$gt_html = new \htmgem\GemtextTranslate_html($text);
+echo $gt_html->translatedGemtext;
--- a/tests/files_with_html/linefeeds-utf-16.txt
+++ b/tests/files_with_html/linefeeds-utf-16.txt
--- a/tests/files_with_html/linefeeds-utf-16.txt.html
+++ b/tests/files_with_html/linefeeds-utf-16.txt.html
@ -0,0 +1,8 @@
+<ul>
+<li>unix
+<li>mac
+<li>dos
+<li>unix
+<li>mac
+<li>dos
+</ul>
--- a/tests/files_with_html/linefeeds.txt
+++ b/tests/files_with_html/linefeeds.txt
@ -0,0 +1,4 @@
+* unix
+* mac
* dos
+* unix
+* mac
* dos
--- a/tests/files_with_html/linefeeds.txt.html
+++ b/tests/files_with_html/linefeeds.txt.html
@ -0,0 +1,8 @@
+<ul>
+<li>unix
+<li>mac
+<li>dos
+<li>unix
+<li>mac
+<li>dos
+</ul>
--- a/tests/files_with_html/links-utf-16.gmi
+++ b/tests/files_with_html/links-utf-16.gmi
--- a/tests/files_with_html/links-utf-16.gmi.html
+++ b/tests/files_with_html/links-utf-16.gmi.html
@ -0,0 +1,3 @@
+<p><a class='local' href=''>&nbsp;</a></p>
+<p><a class='local' href='the_link'>the_link</a></p>
+<p><a class='local' href='the_link'>label</a></p>
--- a/tests/files_with_html/links.gmi
+++ b/tests/files_with_html/links.gmi
@ -0,0 +1,3 @@
+=>
+=> the_link
+=> the_link label
--- a/tests/files_with_html/links.gmi.html
+++ b/tests/files_with_html/links.gmi.html
@ -0,0 +1,3 @@
+<p><a class='local' href=''>&nbsp;</a></p>
+<p><a class='local' href='the_link'>the_link</a></p>
+<p><a class='local' href='the_link'>label</a></p>
--- a/tests/files_with_html/lists-utf-16.gmi
+++ b/tests/files_with_html/lists-utf-16.gmi
--- a/tests/files_with_html/lists-utf-16.gmi.html
+++ b/tests/files_with_html/lists-utf-16.gmi.html
@ -0,0 +1,24 @@
+<ul>
+<li>one
+</ul>
+<p>&nbsp;</p>
+<p>&nbsp;</p>
+<ul>
+<li>one
+<li>two
+</ul>
+<p>&nbsp;</p>
+<ul>
+<li>one
+<li>two
+<li>three
+</ul>
+<p>&nbsp;</p>
+<ul>
+<li>one
+<li>two
+</ul>
+<p>&nbsp;</p>
+<ul>
+<li>three
+</ul>
--- a/tests/files_with_html/lists.gmi
+++ b/tests/files_with_html/lists.gmi
@ -0,0 +1,14 @@
+* one
+
+
+* one
+* two
+
+* one
+* two
+* three
+
+* one
+* two
+
+* three
--- a/tests/files_with_html/lists.gmi.html
+++ b/tests/files_with_html/lists.gmi.html
@ -0,0 +1,24 @@
+<ul>
+<li>one
+</ul>
+<p>&nbsp;</p>
+<p>&nbsp;</p>
+<ul>
+<li>one
+<li>two
+</ul>
+<p>&nbsp;</p>
+<ul>
+<li>one
+<li>two
+<li>three
+</ul>
+<p>&nbsp;</p>
+<ul>
+<li>one
+<li>two
+</ul>
+<p>&nbsp;</p>
+<ul>
+<li>three
+</ul>
--- a/tests/files_with_html/preformated-utf-16.gmi
+++ b/tests/files_with_html/preformated-utf-16.gmi
--- a/tests/files_with_html/preformated-utf-16.gmi.html
+++ b/tests/files_with_html/preformated-utf-16.gmi.html
@ -0,0 +1,18 @@
+<pre alt='Text to not display but keep in memory'>
+Bellow here, an empty line
+
+</pre>
+<p>&nbsp;</p>
+<pre alt='X'>
+&nbsp;
+</pre>
+<p>&nbsp;</p>
+<pre alt=''>
+No
+alt
+text
+provided
+
+</pre>
+<p>&nbsp;</p>
+<p>The following line is the last one and opens a ``` section without closing it</p>
--- a/tests/files_with_html/preformated.gmi
+++ b/tests/files_with_html/preformated.gmi
@ -0,0 +1,17 @@
+``` Text to not display but keep in memory
+Bellow here, an empty line
+
+```
+
+``` X
+```
+
+```
+No
+alt
+text
+provided
+
+```
+
+The following line is the last one and opens a ``` section without closing it
--- a/tests/files_with_html/preformated.gmi.html
+++ b/tests/files_with_html/preformated.gmi.html
@ -0,0 +1,18 @@
+<pre alt='Text to not display but keep in memory'>
+Bellow here, an empty line
+
+</pre>
+<p>&nbsp;</p>
+<pre alt='X'>
+&nbsp;
+</pre>
+<p>&nbsp;</p>
+<pre alt=''>
+No
+alt
+text
+provided
+
+</pre>
+<p>&nbsp;</p>
+<p>The following line is the last one and opens a ``` section without closing it</p>
--- a/tests/files_with_html/quotes-utf-16.gmi
+++ b/tests/files_with_html/quotes-utf-16.gmi
--- a/tests/files_with_html/quotes-utf-16.gmi.html
+++ b/tests/files_with_html/quotes-utf-16.gmi.html
@ -0,0 +1,27 @@
+<p>This text is made of empty quotes&#8239;:</p>
+<blockquote>
+<p>&nbsp;</p>
+</blockquote>
+<p>&nbsp;</p>
+<p>Quotes with one word:</p>
+<blockquote>
+<p>one</p>
+</blockquote>
+<p>Quotes with two words:</p>
+<blockquote>
+<p>A B</p>
+</blockquote>
+<p>&nbsp;</p>
+<p>Several quotes&#8239;:</p>
+<blockquote>
+<p>1</p>
+<p>2</p>
+<p>3</p>
+</blockquote>
+<p>&nbsp;</p>
+<p>Quotes with an empty one in between:</p>
+<blockquote>
+<p>1</p>
+<p>&nbsp;</p>
+<p>3</p>
+</blockquote>
--- a/tests/files_with_html/quotes.gmi
+++ b/tests/files_with_html/quotes.gmi
@ -0,0 +1,17 @@
+This text is made of empty quotes :
+>
+
+Quotes with one word:
+> one
+Quotes with two words:
+> A B
+
+Several quotes :
+> 1
+> 2
+> 3
+
+Quotes with an empty one in between:
+> 1
+>
+> 3
--- a/tests/files_with_html/quotes.gmi.html
+++ b/tests/files_with_html/quotes.gmi.html
@ -0,0 +1,27 @@
+<p>This text is made of empty quotes&#8239;:</p>
+<blockquote>
+<p>&nbsp;</p>
+</blockquote>
+<p>&nbsp;</p>
+<p>Quotes with one word:</p>
+<blockquote>
+<p>one</p>
+</blockquote>
+<p>Quotes with two words:</p>
+<blockquote>
+<p>A B</p>
+</blockquote>
+<p>&nbsp;</p>
+<p>Several quotes&#8239;:</p>
+<blockquote>
+<p>1</p>
+<p>2</p>
+<p>3</p>
+</blockquote>
+<p>&nbsp;</p>
+<p>Quotes with an empty one in between:</p>
+<blockquote>
+<p>1</p>
+<p>&nbsp;</p>
+<p>3</p>
+</blockquote>
--- a/tests/files_with_html/text-utf-16.gmi
+++ b/tests/files_with_html/text-utf-16.gmi
--- a/tests/files_with_html/text-utf-16.gmi.html
+++ b/tests/files_with_html/text-utf-16.gmi.html
@ -0,0 +1,359 @@
+<p>NOTE: this page was downloaded on 2021-03-04 from the original capsule</p>
+<p><a target='_blank' class='gemini' href='gemini://gemini.circumlunar.space/docs/specification.gmi'>Gemini</a></p>
+<p><a target='_blank' class='https' href='https://gemini.circumlunar.space/docs/specification.gmi'>Web</a></p>
+<p><a class='local' href='specification_Gemini_v0.14.3.txt'>Copie locale texte</a></p>
+<p>&nbsp;</p>
+<h1>Project Gemini</h1>
+<p>&nbsp;</p>
+<h2>Speculative specification</h2>
+<p>&nbsp;</p>
+<p>v0.14.3, November 29th 2020</p>
+<p>&nbsp;</p>
+<p>This is an increasingly less rough sketch of an actual spec for Project Gemini. Although not finalised yet, further changes to the specification are likely to be relatively small. You can write code to this pseudo-specification and be confident that it probably won&apos;t become totally non-functional due to massive changes next week, but you are still urged to keep an eye on ongoing development of the protocol and make changes as required.</p>
+<p>&nbsp;</p>
+<p>This is provided mostly so that people can quickly get up to speed on what I&apos;m thinking without having to read lots and lots of old phlog posts and keep notes.</p>
+<p>&nbsp;</p>
+<p>Feedback on any part of this is extremely welcome, please email solderpunk@posteo.net.</p>
+<p>&nbsp;</p>
+<h1>1 Overview</h1>
+<p>&nbsp;</p>
+<p>Gemini is a client-server protocol featuring request-response transactions, broadly similar to gopher or HTTP. Connections are closed at the end of a single transaction and cannot be reused. When Gemini is served over TCP/IP, servers should listen on port 1965 (the first manned Gemini mission, Gemini 3, flew in March &apos;65). This is an unprivileged port, so it&apos;s very easy to run a server as a &quot;nobody&quot; user, even if e.g. the server is written in Go and so can&apos;t drop privileges in the traditional fashion.</p>
+<p>&nbsp;</p>
+<h2>1.1 Gemini transactions</h2>
+<p>&nbsp;</p>
+<p>There is one kind of Gemini transaction, roughly equivalent to a gopher request or a HTTP &quot;GET&quot; request. Transactions happen as follows:</p>
+<p>&nbsp;</p>
+<p>C: Opens connection</p>
+<p>S: Accepts connection</p>
+<p>C/S: Complete TLS handshake (see section 4)</p>
+<p>C: Validates server certificate (see 4.2)</p>
+<p>C: Sends request (one CRLF terminated line) (see section 2)</p>
+<p>S: Sends response header (one CRLF terminated line), closes connection</p>
+<p> under non-success conditions (see 3.1 and 3.2)</p>
+<p>S: Sends response body (text or binary data) (see 3.3)</p>
+<p>S: Closes connection</p>
+<p>C: Handles response (see 3.4)</p>
+<p>&nbsp;</p>
+<h2>1.2 Gemini URI scheme</h2>
+<p>&nbsp;</p>
+<p>Resources hosted via Gemini are identified using URIs with the scheme &quot;gemini&quot;. This scheme is syntactically compatible with the generic URI syntax defined in RFC 3986, but does not support all components of the generic syntax. In particular, the authority component is allowed and required, but its userinfo subcomponent is NOT allowed. The host subcomponent is required. The port subcomponent is optional, with a default value of 1965. The path, query and fragment components are allowed and have no special meanings beyond those defined by the generic syntax. Spaces in gemini URIs should be encoded as %20, not +.</p>
+<p>&nbsp;</p>
+<h1>2 Gemini requests</h1>
+<p>&nbsp;</p>
+<p>Gemini requests are a single CRLF-terminated line with the following structure:</p>
+<p>&nbsp;</p>
+<p>&lt;URL&gt;&lt;CR&gt;&lt;LF&gt;</p>
+<p>&nbsp;</p>
+<p>&lt;URL&gt; is a UTF-8 encoded absolute URL, including a scheme, of maximum length 1024 bytes.</p>
+<p>&nbsp;</p>
+<p>Sending an absolute URL instead of only a path or selector is effectively equivalent to building in a HTTP &quot;Host&quot; header. It permits virtual hosting of multiple Gemini domains on the same IP address. It also allows servers to optionally act as proxies. Including schemes other than &quot;gemini&quot; in requests allows servers to optionally act as protocol-translating gateways to e.g. fetch gopher resources over Gemini. Proxying is optional and the vast majority of servers are expected to only respond to requests for resources at their own domain(s).</p>
+<p>&nbsp;</p>
+<h1>3 Gemini responses</h1>
+<p>&nbsp;</p>
+<p>Gemini response consist of a single CRLF-terminated header line, optionally followed by a response body.</p>
+<p>&nbsp;</p>
+<h2>3.1 Response headers</h2>
+<p>&nbsp;</p>
+<p>Gemini response headers look like this:</p>
+<p>&nbsp;</p>
+<p>&lt;STATUS&gt;&lt;SPACE&gt;&lt;META&gt;&lt;CR&gt;&lt;LF&gt;</p>
+<p>&nbsp;</p>
+<p>&lt;STATUS&gt; is a two-digit numeric status code, as described below in 3.2 and in Appendix 1.</p>
+<p>&nbsp;</p>
+<p>&lt;SPACE&gt; is a single space character, i.e. the byte 0x20.</p>
+<p>&nbsp;</p>
+<p>&lt;META&gt; is a UTF-8 encoded string of maximum length 1024 bytes, whose meaning is &lt;STATUS&gt; dependent.</p>
+<p>&nbsp;</p>
+<p>&lt;STATUS&gt; and &lt;META&gt; are separated by a single space character.</p>
+<p>&nbsp;</p>
+<p>If &lt;STATUS&gt; does not belong to the &quot;SUCCESS&quot; range of codes, then the server MUST close the connection after sending the header and MUST NOT send a response body.</p>
+<p>&nbsp;</p>
+<p>If a server sends a &lt;STATUS&gt; which is not a two-digit number or a &lt;META&gt; which exceeds 1024 bytes in length, the client SHOULD close the connection and disregard the response header, informing the user of an error.</p>
+<p>&nbsp;</p>
+<h2>3.2 Status codes</h2>
+<p>&nbsp;</p>
+<p>Gemini uses two-digit numeric status codes. Related status codes share the same first digit. Importantly, the first digit of Gemini status codes do not group codes into vague categories like &quot;client error&quot; and &quot;server error&quot; as per HTTP. Instead, the first digit alone provides enough information for a client to determine how to handle the response. By design, it is possible to write a simple but feature complete client which only looks at the first digit. The second digit provides more fine-grained information, for unambiguous server logging, to allow writing comfier interactive clients which provide a slightly more streamlined user interface, and to allow writing more robust and intelligent automated clients like content aggregators, search engine crawlers, etc.</p>
+<p>&nbsp;</p>
+<p>The first digit of a response code unambiguously places the response into one of six categories, which define the semantics of the &lt;META&gt; line.</p>
+<p>&nbsp;</p>
+<h3>3.2.1 1x (INPUT)</h3>
+<p>&nbsp;</p>
+<p>Status codes beginning with 1 are INPUT status codes, meaning:</p>
+<p>&nbsp;</p>
+<p>The requested resource accepts a line of textual user input. The &lt;META&gt; line is a prompt which should be displayed to the user. The same resource should then be requested again with the user&apos;s input included as a query component. Queries are included in requests as per the usual generic URL definition in RFC3986, i.e. separated from the path by a&#8239;?. Reserved characters used in the user&apos;s input must be &quot;percent-encoded&quot; as per RFC3986, and space characters should also be percent-encoded.</p>
+<p>&nbsp;</p>
+<h3>3.2.2 2x (SUCCESS)</h3>
+<p>&nbsp;</p>
+<p>Status codes beginning with 2 are SUCCESS status codes, meaning:</p>
+<p>&nbsp;</p>
+<p>The request was handled successfully and a response body will follow the response header. The &lt;META&gt; line is a MIME media type which applies to the response body.</p>
+<p>&nbsp;</p>
+<h3>3.2.3 3x (REDIRECT)</h3>
+<p>&nbsp;</p>
+<p>Status codes beginning with 3 are REDIRECT status codes, meaning:</p>
+<p>&nbsp;</p>
+<p>The server is redirecting the client to a new location for the requested resource. There is no response body. &lt;META&gt; is a new URL for the requested resource. The URL may be absolute or relative. The redirect should be considered temporary, i.e. clients should continue to request the resource at the original address and should not performance convenience actions like automatically updating bookmarks. There is no response body.</p>
+<p>&nbsp;</p>
+<h3>3.2.4 4x (TEMPORARY FAILURE)</h3>
+<p>&nbsp;</p>
+<p>Status codes beginning with 4 are TEMPORARY FAILURE status codes, meaning:</p>
+<p>&nbsp;</p>
+<p>The request has failed. There is no response body. The nature of the failure is temporary, i.e. an identical request MAY succeed in the future. The contents of &lt;META&gt; may provide additional information on the failure, and should be displayed to human users.</p>
+<p>&nbsp;</p>
+<h3>3.2.5 5x (PERMANENT FAILURE)</h3>
+<p>&nbsp;</p>
+<p>Status codes beginning with 5 are PERMANENT FAILURE status codes, meaning:</p>
+<p>&nbsp;</p>
+<p>The request has failed. There is no response body. The nature of the failure is permanent, i.e. identical future requests will reliably fail for the same reason. The contents of &lt;META&gt; may provide additional information on the failure, and should be displayed to human users. Automatic clients such as aggregators or indexing crawlers should not repeat this request.</p>
+<p>&nbsp;</p>
+<h3>3.2.6 6x (CLIENT CERTIFICATE REQUIRED)</h3>
+<p>&nbsp;</p>
+<p>Status codes beginning with 6 are CLIENT CERTIFICATE REQUIRED status codes, meaning:</p>
+<p>&nbsp;</p>
+<p>The requested resource requires a client certificate to access. If the request was made without a certificate, it should be repeated with one. If the request was made with a certificate, the server did not accept it and the request should be repeated with a different certificate. The contents of &lt;META&gt; (and/or the specific 6x code) may provide additional information on certificate requirements or the reason a certificate was rejected.</p>
+<p>&nbsp;</p>
+<h3>3.2.7 Notes</h3>
+<p>&nbsp;</p>
+<p>Note that for basic interactive clients for human use, errors 4 and 5 may be effectively handled identically, by simply displaying the contents of &lt;META&gt; under a heading of &quot;ERROR&quot;. The temporary/permanent error distinction is primarily relevant to well-behaving automated clients. Basic clients may also choose not to support client-certificate authentication, in which case only four distinct status handling routines are required (for statuses beginning with 1, 2, 3 or a combined 4-or-5).</p>
+<p>&nbsp;</p>
+<p>The full two-digit system is detailed in Appendix 1. Note that for each of the six valid first digits, a code with a second digit of zero is a generic status of that kind with no special semantics. This means that basic servers without any advanced functionality need only be able to return codes of 10, 20, 30, 40 or 50.</p>
+<p>&nbsp;</p>
+<p>The Gemini status code system has been carefully designed so that the increased power (and correspondingly increased complexity) of the second digits is entirely &quot;opt-in&quot; on the part of both servers and clients.</p>
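The first-digit handling described in 3.2 can be sketched as a small lookup. This is an illustration only: the names `CATEGORIES` and `category` are hypothetical and not part of the specification or of this project.

```python
# Hypothetical mapping from the first digit of a status code to its category.
CATEGORIES = {
    1: "input",
    2: "success",
    3: "redirect",
    4: "temporary failure",
    5: "permanent failure",
    6: "client certificate required",
}


def category(status: int) -> str:
    """Map a two-digit status code to its first-digit category."""
    first_digit, _second_digit = divmod(status, 10)
    if not 10 <= status <= 99 or first_digit not in CATEGORIES:
        raise ValueError(f"invalid status code: {status}")
    return CATEGORIES[first_digit]
```

A basic client could branch on the returned category alone and ignore the second digit, as the notes above allow.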
+<p>&nbsp;</p>
+<h2>3.3 Response bodies</h2>
+<p>&nbsp;</p>
+<p>Response bodies are just raw content, text or binary, à la gopher. There is no support for compression, chunking or any other kind of content or transfer encoding. The server closes the connection after the final byte; there is no &quot;end of response&quot; signal like gopher&apos;s lonely dot.</p>
+<p>&nbsp;</p>
+<p>Response bodies only accompany responses whose header indicates a SUCCESS status (i.e. a status code whose first digit is 2). For such responses, &lt;META&gt; is a MIME media type as defined in RFC 2046.</p>
+<p>&nbsp;</p>
+<p>Internet media types are registered with a canonical form. Content transferred via Gemini MUST be represented in the appropriate canonical form prior to its transmission except for &quot;text&quot; types, as defined in the next paragraph.</p>
+<p>&nbsp;</p>
+<p>When in canonical form, media subtypes of the &quot;text&quot; type use CRLF as the text line break. Gemini relaxes this requirement and allows the transport of text media with plain LF alone (but NOT a plain CR alone) representing a line break when it is done consistently for an entire response body. Gemini clients MUST accept CRLF and bare LF as being representative of a line break in text media received via Gemini.</p>
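The line-break rule above (CRLF or bare LF, but never bare CR) might be applied as follows. The helper name `split_lines` is an assumption made for this sketch. Note that Python's built-in `str.splitlines()` is avoided on purpose, since it also breaks on a bare CR.

```python
def split_lines(body: str) -> list[str]:
    """Split a text body on CRLF or bare LF; a bare CR is left untouched."""
    lines = body.replace("\r\n", "\n").split("\n")
    # A trailing line break does not start another (empty) line.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```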
+<p>&nbsp;</p>
+<p>If a MIME type begins with &quot;text/&quot; and no charset is explicitly given, the charset should be assumed to be UTF-8. Compliant clients MUST support UTF-8-encoded text/* responses. Clients MAY optionally support other encodings. Clients receiving a response in a charset they cannot decode SHOULD gracefully inform the user what happened instead of displaying garbage.</p>
+<p>&nbsp;</p>
+<p>If &lt;META&gt; is an empty string, the MIME type MUST default to &quot;text/gemini; charset=utf-8&quot;. The text/gemini media type is defined in section 5.</p>
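The two defaulting rules above (an empty META means text/gemini, and text types without a charset are assumed to be UTF-8) could be sketched with Python's standard `email.message` parser. The function name `parse_meta` is hypothetical, and treating a whitespace-only META as empty is an assumption of this sketch.

```python
from email.message import Message


def parse_meta(meta: str) -> tuple[str, dict[str, str]]:
    """Return (media type, parameters) for the META of a 2x response."""
    if meta.strip() == "":
        meta = "text/gemini; charset=utf-8"
    msg = Message()
    msg["Content-Type"] = meta
    mime = msg.get_content_type()
    # get_params() returns the media type first, then (name, value) pairs.
    params = dict(msg.get_params()[1:])
    if mime.startswith("text/"):
        params.setdefault("charset", "utf-8")
    return mime, params
```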
+<p>&nbsp;</p>
+<h2>3.4 Response body handling</h2>
+<p>&nbsp;</p>
+<p>Response handling by clients should be informed by the provided MIME type information. Gemini defines one MIME type of its own (text/gemini) whose handling is discussed below in section 5. In all other cases, clients should do &quot;something sensible&quot; based on the MIME type. Minimalistic clients might adopt a strategy of printing all other text/* responses to the screen without formatting and saving all non-text responses to the disk. Clients for unix systems may consult /etc/mailcap to find installed programs for handling non-text types.</p>
+<p>&nbsp;</p>
+<h1>4 TLS</h1>
+<p>&nbsp;</p>
+<p>Use of TLS for Gemini transactions is mandatory.</p>
+<p>&nbsp;</p>
+<p>Use of the Server Name Indication (SNI) extension to TLS is also mandatory, to facilitate name-based virtual hosting.</p>
+<p>&nbsp;</p>
+<h2>4.1 Version requirements</h2>
+<p>&nbsp;</p>
+<p>Servers MUST use TLS version 1.2 or higher and SHOULD use TLS version 1.3 or higher. TLS 1.2 is reluctantly permitted for now to avoid drastically reducing the range of available implementation libraries. Hopefully TLS 1.3 or higher can be specced in the near future. Clients who wish to be &quot;ahead of the curve&quot; MAY refuse to connect to servers using TLS version 1.2 or lower.</p>
+<p>&nbsp;</p>
+<h2>4.2 Server certificate validation</h2>
+<p>&nbsp;</p>
+<p>Clients can validate TLS connections however they like (including not at all) but the strongly RECOMMENDED approach is to implement a lightweight &quot;TOFU&quot; certificate-pinning system which treats self-signed certificates as first-class citizens. This greatly reduces TLS overhead on the network (only one cert needs to be sent, not a whole chain) and lowers the barrier to entry for setting up a Gemini site (no need to pay a CA or set up a Let&apos;s Encrypt cron job, just make a cert and go).</p>
+<p>&nbsp;</p>
+<p>TOFU stands for &quot;Trust On First Use&quot; and is a public-key security model similar to that used by OpenSSH. The first time a Gemini client connects to a server, it accepts whatever certificate it is presented. That certificate&apos;s fingerprint and expiry date are saved in a persistent database (like the .known_hosts file for SSH), associated with the server&apos;s hostname. On all subsequent connections to that hostname, the received certificate&apos;s fingerprint is computed and compared to the one in the database. If the certificate is not the one previously received, but the previous certificate&apos;s expiry date has not passed, the user is shown a warning, analogous to the one web browser users are shown when receiving a certificate without a signature chain leading to a trusted CA.</p>
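The TOFU decision described above can be sketched with an in-memory dictionary standing in for the persistent database. Several details are assumptions of this sketch rather than requirements of the text: the SHA-256 fingerprint, the name `tofu_check`, the returned labels, and the choice to silently accept a new certificate once the stored one has expired.

```python
import hashlib
from datetime import datetime, timezone


def tofu_check(known, hostname, cert_der, expiry, now=None):
    """Return 'first-use', 'trusted', 'warn' or 'replaced' for a certificate."""
    now = now or datetime.now(timezone.utc)
    fingerprint = hashlib.sha256(cert_der).hexdigest()
    entry = known.get(hostname)
    if entry is None:
        # First contact: accept whatever certificate is presented.
        known[hostname] = (fingerprint, expiry)
        return "first-use"
    saved_fingerprint, saved_expiry = entry
    if saved_fingerprint == fingerprint:
        return "trusted"
    if saved_expiry > now:
        # Different certificate while the pinned one is still valid.
        return "warn"
    # Assumption: an expired pin is replaced without a warning.
    known[hostname] = (fingerprint, expiry)
    return "replaced"
```

Here `known` maps a hostname to a `(fingerprint, expiry)` pair; a real client would persist it between runs.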
+<p>&nbsp;</p>
+<p>This model is by no means perfect, but it is not awful and is vastly superior to just accepting self-signed certificates unconditionally.</p>
+<p>&nbsp;</p>
+<h2>4.3 Client certificates</h2>
+<p>&nbsp;</p>
+<p>Although rarely seen on the web, TLS permits clients to identify themselves to servers using certificates, in exactly the same way that servers traditionally identify themselves to the client. Gemini includes the ability for servers to request in-band that a client repeats a request with a client certificate. This is a very flexible, highly secure but also very simple notion of client identity with several applications:</p>
+<p>&nbsp;</p>
+<ul>
+<li>Short-lived client certificates which are generated on demand and deleted immediately after use can be used as &quot;session identifiers&quot; to maintain server-side state for applications. In this role, client certificates act as a substitute for HTTP cookies, but unlike cookies they are generated voluntarily by the client, and once the client deletes a certificate and its matching key, the server cannot possibly &quot;resurrect&quot; the same value later (unlike so-called &quot;super cookies&quot;).
+<li>Long-lived client certificates can reliably identify a user to a multi-user application without the need for passwords which may be brute-forced. Even a stolen database table mapping certificate hashes to user identities is not a security risk, as rainbow tables for certificates are not feasible.
+<li>Self-hosted, single-user applications can be easily and reliably secured in a manner familiar from OpenSSH: the user generates a self-signed certificate and adds its hash to a server-side list of permitted certificates (analogous to the .authorized_keys file for SSH).
+</ul>
+<p>&nbsp;</p>
+<p>Gemini requests will typically be made without a client certificate. If a requested resource requires a client certificate and one is not included in a request, the server can respond with a status code of 60, 61 or 62 (see Appendix 1 below for a description of all status codes related to client certificates). A client certificate which is generated or loaded in response to such a status code has its scope bound to the same hostname as the request URL and to all paths below the path of the request URL. E.g. if a request for gemini://example.com/foo returns status 60 and the user chooses to generate a new client certificate in response to this, that same certificate should be used for subsequent requests to gemini://example.com/foo, gemini://example.com/foo/bar/, gemini://example.com/foo/bar/baz, etc., until such time as the user decides to delete the certificate or to temporarily deactivate it. Interactive clients for human users are strongly recommended to make such actions easy and to generally give users full control over the use of client certificates.</p>
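The scoping rule above (same hostname, same path or any path below it) might be checked like this. The function name `cert_applies` is hypothetical, and reading "below" as a whole path segment, so that /foobar is not inside /foo, is an interpretation made for this sketch.

```python
from urllib.parse import urlsplit


def cert_applies(scope_url: str, request_url: str) -> bool:
    """Check whether a certificate scoped to scope_url covers request_url."""
    scope, request = urlsplit(scope_url), urlsplit(request_url)
    if scope.hostname != request.hostname:
        return False
    base = scope.path.rstrip("/")
    # Match the scope path itself or anything in a segment below it.
    return request.path == base or request.path.startswith(base + "/")
```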
+<p>&nbsp;</p>
+<h1>5 The text/gemini media type</h1>
+<p>&nbsp;</p>
+<h2>5.1 Overview</h2>
+<p>&nbsp;</p>
+<p>In the same sense that HTML is the &quot;native&quot; response format of HTTP and plain text is the native response format of gopher, Gemini defines its own native response format - though of course, thanks to the inclusion of a MIME type in the response header Gemini can be used to serve plain text, rich text, HTML, Markdown, LaTeX, etc.</p>
+<p>&nbsp;</p>
+<p>Response bodies of type &quot;text/gemini&quot; are a kind of lightweight hypertext format, which takes inspiration from gophermaps and from Markdown. The format permits richer typographic possibilities than the plain text of Gopher, but remains extremely easy to parse. The format is line-oriented, and a satisfactory rendering can be achieved with a single pass of a document, processing each line independently. As per gopher, links can only be displayed one per line, encouraging neat, list-like structure.</p>
+<p>&nbsp;</p>
+<p>Similar to how the two-digit Gemini status codes were designed so that simple clients can function correctly while ignoring the second digit, the text/gemini format has been designed so that simple clients can ignore the more advanced features and still remain very usable.</p>
+<p>&nbsp;</p>
+<h2>5.2 Parameters</h2>
+<p>&nbsp;</p>
+<p>As a subtype of the top-level media type &quot;text&quot;, &quot;text/gemini&quot; inherits the &quot;charset&quot; parameter defined in RFC 2046. However, as noted in 3.3, the default value of &quot;charset&quot; is &quot;UTF-8&quot; for &quot;text&quot; content transferred via Gemini.</p>
+<p>&nbsp;</p>
+<p>A single additional parameter specific to the &quot;text/gemini&quot; subtype is defined: the &quot;lang&quot; parameter. The value of &quot;lang&quot; denotes the natural language or languages in which the textual content of a &quot;text/gemini&quot; document is written. The presence of the &quot;lang&quot; parameter is optional. When the &quot;lang&quot; parameter is present, its interpretation is defined entirely by the client. For example, clients which use text-to-speech technology to make Gemini content accessible to visually impaired users may use the value of &quot;lang&quot; to improve pronunciation of content. Clients which render text to a screen may use the value of &quot;lang&quot; to determine whether text should be displayed left-to-right or right-to-left. Simple clients for users who only read languages written left-to-right may simply ignore the value of &quot;lang&quot;. When the &quot;lang&quot; parameter is not present, no default value should be assumed and clients which require some notion of a language in order to process the content (such as text-to-speech screen readers) should rely on user input to determine how to proceed in the absence of a &quot;lang&quot; parameter.</p>
+<p>&nbsp;</p>
+<p>Valid values for the &quot;lang&quot; parameter are comma-separated lists of one or more language tags as defined in RFC4646. For example:</p>
+<p>&nbsp;</p>
+<ul>
+<li>&quot;text/gemini; lang=en&quot; Denotes a text/gemini document written in English
+<li>&quot;text/gemini; lang=fr&quot; Denotes a text/gemini document written in French
+<li>&quot;text/gemini; lang=en,fr&quot; Denotes a text/gemini document written in a mixture of English and French
+<li>&quot;text/gemini; lang=de-CH&quot; Denotes a text/gemini document written in Swiss German
+<li>&quot;text/gemini; lang=sr-Cyrl&quot; Denotes a text/gemini document written in Serbian using the Cyrillic script
+<li>&quot;text/gemini; lang=zh-Hans-CN&quot; Denotes a text/gemini document written in Chinese using the Simplified script as used in mainland China
+</ul>
+<p>&nbsp;</p>
+<h2>5.3 Line-orientation</h2>
+<p>&nbsp;</p>
+<p>As mentioned, the text/gemini format is line-oriented. Each line of a text/gemini document has a single &quot;line type&quot;. It is possible to unambiguously determine a line&apos;s type purely by inspecting its first three characters. A line&apos;s type determines the manner in which it should be presented to the user. Any details of presentation or rendering associated with a particular line type are strictly limited in scope to that individual line.</p>
+<p>&nbsp;</p>
+<p>There are 7 different line types in total. However, a fully functional and specification-compliant Gemini client need only recognise and handle 4 of them - these are the &quot;core line types&quot; (see 5.4). Advanced clients can also handle the additional &quot;advanced line types&quot; (see 5.5). Simple clients can treat all advanced line types as equivalent to one of the core line types and still offer an adequate user experience.</p>
+<p>&nbsp;</p>
+<h2>5.4 Core line types</h2>
+<p>&nbsp;</p>
+<p>The four core line types are:</p>
+<p>&nbsp;</p>
+<h3>5.4.1 Text lines</h3>
+<p>&nbsp;</p>
+<p>Text lines are the most fundamental line type - any line which does not match the definition of another line type defined below defaults to being a text line. The majority of lines in a typical text/gemini document will be text lines.</p>
+<p>&nbsp;</p>
+<p>Text lines should be presented to the user, after being wrapped to the appropriate width for the client&apos;s viewport (see below). Text lines may be presented to the user in a visually pleasing manner for general reading, the precise meaning of which is at the client&apos;s discretion. For example, variable width fonts may be used, spacing may be normalised, with spaces between sentences being made wider than spacing between words, and other such typographical niceties may be applied. Clients may permit users to customise the appearance of text lines by altering the font, font size, text and background colour, etc. Authors should not expect to exercise any control over the precise rendering of their text lines, only of their actual textual content. Content such as ASCII art, computer source code, etc. which may appear incorrectly when treated as such should be enclosed between preformatting toggle lines (see 5.4.3).</p>
+<p>&nbsp;</p>
+<p>Blank lines are instances of text lines and have no special meaning. They should be rendered individually as vertical blank space each time they occur. In this way they are analogous to &lt;br/&gt; tags in HTML. Consecutive blank lines should NOT be collapsed into fewer blank lines. Note also that consecutive non-blank text lines do not form any kind of coherent unit or block such as a &quot;paragraph&quot;: all text lines are independent entities.</p>
+<p>&nbsp;</p>
+<p>Text lines which are longer than can fit on a client&apos;s display device SHOULD be &quot;wrapped&quot; to fit, i.e. long lines should be split (ideally at whitespace or at hyphens) into multiple consecutive lines of a device-appropriate width. This wrapping is applied to each line of text independently. Multiple consecutive lines which are shorter than the client&apos;s display device MUST NOT be combined into fewer, longer lines.</p>
+<p>&nbsp;</p>
+<p>In order to take full advantage of this method of text formatting, authors of text/gemini content SHOULD avoid hard-wrapping to a specific fixed width, in contrast to the convention in Gopherspace where text is typically wrapped at 80 characters or fewer. Instead, text which should be displayed as a contiguous block should be written as a single long line. Most text editors can be configured to &quot;soft-wrap&quot;, i.e. to write this kind of file while displaying the long lines wrapped at word boundaries to fit the author&apos;s display device.</p>
+<p>&nbsp;</p>
+<p>Authors who insist on hard-wrapping their content MUST be aware that the content will display neatly on clients whose display device is as wide as the hard-wrapped length or wider, but will appear with irregular line widths on narrower clients.</p>
+<p>&nbsp;</p>
+<h3>5.4.2 Link lines</h3>
+<p>&nbsp;</p>
+<p>Lines beginning with the two characters &quot;=&gt;&quot; are link lines, which have the following syntax:</p>
+<p>&nbsp;</p>
+<pre alt=''>
+=&gt;[&lt;whitespace&gt;]&lt;URL&gt;[&lt;whitespace&gt;&lt;USER-FRIENDLY LINK NAME&gt;]
+</pre>
+<p>&nbsp;</p>
+<p>where:</p>
+<p>&nbsp;</p>
+<ul>
+<li>&lt;whitespace&gt; is any non-zero number of consecutive spaces or tabs
+<li>Square brackets indicate that the enclosed content is optional.
+<li>&lt;URL&gt; is a URL, which may be absolute or relative.
+</ul>
+<p>&nbsp;</p>
+<p>All the following examples are valid link lines:</p>
+<p>&nbsp;</p>
+<pre alt=''>
+=&gt; gemini://example.org/
+=&gt; gemini://example.org/ An example link
+=&gt; gemini://example.org/foo	Another example link at the same host
+=&gt; foo/bar/baz.txt	A relative link
+=&gt; 	gopher://example.org:70/1 A gopher link
+</pre>
+<p>&nbsp;</p>
+<p>URLs in link lines must have reserved characters and spaces percent-encoded as per RFC 3986.</p>
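The link line syntax above can be parsed with one split on spaces or tabs. This is a sketch under stated assumptions: the name `parse_link` is hypothetical, and returning `None` for a line that is not a usable link is a choice made here, not something the text prescribes.

```python
import re


def parse_link(line: str):
    """Return (url, label or None) for a link line, or None otherwise."""
    if not line.startswith("=>"):
        return None
    rest = line[2:].strip(" \t")
    if not rest:
        return None
    # <whitespace> is any non-zero number of consecutive spaces or tabs.
    parts = re.split(r"[ \t]+", rest, maxsplit=1)
    url = parts[0]
    label = parts[1] if len(parts) > 1 else None
    return url, label
```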
+<p>&nbsp;</p>
+<p>Note that link URLs may have schemes other than gemini. This means that Gemini documents can simply and elegantly link to documents hosted via other protocols, unlike gophermaps which can only link to non-gopher content via a non-standard adaptation of the `h` item-type.</p>
+<p>&nbsp;</p>
+<p>Clients can present links to users in whatever fashion the client author wishes; however, clients MUST NOT automatically make any network connections as part of displaying links whose scheme corresponds to a network protocol (e.g. links beginning with gemini://, gopher://, https://, ftp://, etc.).</p>
+<p>&nbsp;</p>
+<h3>5.4.3 Preformatting toggle lines</h3>
+<p>&nbsp;</p>
+<p>Any line whose first three characters are &quot;```&quot; (i.e. three consecutive back ticks with no leading whitespace) is a preformatting toggle line. These lines should NOT be included in the rendered output shown to the user. Instead, these lines toggle the parser between preformatted mode being &quot;on&quot; or &quot;off&quot;. Preformatted mode should be &quot;off&quot; at the beginning of a document. The current status of preformatted mode is the only internal state a parser is required to maintain. When preformatted mode is &quot;on&quot;, the usual rules for identifying line types are suspended, and all lines should be identified as preformatted text lines (see 5.4.4).</p>
+<p>&nbsp;</p>
+<p>Preformatting toggle lines can be thought of as analogous to &lt;pre&gt; and &lt;/pre&gt; tags in HTML.</p>
+<p>&nbsp;</p>
+<p>Any text following the leading &quot;```&quot; of a preformat toggle line which toggles preformatted mode on MAY be interpreted by the client as &quot;alt text&quot; pertaining to the preformatted text lines which follow the toggle line. Use of alt text is at the client&apos;s discretion, and simple clients may ignore it. Alt text is recommended for ASCII art or similar non-textual content which, for example, cannot be meaningfully understood when rendered through a screen reader or usefully indexed by a search engine. Alt text may also be used for computer source code to identify the programming language which advanced clients may use for syntax highlighting.</p>
+<p>&nbsp;</p>
+<p>Any text following the leading &quot;```&quot; of a preformat toggle line which toggles preformatted mode off MUST be ignored by clients.</p>
+<p>&nbsp;</p>
+<h3>5.4.4 Preformatted text lines</h3>
+<p>&nbsp;</p>
+<p>Preformatted text lines should be presented to the user in a &quot;neutral&quot;, monowidth font without any alteration to whitespace or stylistic enhancements. Graphical clients should use scrolling mechanisms to present preformatted text lines which are longer than the client viewport, in preference to wrapping. In displaying preformatted text lines, clients should keep in mind applications like ASCII art and computer source code: in particular, source code in languages with significant whitespace (e.g. Python) should be able to be copied and pasted from the client into a file and interpreted/compiled without any problems arising from the client&apos;s manner of displaying them.</p>
+<p>&nbsp;</p>
+<h2>5.5 Advanced line types</h2>
+<p>&nbsp;</p>
+<p>The following advanced line types MAY be recognised by advanced clients. Simple clients may treat them all as text lines as per 5.4.1 without any loss of essential function.</p>
+<p>&nbsp;</p>
+<h3>5.5.1 Heading lines</h3>
+<p>&nbsp;</p>
+<p>Lines beginning with &quot;#&quot; are heading lines. Heading lines consist of one, two or three consecutive &quot;#&quot; characters, followed by optional whitespace, followed by heading text. The number of # characters indicates the &quot;level&quot; of header; #, ## and ### can be thought of as analogous to &lt;h1&gt;, &lt;h2&gt; and &lt;h3&gt; in HTML.</p>
+<p>&nbsp;</p>
+<p>Heading text should be presented to the user, and clients MAY use special formatting, e.g. a larger or bold font, to indicate its status as a header (simple clients may simply print the line, including its leading #s, without any styling at all). However, the main motivation for the definition of heading lines is not stylistic but to provide a machine-readable representation of the internal structure of the document. Advanced clients can use this information to, e.g. display an automatically generated and hierarchically formatted &quot;table of contents&quot; for a long document in a side-pane, allowing users to easily jump to specific sections without excessive scrolling. CMS-style tools automatically generating menus or Atom/RSS feeds for a directory of text/gemini files can use the first heading in the file as a human-friendly title.</p>
+<p>&nbsp;</p>
+<h3>5.5.2 Unordered list items</h3>
+<p>&nbsp;</p>
+<p>Lines beginning with &quot;* &quot; are unordered list items. This line type exists purely for stylistic reasons. The * may be replaced in advanced clients by a bullet symbol. Any text after the &quot;* &quot; should be presented to the user as if it were a text line, i.e. wrapped to fit the viewport and formatted &quot;nicely&quot;. Advanced clients can take the space of the bullet symbol into account when wrapping long list items to ensure that all lines of text corresponding to the item are offset an equal distance from the left of the screen.</p>
+<p>&nbsp;</p>
+<h3>5.5.3 Quote lines</h3>
+<p>&nbsp;</p>
+<p>Lines beginning with &quot;&gt;&quot; are quote lines. This line type exists so that advanced clients may use distinct styling to convey to readers the important semantic information that certain text is being quoted from an external source. For example, when wrapping long lines to the viewport, each resultant line may have a &quot;&gt;&quot; symbol placed at the front.</p>
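The line types of 5.4 and 5.5 can be recognised in a single pass with one boolean of state, as section 5.4.3 describes. The sketch below is illustrative only: the function name `classify` and the string labels are assumptions, and toggle lines are dropped from the output because they are not meant to be rendered.

```python
def classify(lines):
    """Return (line type, line) pairs for the lines of a text/gemini document."""
    preformatted = False  # the only parser state required by 5.4.3
    result = []
    for line in lines:
        if line.startswith("```"):
            preformatted = not preformatted
            continue  # toggle lines are not shown to the user
        if preformatted:
            kind = "preformatted"
        elif line.startswith("=>"):
            kind = "link"
        elif line.startswith("#"):
            kind = "heading"
        elif line.startswith("* "):
            kind = "list"
        elif line.startswith(">"):
            kind = "quote"
        else:
            kind = "text"
        result.append((kind, line))
    return result
```

A simple client could map the "heading", "list" and "quote" labels back to "text" and still behave as 5.5 permits.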
+<p>&nbsp;</p>
+<h1>Appendix 1. Full two digit status codes</h1>
+<p>&nbsp;</p>
+<h2>10 INPUT</h2>
+<p>&nbsp;</p>
+<p>As per definition of single-digit code 1 in 3.2.</p>
+<p>&nbsp;</p>
+<h2>11 SENSITIVE INPUT</h2>
+<p>&nbsp;</p>
+<p>As per status code 10, but for use with sensitive input such as passwords. Clients should present the prompt as per status code 10, but the user&apos;s input should not be echoed to the screen to prevent it being read by &quot;shoulder surfers&quot;.</p>
+<p>&nbsp;</p>
+<h2>20 SUCCESS</h2>
+<p>&nbsp;</p>
+<p>As per definition of single-digit code 2 in 3.2.</p>
+<p>&nbsp;</p>
+<h2>30 REDIRECT - TEMPORARY</h2>
+<p>&nbsp;</p>
+<p>As per definition of single-digit code 3 in 3.2.</p>
+<p>&nbsp;</p>
+<h2>31 REDIRECT - PERMANENT</h2>
+<p>&nbsp;</p>
+<p>The requested resource should be consistently requested from the new URL provided in future. Tools like search engine indexers or content aggregators should update their configurations to avoid requesting the old URL, and end-user clients may automatically update bookmarks, etc. Note that clients which only pay attention to the initial digit of status codes will treat this as a temporary redirect. They will still end up at the right place, they just won&apos;t be able to make use of the knowledge that this redirect is permanent, so they&apos;ll pay a small performance penalty by having to follow the redirect each time.</p>
+<p>&nbsp;</p>
+<h2>40 TEMPORARY FAILURE</h2>
+<p>&nbsp;</p>
+<p>As per definition of single-digit code 4 in 3.2.</p>
+<p>&nbsp;</p>
+<h2>41 SERVER UNAVAILABLE</h2>
+<p>&nbsp;</p>
+<p>The server is unavailable due to overload or maintenance. (cf HTTP 503)</p>
+<p>&nbsp;</p>
+<h2>42 CGI ERROR</h2>
+<p>&nbsp;</p>
+<p>A CGI process, or similar system for generating dynamic content, died unexpectedly or timed out.</p>
+<p>&nbsp;</p>
+<h2>43 PROXY ERROR</h2>
+<p>&nbsp;</p>
+<p>A proxy request failed because the server was unable to successfully complete a transaction with the remote host. (cf HTTP 502, 504)</p>
+<p>&nbsp;</p>
+<h2>44 SLOW DOWN</h2>
+<p>&nbsp;</p>
+<p>Rate limiting is in effect. &lt;META&gt; is an integer number of seconds which the client must wait before another request is made to this server. (cf HTTP 429)</p>
+<p>&nbsp;</p>
+<h2>50 PERMANENT FAILURE</h2>
+<p>&nbsp;</p>
+<p>As per definition of single-digit code 5 in 3.2.</p>
+<p>&nbsp;</p>
+<h2>51 NOT FOUND</h2>
+<p>&nbsp;</p>
+<p>The requested resource could not be found but may be available in the future. (cf HTTP 404) (struggling to remember this important status code? Easy: you can&apos;t find things hidden at Area 51!)</p>
+<p>&nbsp;</p>
+<h2>52 GONE</h2>
+<p>&nbsp;</p>
+<p>The resource requested is no longer available and will not be available again. Search engines and similar tools should remove this resource from their indices. Content aggregators should stop requesting the resource and convey to their human users that the subscribed resource is gone. (cf HTTP 410)</p>
+<p>&nbsp;</p>
+<h2>53 PROXY REQUEST REFUSED</h2>
+<p>&nbsp;</p>
+<p>The request was for a resource at a domain not served by the server and the server does not accept proxy requests.</p>
+<p>&nbsp;</p>
+<h2>59 BAD REQUEST</h2>
+<p>&nbsp;</p>
+<p>The server was unable to parse the client&apos;s request, presumably due to a malformed request. (cf HTTP 400)</p>
+<p>&nbsp;</p>
+<h2>60 CLIENT CERTIFICATE REQUIRED</h2>
+<p>&nbsp;</p>
+<p>As per definition of single-digit code 6 in 3.2.</p>
+<p>&nbsp;</p>
+<h2>61 CERTIFICATE NOT AUTHORISED</h2>
+<p>&nbsp;</p>
+<p>The supplied client certificate is not authorised for accessing the particular requested resource. The problem is not with the certificate itself, which may be authorised for other resources.</p>
+<p>&nbsp;</p>
+<h2>62 CERTIFICATE NOT VALID</h2>
+<p>&nbsp;</p>
+<p>The supplied client certificate was not accepted because it is not valid. This indicates a problem with the certificate in and of itself, with no consideration of the particular requested resource. The most likely cause is that the certificate&apos;s validity start date is in the future or its expiry date has passed, but this code may also indicate an invalid signature, or a violation of X.509 standard requirements. The &lt;META&gt; should provide more information about the exact error.</p>
--- a/tests/files_with_html/text.gmi
+++ b/tests/files_with_html/text.gmi
@ -0,0 +1,353 @@
+NOTE: this page was downloaded on 2021-03-04 from the original capsule
+=> gemini://gemini.circumlunar.space/docs/specification.gmi Gemini
+=> https://gemini.circumlunar.space/docs/specification.gmi Web
+=> specification_Gemini_v0.14.3.txt Local plain-text copy
+
+# Project Gemini
+
+## Speculative specification
+
+v0.14.3, November 29th 2020
+
+This is an increasingly less rough sketch of an actual spec for Project Gemini.  Although not finalised yet, further changes to the specification are likely to be relatively small.  You can write code to this pseudo-specification and be confident that it probably won't become totally non-functional due to massive changes next week, but you are still urged to keep an eye on ongoing development of the protocol and make changes as required.
+
+This is provided mostly so that people can quickly get up to speed on what I'm thinking without having to read lots and lots of old phlog posts and keep notes.
+
+Feedback on any part of this is extremely welcome, please email solderpunk@posteo.net.
+
+# 1 Overview
+
+Gemini is a client-server protocol featuring request-response transactions, broadly similar to gopher or HTTP.  Connections are closed at the end of a single transaction and cannot be reused.  When Gemini is served over TCP/IP, servers should listen on port 1965 (the first manned Gemini mission, Gemini 3, flew in March '65).  This is an unprivileged port, so it's very easy to run a server as a "nobody" user, even if e.g. the server is written in Go and so can't drop privileges in the traditional fashion.
+
+## 1.1 Gemini transactions
+
+There is one kind of Gemini transaction, roughly equivalent to a gopher request or a HTTP "GET" request.  Transactions happen as follows:
+
+C:   Opens connection
+S:   Accepts connection
+C/S: Complete TLS handshake (see section 4)
+C:   Validates server certificate (see 4.2)
+C:   Sends request (one CRLF terminated line) (see section 2)
+S:   Sends response header (one CRLF terminated line), closes connection
+     under non-success conditions (see 3.1 and 3.2)
+S:   Sends response body (text or binary data) (see 3.3)
+S:   Closes connection
+C:   Handles response (see 3.4)
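The transaction steps above could look roughly like this in Python, using only the standard library. This is a sketch, not a complete client: the name `fetch` and the 10-second timeout are assumptions, and server certificate validation (see 4.2) is deliberately switched off rather than implemented.

```python
import socket
import ssl
from urllib.parse import urlsplit


def fetch(url: str, timeout: float = 10.0):
    """Perform one Gemini transaction; return (header line, body bytes)."""
    parts = urlsplit(url)
    host, port = parts.hostname, parts.port or 1965
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # No CA validation here; a real client would apply TOFU pinning instead.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname sends the SNI extension required by section 4.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(url.encode("utf-8") + b"\r\n")
            data = b""
            # The server closing the connection marks the end of the response.
            while chunk := tls.recv(4096):
                data += chunk
    header, _, body = data.partition(b"\r\n")
    return header.decode("utf-8"), body
```

Calling `fetch("gemini://example.org/")` would need a reachable Gemini server, so no output is shown here.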
+
+## 1.2 Gemini URI scheme
+
+Resources hosted via Gemini are identified using URIs with the scheme "gemini".  This scheme is syntactically compatible with the generic URI syntax defined in RFC 3986, but does not support all components of the generic syntax.  In particular, the authority component is allowed and required, but its userinfo subcomponent is NOT allowed.  The host subcomponent is required.  The port subcomponent is optional, with a default value of 1965.  The path, query and fragment components are allowed and have no special meanings beyond those defined by the generic syntax.  Spaces in gemini URIs should be encoded as %20, not +.
+
+# 2 Gemini requests
+
+Gemini requests are a single CRLF-terminated line with the following structure:
+
+<URL><CR><LF>
+
+<URL> is a UTF-8 encoded absolute URL, including a scheme, of maximum length 1024 bytes.
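A request line as defined above could be built and checked as follows. The helper name `build_request` is hypothetical, and rejecting URLs without a scheme and host is one possible reading of "absolute URL".

```python
from urllib.parse import urlsplit

MAX_URL_BYTES = 1024  # limit stated in section 2


def build_request(url: str) -> bytes:
    """Return the bytes of a Gemini request for an absolute URL."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError("URL must be absolute and include a scheme")
    encoded = url.encode("utf-8")
    # The limit counts bytes after UTF-8 encoding, not characters.
    if len(encoded) > MAX_URL_BYTES:
        raise ValueError("URL exceeds 1024 bytes")
    return encoded + b"\r\n"
```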
+
+Sending an absolute URL instead of only a path or selector is effectively equivalent to building in a HTTP "Host" header.  It permits virtual hosting of multiple Gemini domains on the same IP address.  It also allows servers to optionally act as proxies.  Including schemes other than "gemini" in requests allows servers to optionally act as protocol-translating gateways to e.g. fetch gopher resources over Gemini.  Proxying is optional and the vast majority of servers are expected to only respond to requests for resources at their own domain(s).
+
+# 3 Gemini responses
+
+Gemini responses consist of a single CRLF-terminated header line, optionally followed by a response body.
+
+## 3.1 Response headers
+
+Gemini response headers look like this:
+
+<STATUS><SPACE><META><CR><LF>
+
+<STATUS> is a two-digit numeric status code, as described below in 3.2 and in Appendix 1.
+
+<SPACE> is a single space character, i.e. the byte 0x20.
+
+<META> is a UTF-8 encoded string of maximum length 1024 bytes, whose meaning is <STATUS> dependent.
+
+<STATUS> and <META> are separated by a single space character.
+
+If <STATUS> does not belong to the "SUCCESS" range of codes, then the server MUST close the connection after sending the header and MUST NOT send a response body.
+
+If a server sends a <STATUS> which is not a two-digit number or a <META> which exceeds 1024 bytes in length, the client SHOULD close the connection and disregard the response header, informing the user of an error.
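
The header rules in 3.1 can be sketched as a small parser. This is only one possible reading: the name `parse_header` is hypothetical, and rejecting a header that lacks the separating space is a strict interpretation chosen for this example (some clients may prefer to be more lenient).

```python
from typing import Tuple

MAX_META_BYTES = 1024  # maximum <META> length stated in 3.1


def parse_header(line: bytes) -> Tuple[int, str]:
    """Split a raw response header into (status, meta).

    Raises ValueError when the header breaks the rules of section 3.1,
    in which case a client should close the connection and report an error.
    """
    if not line.endswith(b"\r\n"):
        raise ValueError("header must be terminated by CRLF")
    status, sep, meta = line[:-2].partition(b" ")
    # <STATUS> must be exactly two ASCII digits.
    if len(status) != 2 or not status.isdigit():
        raise ValueError("status is not a two-digit number")
    # <STATUS> and <META> are separated by a single space (byte 0x20).
    if sep != b" ":
        raise ValueError("missing space after status")
    if len(meta) > MAX_META_BYTES:
        raise ValueError("meta exceeds 1024 bytes")
    return int(status), meta.decode("utf-8")
```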
+
+## 3.2 Status codes
+
+Gemini uses two-digit numeric status codes.  Related status codes share the same first digit.  Importantly, the first digit of a Gemini status code does not group codes into vague categories like "client error" and "server error" as per HTTP.  Instead, the first digit alone provides enough information for a client to determine how to handle the response.  By design, it is possible to write a simple but feature-complete client which only looks at the first digit.  The second digit provides more fine-grained information, for unambiguous server logging, to allow writing comfier interactive clients which provide a slightly more streamlined user interface, and to allow writing more robust and intelligent automated clients like content aggregators, search engine crawlers, etc.
+
+The first digit of a response code unambiguously places the response into one of six categories, which define the semantics of the <META> line.
+
+### 3.2.1 1x (INPUT)
+
+Status codes beginning with 1 are INPUT status codes, meaning:
+
+The requested resource accepts a line of textual user input.  The <META> line is a prompt which should be displayed to the user.  The same resource should then be requested again with the user's input included as a query component.  Queries are included in requests as per the usual generic URL definition in RFC3986, i.e. separated from the path by a ?.  Reserved characters used in the user's input must be "percent-encoded" as per RFC3986, and space characters should also be percent-encoded.
+
+### 3.2.2 2x (SUCCESS)
+
+Status codes beginning with 2 are SUCCESS status codes, meaning:
+
+The request was handled successfully and a response body will follow the response header.  The <META> line is a MIME media type which applies to the response body.
+
+### 3.2.3 3x (REDIRECT)
+
+Status codes beginning with 3 are REDIRECT status codes, meaning:
+
+The server is redirecting the client to a new location for the requested resource.  There is no response body.  <META> is a new URL for the requested resource.  The URL may be absolute or relative.  The redirect should be considered temporary, i.e. clients should continue to request the resource at the original address and should not perform convenience actions like automatically updating bookmarks.
+
+### 3.2.4 4x (TEMPORARY FAILURE)
+
+Status codes beginning with 4 are TEMPORARY FAILURE status codes, meaning:
+
+The request has failed.  There is no response body.  The nature of the failure is temporary, i.e. an identical request MAY succeed in the future.  The contents of <META> may provide additional information on the failure, and should be displayed to human users.
+
+### 3.2.5 5x (PERMANENT FAILURE)
+
+Status codes beginning with 5 are PERMANENT FAILURE status codes, meaning:
+
+The request has failed.  There is no response body.  The nature of the failure is permanent, i.e. identical future requests will reliably fail for the same reason.  The contents of <META> may provide additional information on the failure, and should be displayed to human users.  Automatic clients such as aggregators or indexing crawlers should not repeat this request.
+
+### 3.2.6 6x (CLIENT CERTIFICATE REQUIRED)
+
+Status codes beginning with 6 are CLIENT CERTIFICATE REQUIRED status codes, meaning:
+
+The requested resource requires a client certificate to access.  If the request was made without a certificate, it should be repeated with one.  If the request was made with a certificate, the server did not accept it and the request should be repeated with a different certificate.  The contents of <META> (and/or the specific 6x code) may provide additional information on certificate requirements or the reason a certificate was rejected.
+
+### 3.2.7 Notes
+
+Note that for basic interactive clients for human use, errors 4 and 5 may be effectively handled identically, by simply displaying the contents of <META> under a heading of "ERROR".  The temporary/permanent error distinction is primarily relevant to well-behaving automated clients.  Basic clients may also choose not to support client-certificate authentication, in which case only four distinct status handling routines are required (for statuses beginning with 1, 2, 3 or a combined 4-or-5).
+
+The full two-digit system is detailed in Appendix 1.  Note that for each of the six valid first digits, a code with a second digit of zero is a generic status of that kind with no special semantics.  This means that basic servers without any advanced functionality need only be able to return codes of 10, 20, 30, 40 or 50.
+
+The Gemini status code system has been carefully designed so that the increased power (and correspondingly increased complexity) of the second digits is entirely "opt-in" on the part of both servers and clients.
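
To make the "first digit only" design concrete, here is a minimal sketch of how a basic client might choose a handling routine. The function name `handling_for` and the routine labels are invented for illustration; only the mapping from first digit to category comes from section 3.2.

```python
# Map the first digit of a status code to a handling routine.
# As noted in 3.2.7, a basic client may treat 4x and 5x identically.
ROUTINES = {
    1: "prompt-for-input",        # 1x INPUT
    2: "handle-body",             # 2x SUCCESS
    3: "follow-redirect",         # 3x REDIRECT
    4: "show-error",              # 4x TEMPORARY FAILURE
    5: "show-error",              # 5x PERMANENT FAILURE
    6: "offer-client-certificate" # 6x CLIENT CERTIFICATE REQUIRED
}


def handling_for(status: int) -> str:
    """Return the routine label for a two-digit status, ignoring the second digit."""
    try:
        return ROUTINES[status // 10]
    except KeyError:
        raise ValueError("status %r is outside the six defined categories" % status)
```

A more advanced client could inspect the full code first (for example 31 or 44) and fall back to this table for anything it does not recognise.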
+
+## 3.3 Response bodies
+
+Response bodies are just raw content, text or binary, à la gopher.  There is no support for compression, chunking or any other kind of content or transfer encoding.  The server closes the connection after the final byte; there is no "end of response" signal like gopher's lonely dot.
+
+Response bodies only accompany responses whose header indicates a SUCCESS status (i.e. a status code whose first digit is 2).  For such responses, <META> is a MIME media type as defined in RFC 2046.
+
+Internet media types are registered with a canonical form.  Content transferred via Gemini MUST be represented in the appropriate canonical form prior to its transmission except for "text" types, as defined in the next paragraph.
+
+When in canonical form, media subtypes of the "text" type use CRLF as the text line break.  Gemini relaxes this requirement and allows the transport of text media with plain LF alone (but NOT a plain CR alone) representing a line break when it is done consistently for an entire response body.  Gemini clients MUST accept CRLF and bare LF as being representative of a line break in text media received via Gemini.
+
+If a MIME type begins with "text/" and no charset is explicitly given, the charset should be assumed to be UTF-8.  Compliant clients MUST support UTF-8-encoded text/* responses.  Clients MAY optionally support other encodings.  Clients receiving a response in a charset they cannot decode SHOULD gracefully inform the user what happened instead of displaying garbage.
+
+If <META> is an empty string, the MIME type MUST default to "text/gemini; charset=utf-8".  The text/gemini media type is defined in section 5.
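
The defaulting rules in 3.3 (empty <META>, implicit UTF-8 for "text/" types) might be applied as in the sketch below. The helper name `media_type_from_meta` is hypothetical, and the parameter parsing is deliberately naive: it splits on ";" and "=" and does not handle quoted values that contain those characters.

```python
from typing import Dict, Tuple


def media_type_from_meta(meta: str) -> Tuple[str, Dict[str, str]]:
    """Return (mime_type, parameters) for the <META> of a 2x response."""
    if meta.strip() == "":
        # An empty <META> defaults to text/gemini in UTF-8.
        meta = "text/gemini; charset=utf-8"
    mime, *raw_params = [part.strip() for part in meta.split(";")]
    params: Dict[str, str] = {}
    for raw in raw_params:
        if "=" in raw:
            key, value = raw.split("=", 1)
            params[key.strip().lower()] = value.strip().strip('"')
    mime = mime.lower()
    # text/* without an explicit charset is assumed to be UTF-8.
    if mime.startswith("text/") and "charset" not in params:
        params["charset"] = "utf-8"
    return mime, params
```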
+
+## 3.4 Response body handling
+
+Response handling by clients should be informed by the provided MIME type information.  Gemini defines one MIME type of its own (text/gemini) whose handling is discussed below in section 5.  In all other cases, clients should do "something sensible" based on the MIME type.  Minimalistic clients might adopt a strategy of printing all other text/* responses to the screen without formatting and saving all non-text responses to the disk.  Clients for unix systems may consult /etc/mailcap to find installed programs for handling non-text types.
+
+# 4 TLS
+
+Use of TLS for Gemini transactions is mandatory.
+
+Use of the Server Name Indication (SNI) extension to TLS is also mandatory, to facilitate name-based virtual hosting.
+
+## 4.1 Version requirements
+
+Servers MUST use TLS version 1.2 or higher and SHOULD use TLS version 1.3 or higher.  TLS 1.2 is reluctantly permitted for now to avoid drastically reducing the range of available implementation libraries.  Hopefully TLS 1.3 or higher can be specced in the near future.  Clients who wish to be "ahead of the curve" MAY refuse to connect to servers using TLS version 1.2 or lower.
+
+## 4.2 Server certificate validation
+
+Clients can validate TLS connections however they like (including not at all) but the strongly RECOMMENDED approach is to implement a lightweight "TOFU" certificate-pinning system which treats self-signed certificates as first-class citizens.  This greatly reduces TLS overhead on the network (only one cert needs to be sent, not a whole chain) and lowers the barrier to entry for setting up a Gemini site (no need to pay a CA or set up a Let's Encrypt cron job, just make a cert and go).
+
+TOFU stands for "Trust On First Use" and is a public-key security model similar to that used by OpenSSH.  The first time a Gemini client connects to a server, it accepts whatever certificate it is presented.  That certificate's fingerprint and expiry date are saved in a persistent database (like the known_hosts file for SSH), associated with the server's hostname.  On all subsequent connections to that hostname, the received certificate's fingerprint is computed and compared to the one in the database.  If the certificate is not the one previously received, but the previous certificate's expiry date has not passed, the user is shown a warning, analogous to the one web browser users are shown when receiving a certificate without a signature chain leading to a trusted CA.
+
+This model is by no means perfect, but it is not awful and is vastly superior to just accepting self-signed certificates unconditionally.
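
The TOFU decision described in 4.2 could look roughly like the following sketch. Several details are assumptions that the text above does not fix: the fingerprint is taken as SHA-256 over the DER-encoded certificate, the "persistent database" is stood in for by a plain dict, and a changed certificate is silently accepted once the pinned one has expired. The function name and return labels are also invented for this example.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, Tuple

# hostname -> (fingerprint, expiry of the pinned certificate)
KnownHosts = Dict[str, Tuple[str, datetime]]


def check_certificate(known: KnownHosts, hostname: str, cert_der: bytes,
                      not_after: datetime, now: datetime) -> str:
    """Apply a TOFU check and return 'first-use', 'match', 'warn' or 'replaced'."""
    fingerprint = hashlib.sha256(cert_der).hexdigest()  # assumed fingerprint scheme
    entry = known.get(hostname)
    if entry is None:
        # First contact: accept whatever certificate is presented and pin it.
        known[hostname] = (fingerprint, not_after)
        return "first-use"
    pinned_fingerprint, pinned_expiry = entry
    if fingerprint == pinned_fingerprint:
        return "match"
    if now < pinned_expiry:
        # Certificate changed while the pinned one is still valid: warn the user.
        return "warn"
    # Assumption: the pinned certificate has expired, so pin the new one.
    known[hostname] = (fingerprint, not_after)
    return "replaced"
```

On a "warn" result the store is left untouched, so the decision to trust the new certificate stays with the user.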
+
+## 4.3 Client certificates
+
+Although rarely seen on the web, TLS permits clients to identify themselves to servers using certificates, in exactly the same way that servers traditionally identify themselves to the client.  Gemini includes the ability for servers to request in-band that a client repeats a request with a client certificate.  This is a very flexible, highly secure but also very simple notion of client identity with several applications:
+
+* Short-lived client certificates which are generated on demand and deleted immediately after use can be used as "session identifiers" to maintain server-side state for applications.  In this role, client certificates act as a substitute for HTTP cookies, but unlike cookies they are generated voluntarily by the client, and once the client deletes a certificate and its matching key, the server cannot possibly "resurrect" the same value later (unlike so-called "super cookies").
+* Long-lived client certificates can reliably identify a user to a multi-user application without the need for passwords which may be brute-forced.  Even a stolen database table mapping certificate hashes to user identities is not a security risk, as rainbow tables for certificates are not feasible.
+* Self-hosted, single-user applications can be easily and reliably secured in a manner familiar from OpenSSH: the user generates a self-signed certificate and adds its hash to a server-side list of permitted certificates (analogous to the authorized_keys file for SSH).
+
+Gemini requests will typically be made without a client certificate.  If a requested resource requires a client certificate and one is not included in a request, the server can respond with a status code of 60, 61 or 62 (see Appendix 1 below for a description of all status codes related to client certificates).  A client certificate which is generated or loaded in response to such a status code has its scope bound to the same hostname as the request URL and to all paths below the path of the request URL.  E.g. if a request for gemini://example.com/foo returns status 60 and the user chooses to generate a new client certificate in response to this, that same certificate should be used for subsequent requests to gemini://example.com/foo, gemini://example.com/foo/bar/, gemini://example.com/foo/bar/baz, etc., until such time as the user decides to delete the certificate or to temporarily deactivate it.  Interactive clients for human users are strongly recommended to make such actions easy and to generally give users full control over the use of client certificates.
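
The scope rule for client certificates can be sketched as a simple URL comparison. The helper name `certificate_applies` is hypothetical, and treating "below" as a match on whole path segments (so that /foobar is not below /foo) is an interpretation made for this example rather than something the text states explicitly.

```python
from urllib.parse import urlsplit


def certificate_applies(scope_url: str, request_url: str) -> bool:
    """Return True if a certificate bound to `scope_url` covers `request_url`."""
    scope = urlsplit(scope_url)
    request = urlsplit(request_url)
    # The scope is bound to the hostname of the original request URL.
    if scope.hostname != request.hostname:
        return False
    scope_path = scope.path.rstrip("/")
    # Same path, or any path below it on a segment boundary.
    return request.path == scope_path or request.path.startswith(scope_path + "/")
```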
+
+# 5 The text/gemini media type
+
+## 5.1 Overview
+
+In the same sense that HTML is the "native" response format of HTTP and plain text is the native response format of gopher, Gemini defines its own native response format - though of course, thanks to the inclusion of a MIME type in the response header Gemini can be used to serve plain text, rich text, HTML, Markdown, LaTeX, etc.
+
+Response bodies of type "text/gemini" are a kind of lightweight hypertext format, which takes inspiration from gophermaps and from Markdown.  The format permits richer typographic possibilities than the plain text of Gopher, but remains extremely easy to parse.  The format is line-oriented, and a satisfactory rendering can be achieved with a single pass of a document, processing each line independently.  As per gopher, links can only be displayed one per line, encouraging neat, list-like structure.
+
+Similar to how the two-digit Gemini status codes were designed so that simple clients can function correctly while ignoring the second digit, the text/gemini format has been designed so that simple clients can ignore the more advanced features and still remain very usable.
+
+## 5.2 Parameters
+
+As a subtype of the top-level media type "text", "text/gemini" inherits the "charset" parameter defined in RFC 2046.  However, as noted in 3.3, the default value of "charset" is "UTF-8" for "text" content transferred via Gemini.
+
+A single additional parameter specific to the "text/gemini" subtype is defined: the "lang" parameter.  The value of "lang" denotes the natural language or language(s) in which the textual content of a "text/gemini" document is written.  The presence of the "lang" parameter is optional.  When the "lang" parameter is present, its interpretation is defined entirely by the client.  For example, clients which use text-to-speech technology to make Gemini content accessible to visually impaired users may use the value of "lang" to improve pronunciation of content.  Clients which render text to a screen may use the value of "lang" to determine whether text should be displayed left-to-right or right-to-left.  Simple clients for users who only read languages written left-to-right may simply ignore the value of "lang".  When the "lang" parameter is not present, no default value should be assumed and clients which require some notion of a language in order to process the content (such as text-to-speech screen readers) should rely on user-input to determine how to proceed in the absence of a "lang" parameter.
+
+Valid values for the "lang" parameter are comma-separated lists of one or more language tags as defined in RFC4646.  For example:
+
+* "text/gemini; lang=en" Denotes a text/gemini document written in English
+* "text/gemini; lang=fr" Denotes a text/gemini document written in French
+* "text/gemini; lang=en,fr" Denotes a text/gemini document written in a mixture of English and French
+* "text/gemini; lang=de-CH" Denotes a text/gemini document written in Swiss German
+* "text/gemini; lang=sr-Cyrl" Denotes a text/gemini document written in Serbian using the Cyrillic script
+* "text/gemini; lang=zh-Hans-CN" Denotes a text/gemini document written in Chinese using the Simplified script as used in mainland China
+
+## 5.3 Line-orientation
+
+As mentioned, the text/gemini format is line-oriented.  Each line of a text/gemini document has a single "line type".  It is possible to unambiguously determine a line's type purely by inspecting its first three characters.  A line's type determines the manner in which it should be presented to the user.  Any details of presentation or rendering associated with a particular line type are strictly limited in scope to that individual line.
+
+There are 7 different line types in total.  However, a fully functional and specification compliant Gemini client need only recognise and handle 4 of them - these are the "core line types", (see 5.4).  Advanced clients can also handle the additional "advanced line types" (see 5.5).  Simple clients can treat all advanced line types as equivalent to one of the core line types and still offer an adequate user experience.
+
+## 5.4 Core line types
+
+The four core line types are:
+
+### 5.4.1 Text lines
+
+Text lines are the most fundamental line type - any line which does not match the definition of another line type defined below defaults to being a text line.  The majority of lines in a typical text/gemini document will be text lines.
+
+Text lines should be presented to the user, after being wrapped to the appropriate width for the client's viewport (see below).  Text lines may be presented to the user in a visually pleasing manner for general reading, the precise meaning of which is at the client's discretion.  For example, variable width fonts may be used, spacing may be normalised, with spaces between sentences being made wider than spacing between words, and other such typographical niceties may be applied.  Clients may permit users to customise the appearance of text lines by altering the font, font size, text and background colour, etc.  Authors should not expect to exercise any control over the precise rendering of their text lines, only of their actual textual content.  Content such as ASCII art, computer source code, etc. which may appear incorrectly when treated as such should be enclosed between preformatting toggle lines (see 5.4.3).
+
+Blank lines are instances of text lines and have no special meaning.  They should be rendered individually as vertical blank space each time they occur.  In this way they are analogous to <br/> tags in HTML.  Consecutive blank lines should NOT be collapsed into fewer blank lines.  Note also that consecutive non-blank text lines do not form any kind of coherent unit or block such as a "paragraph": all text lines are independent entities.
+
+Text lines which are longer than can fit on a client's display device SHOULD be "wrapped" to fit, i.e. long lines should be split (ideally at whitespace or at hyphens) into multiple consecutive lines of a device-appropriate width.  This wrapping is applied to each line of text independently.  Multiple consecutive lines which are shorter than the client's display device MUST NOT be combined into fewer, longer lines.
+
+In order to take full advantage of this method of text formatting, authors of text/gemini content SHOULD avoid hard-wrapping to a specific fixed width, in contrast to the convention in Gopherspace where text is typically wrapped at 80 characters or fewer.  Instead, text which should be displayed as a contiguous block should be written as a single long line.  Most text editors can be configured to "soft-wrap", i.e. to write this kind of file while displaying the long lines wrapped at word boundaries to fit the author's display device.
+
+Authors who insist on hard-wrapping their content MUST be aware that the content will display neatly on clients whose display device is as wide as the hard-wrapped length or wider, but will appear with irregular line widths on narrower clients.
+
+### 5.4.2 Link lines
+
+Lines beginning with the two characters "=>" are link lines, which have the following syntax:
+
+```
+=>[<whitespace>]<URL>[<whitespace><USER-FRIENDLY LINK NAME>]
+```
+
+where:
+
+* <whitespace> is any non-zero number of consecutive spaces or tabs
+* Square brackets indicate that the enclosed content is optional.
+* <URL> is a URL, which may be absolute or relative.
+
+All the following examples are valid link lines:
+
+```
+=> gemini://example.org/
+=> gemini://example.org/ An example link
+=> gemini://example.org/foo	Another example link at the same host
+=> foo/bar/baz.txt	A relative link
+=> 	gopher://example.org:70/1 A gopher link
+```
+
+URLs in link lines must have reserved characters and spaces percent-encoded as per RFC 3986.
+
+Note that link URLs may have schemes other than gemini.  This means that Gemini documents can simply and elegantly link to documents hosted via other protocols, unlike gophermaps which can only link to non-gopher content via a non-standard adaptation of the `h` item-type.
+
+Clients can present links to users in whatever fashion the client author wishes, however clients MUST NOT automatically make any network connections as part of displaying links whose scheme corresponds to a network protocol (e.g. links beginning with gemini://, gopher://, https://, ftp:// , etc.).
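
The link line syntax in 5.4.2 is simple enough to parse with one regular-expression split, as sketched below. The function name `parse_link_line` is an assumption, as is the decision to return None for a "=>" line with no URL so that the caller can fall back to treating it as a text line.

```python
import re
from typing import Optional, Tuple


def parse_link_line(line: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (url, link_name) for a link line, or None if `line` is not one.

    `link_name` is None when the line carries no user-friendly name.
    """
    if not line.startswith("=>"):
        return None
    # Whitespace here means spaces or tabs only, as defined in 5.4.2.
    rest = line[2:].strip(" \t")
    if not rest:
        return None  # assumption: no URL, so not a usable link line
    parts = re.split(r"[ \t]+", rest, maxsplit=1)
    url = parts[0]
    name = parts[1] if len(parts) > 1 else None
    return url, name
```

Note that this only extracts the URL; per the paragraph above, a client must not open a network connection merely to display the link.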
+
+### 5.4.3 Preformatting toggle lines
+
+Any line whose first three characters are "```" (i.e. three consecutive back ticks with no leading whitespace) is a preformatting toggle line.  These lines should NOT be included in the rendered output shown to the user.  Instead, these lines toggle the parser between preformatted mode being "on" or "off".  Preformatted mode should be "off" at the beginning of a document.  The current status of preformatted mode is the only internal state a parser is required to maintain.  When preformatted mode is "on", the usual rules for identifying line types are suspended, and all lines should be identified as preformatted text lines (see 5.4.4).
+
+Preformatting toggle lines can be thought of as analogous to <pre> and </pre> tags in HTML.
+
+Any text following the leading "```" of a preformat toggle line which toggles preformatted mode on MAY be interpreted by the client as "alt text" pertaining to the preformatted text lines which follow the toggle line.  Use of alt text is at the client's discretion, and simple clients may ignore it.  Alt text is recommended for ASCII art or similar non-textual content which, for example, cannot be meaningfully understood when rendered through a screen reader or usefully indexed by a search engine.  Alt text may also be used for computer source code to identify the programming language which advanced clients may use for syntax highlighting.
+
+Any text following the leading "```" of a preformat toggle line which toggles preformatted mode off MUST be ignored by clients.
+
+### 5.4.4 Preformatted text lines
+
+Preformatted text lines should be presented to the user in a "neutral", monowidth font without any alteration to whitespace or stylistic enhancements.  Graphical clients should use scrolling mechanisms to present preformatted text lines which are longer than the client viewport, in preference to wrapping.  In displaying preformatted text lines, clients should keep in mind applications like ASCII art and computer source code: in particular, source code in languages with significant whitespace (e.g. Python) should be able to be copied and pasted from the client into a file and interpreted/compiled without any problems arising from the client's manner of displaying them.
+
+## 5.5 Advanced line types
+
+The following advanced line types MAY be recognised by advanced clients.  Simple clients may treat them all as text lines as per 5.4.1 without any loss of essential function.
+
+### 5.5.1 Heading lines
+
+Lines beginning with "#" are heading lines.  Heading lines consist of one, two or three consecutive "#" characters, followed by optional whitespace, followed by heading text.  The number of # characters indicates the "level" of header;  #, ## and ### can be thought of as analogous to <h1>, <h2> and <h3> in HTML.
+
+Heading text should be presented to the user, and clients MAY use special formatting, e.g. a larger or bold font, to indicate its status as a header (simple clients may simply print the line, including its leading #s, without any styling at all).  However, the main motivation for the definition of heading lines is not stylistic but to provide a machine-readable representation of the internal structure of the document.  Advanced clients can use this information to, e.g. display an automatically generated and hierarchically formatted "table of contents" for a long document in a side-pane, allowing users to easily jump to specific sections without excessive scrolling.  CMS-style tools automatically generating menus or Atom/RSS feeds for a directory of text/gemini files can use the first heading in the file as a human-friendly title.
+
+### 5.5.2 Unordered list items
+
+Lines beginning with "* " are unordered list items.  This line type exists purely for stylistic reasons.  The * may be replaced in advanced clients by a bullet symbol.  Any text after the "* " should be presented to the user as if it were a text line, i.e.  wrapped to fit the viewport and formatted "nicely".  Advanced clients can take the space of the bullet symbol into account when wrapping long list items to ensure that all lines of text corresponding to the item are offset an equal distance from the left of the screen.
+
+### 5.5.3 Quote lines
+
+Lines beginning with ">" are quote lines.  This line type exists so that advanced clients may use distinct styling to convey to readers the important semantic information that certain text is being quoted from an external source.  For example, when wrapping long lines to the viewport, each resultant line may have a ">" symbol placed at the front.
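
Putting sections 5.3 to 5.5 together, the seven line types can be recognised in a single pass with one boolean of state. The sketch below is an illustration only: the function name `classify_lines` and the type labels are made up here, and heading levels are not extracted.

```python
from typing import List, Tuple

FENCE = "`" * 3  # the three back ticks that mark a preformatting toggle line


def classify_lines(text: str) -> List[Tuple[str, str]]:
    """Return (line_type, line) pairs for a text/gemini document."""
    # Clients must accept both CRLF and bare LF as line breaks (see 3.3).
    lines = text.replace("\r\n", "\n").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a trailing line break does not start another line
    preformatted = False  # the only state a parser has to maintain
    result = []
    for line in lines:
        if line.startswith(FENCE):
            preformatted = not preformatted
            result.append(("toggle", line))
        elif preformatted:
            # Normal line type rules are suspended in preformatted mode.
            result.append(("preformatted", line))
        elif line.startswith("=>"):
            result.append(("link", line))
        elif line.startswith("#"):
            result.append(("heading", line))
        elif line.startswith("* "):
            result.append(("list", line))
        elif line.startswith(">"):
            result.append(("quote", line))
        else:
            result.append(("text", line))
    return result
```

A simple client could render every type other than "link", "toggle" and "preformatted" exactly like "text", which matches the fallback the spec allows for the advanced line types.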
+
+# Appendix 1. Full two digit status codes
+
+## 10 INPUT
+
+As per definition of single-digit code 1 in 3.2.
+
+## 11 SENSITIVE INPUT
+
+As per status code 10, but for use with sensitive input such as passwords.  Clients should present the prompt as per status code 10, but the user's input should not be echoed to the screen to prevent it being read by "shoulder surfers".
+
+## 20 SUCCESS
+
+As per definition of single-digit code 2 in 3.2.
+
+## 30 REDIRECT - TEMPORARY
+
+As per definition of single-digit code 3 in 3.2.
+
+## 31 REDIRECT - PERMANENT
+
+The requested resource should be consistently requested from the new URL provided in future.  Tools like search engine indexers or content aggregators should update their configurations to avoid requesting the old URL, and end-user clients may automatically update bookmarks, etc.  Note that clients which only pay attention to the initial digit of status codes will treat this as a temporary redirect.  They will still end up at the right place, they just won't be able to make use of the knowledge that this redirect is permanent, so they'll pay a small performance penalty by having to follow the redirect each time.
+
+## 40 TEMPORARY FAILURE
+
+As per definition of single-digit code 4 in 3.2.
+
+## 41 SERVER UNAVAILABLE
+
+The server is unavailable due to overload or maintenance.  (cf HTTP 503)
+
+## 42 CGI ERROR
+
+A CGI process, or similar system for generating dynamic content, died unexpectedly or timed out.
+
+## 43 PROXY ERROR
+
+A proxy request failed because the server was unable to successfully complete a transaction with the remote host.  (cf HTTP 502, 504)
+
+## 44 SLOW DOWN
+
+Rate limiting is in effect.  <META> is an integer number of seconds which the client must wait before another request is made to this server.  (cf HTTP 429)
+
+## 50 PERMANENT FAILURE
+
+As per definition of single-digit code 5 in 3.2.
+
+## 51 NOT FOUND
+
+The requested resource could not be found but may be available in the future.  (cf HTTP 404) (struggling to remember this important status code?  Easy: you can't find things hidden at Area 51!)
+
+## 52 GONE
+
+The resource requested is no longer available and will not be available again.  Search engines and similar tools should remove this resource from their indices.  Content aggregators should stop requesting the resource and convey to their human users that the subscribed resource is gone.  (cf HTTP 410)
+
+## 53 PROXY REQUEST REFUSED
+
+The request was for a resource at a domain not served by the server and the server does not accept proxy requests.
+
+## 59 BAD REQUEST
+
+The server was unable to parse the client's request, presumably due to a malformed request.  (cf HTTP 400)
+
+## 60 CLIENT CERTIFICATE REQUIRED
+
+As per definition of single-digit code 6 in 3.2.
+
+## 61 CERTIFICATE NOT AUTHORISED
+
+The supplied client certificate is not authorised for accessing the particular requested resource.  The problem is not with the certificate itself, which may be authorised for other resources.
+
+## 62 CERTIFICATE NOT VALID
+
+The supplied client certificate was not accepted because it is not valid.  This indicates a problem with the certificate in and of itself, with no consideration of the particular requested resource.  The most likely cause is that the certificate's validity start date is in the future or its expiry date has passed, but this code may also indicate an invalid signature, or a violation of X.509 standard requirements.  The <META> should provide more information about the exact error.
--- a/tests/files_with_html/text.gmi.html
+++ b/tests/files_with_html/text.gmi.html
@ -0,0 +1,359 @@
+<p>NOTE: this page was downloaded on 2021-03-04 from the original capsule</p>
+<p><a target='_blank' class='gemini' href='gemini://gemini.circumlunar.space/docs/specification.gmi'>Gemini</a></p>
+<p><a target='_blank' class='https' href='https://gemini.circumlunar.space/docs/specification.gmi'>Web</a></p>
+<p><a class='local' href='specification_Gemini_v0.14.3.txt'>Copie locale texte</a></p>
+<p>&nbsp;</p>
+<h1>Project Gemini</h1>
+<p>&nbsp;</p>
+<h2>Speculative specification</h2>
+<p>&nbsp;</p>
+<p>v0.14.3, November 29th 2020</p>
+<p>&nbsp;</p>
+<p>This is an increasingly less rough sketch of an actual spec for Project Gemini. Although not finalised yet, further changes to the specification are likely to be relatively small. You can write code to this pseudo-specification and be confident that it probably won&apos;t become totally non-functional due to massive changes next week, but you are still urged to keep an eye on ongoing development of the protocol and make changes as required.</p>
+<p>&nbsp;</p>
+<p>This is provided mostly so that people can quickly get up to speed on what I&apos;m thinking without having to read lots and lots of old phlog posts and keep notes.</p>
+<p>&nbsp;</p>
+<p>Feedback on any part of this is extremely welcome, please email solderpunk@posteo.net.</p>
+<p>&nbsp;</p>
+<h1>1 Overview</h1>
+<p>&nbsp;</p>
+<p>Gemini is a client-server protocol featuring request-response transactions, broadly similar to gopher or HTTP. Connections are closed at the end of a single transaction and cannot be reused. When Gemini is served over TCP/IP, servers should listen on port 1965 (the first manned Gemini mission, Gemini 3, flew in March &apos;65). This is an unprivileged port, so it&apos;s very easy to run a server as a &quot;nobody&quot; user, even if e.g. the server is written in Go and so can&apos;t drop privileges in the traditional fashion.</p>
+<p>&nbsp;</p>
+<h2>1.1 Gemini transactions</h2>
+<p>&nbsp;</p>
+<p>There is one kind of Gemini transaction, roughly equivalent to a gopher request or a HTTP &quot;GET&quot; request. Transactions happen as follows:</p>
+<p>&nbsp;</p>
+<p>C: Opens connection</p>
+<p>S: Accepts connection</p>
+<p>C/S: Complete TLS handshake (see section 4)</p>
+<p>C: Validates server certificate (see 4.2)</p>
+<p>C: Sends request (one CRLF terminated line) (see section 2)</p>
+<p>S: Sends response header (one CRLF terminated line), closes connection</p>
+<p> under non-success conditions (see 3.1 and 3.2)</p>
+<p>S: Sends response body (text or binary data) (see 3.3)</p>
+<p>S: Closes connection</p>
+<p>C: Handles response (see 3.4)</p>
+<p>&nbsp;</p>
+<h2>1.2 Gemini URI scheme</h2>
+<p>&nbsp;</p>
+<p>Resources hosted via Gemini are identified using URIs with the scheme &quot;gemini&quot;. This scheme is syntactically compatible with the generic URI syntax defined in RFC 3986, but does not support all components of the generic syntax. In particular, the authority component is allowed and required, but its userinfo subcomponent is NOT allowed. The host subcomponent is required. The port subcomponent is optional, with a default value of 1965. The path, query and fragment components are allowed and have no special meanings beyond those defined by the generic syntax. Spaces in gemini URIs should be encoded as %20, not +.</p>
+<p>&nbsp;</p>
+<h1>2 Gemini requests</h1>
+<p>&nbsp;</p>
+<p>Gemini requests are a single CRLF-terminated line with the following structure:</p>
+<p>&nbsp;</p>
+<p>&lt;URL&gt;&lt;CR&gt;&lt;LF&gt;</p>
+<p>&nbsp;</p>
+<p>&lt;URL&gt; is a UTF-8 encoded absolute URL, including a scheme, of maximum length 1024 bytes.</p>
+<p>&nbsp;</p>
+<p>Sending an absolute URL instead of only a path or selector is effectively equivalent to building in a HTTP &quot;Host&quot; header. It permits virtual hosting of multiple Gemini domains on the same IP address. It also allows servers to optionally act as proxies. Including schemes other than &quot;gemini&quot; in requests allows servers to optionally act as protocol-translating gateways to e.g. fetch gopher resources over Gemini. Proxying is optional and the vast majority of servers are expected to only respond to requests for resources at their own domain(s).</p>
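+<p>&nbsp;</p>
+<p>A request line could be assembled as follows. This is a minimal sketch; build_request is a hypothetical helper name. Note that the 1024 limit applies to UTF-8 bytes, not characters.</p>
+<p>&nbsp;</p>

```python
MAX_URL_BYTES = 1024


def build_request(url: str) -> bytes:
    """Encode an absolute URL as a single CRLF-terminated request line."""
    encoded = url.encode("utf-8")
    if len(encoded) > MAX_URL_BYTES:
        raise ValueError("URL exceeds 1024 bytes")
    return encoded + b"\r\n"
```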
+<p>&nbsp;</p>
+<h1>3 Gemini responses</h1>
+<p>&nbsp;</p>
+<p>Gemini responses consist of a single CRLF-terminated header line, optionally followed by a response body.</p>
+<p>&nbsp;</p>
+<h2>3.1 Response headers</h2>
+<p>&nbsp;</p>
+<p>Gemini response headers look like this:</p>
+<p>&nbsp;</p>
+<p>&lt;STATUS&gt;&lt;SPACE&gt;&lt;META&gt;&lt;CR&gt;&lt;LF&gt;</p>
+<p>&nbsp;</p>
+<p>&lt;STATUS&gt; is a two-digit numeric status code, as described below in 3.2 and in Appendix 1.</p>
+<p>&nbsp;</p>
+<p>&lt;SPACE&gt; is a single space character, i.e. the byte 0x20.</p>
+<p>&nbsp;</p>
+<p>&lt;META&gt; is a UTF-8 encoded string of maximum length 1024 bytes, whose meaning is &lt;STATUS&gt; dependent.</p>
+<p>&nbsp;</p>
+<p>&lt;STATUS&gt; and &lt;META&gt; are separated by a single space character.</p>
+<p>&nbsp;</p>
+<p>If &lt;STATUS&gt; does not belong to the &quot;SUCCESS&quot; range of codes, then the server MUST close the connection after sending the header and MUST NOT send a response body.</p>
+<p>&nbsp;</p>
+<p>If a server sends a &lt;STATUS&gt; which is not a two-digit number or a &lt;META&gt; which exceeds 1024 bytes in length, the client SHOULD close the connection and disregard the response header, informing the user of an error.</p>
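+<p>&nbsp;</p>
+<p>The header rules above might be checked like this. It is a sketch only: parse_header is a hypothetical name, and raising ValueError stands in for whatever error reporting a real client uses.</p>
+<p>&nbsp;</p>

```python
def parse_header(line: bytes):
    """Split a raw header line into (status as int, META as str)."""
    if not line.endswith(b"\r\n"):
        raise ValueError("header must end with CRLF")
    status, _, meta = line[:-2].partition(b" ")
    if len(status) != 2 or not status.isdigit():
        raise ValueError("status must be a two-digit number")
    if len(meta) > 1024:
        raise ValueError("META exceeds 1024 bytes")
    return int(status), meta.decode("utf-8")
```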
+<p>&nbsp;</p>
+<h2>3.2 Status codes</h2>
+<p>&nbsp;</p>
+<p>Gemini uses two-digit numeric status codes. Related status codes share the same first digit. Importantly, the first digit of Gemini status codes does not group codes into vague categories like &quot;client error&quot; and &quot;server error&quot; as per HTTP. Instead, the first digit alone provides enough information for a client to determine how to handle the response. By design, it is possible to write a simple but feature complete client which only looks at the first digit. The second digit provides more fine-grained information, for unambiguous server logging, to allow writing comfier interactive clients which provide a slightly more streamlined user interface, and to allow writing more robust and intelligent automated clients like content aggregators, search engine crawlers, etc.</p>
+<p>&nbsp;</p>
+<p>The first digit of a response code unambiguously places the response into one of six categories, which define the semantics of the &lt;META&gt; line.</p>
+<p>&nbsp;</p>
+<h3>3.2.1 1x (INPUT)</h3>
+<p>&nbsp;</p>
+<p>Status codes beginning with 1 are INPUT status codes, meaning:</p>
+<p>&nbsp;</p>
+<p>The requested resource accepts a line of textual user input. The &lt;META&gt; line is a prompt which should be displayed to the user. The same resource should then be requested again with the user&apos;s input included as a query component. Queries are included in requests as per the usual generic URL definition in RFC3986, i.e. separated from the path by a ?. Reserved characters used in the user&apos;s input must be &quot;percent-encoded&quot; as per RFC3986, and space characters should also be percent-encoded.</p>
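+<p>&nbsp;</p>
+<p>One way to build the follow-up request URL is sketched below, using urllib.parse.quote with no characters exempted so that reserved characters and spaces are all percent-encoded. The helper name with_user_input is hypothetical.</p>
+<p>&nbsp;</p>

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_user_input(url: str, user_input: str) -> str:
    """Return the same resource URL with the input as its query component."""
    parts = urlsplit(url)
    query = quote(user_input, safe="")  # a space becomes %20, never +
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```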
+<p>&nbsp;</p>
+<h3>3.2.2 2x (SUCCESS)</h3>
+<p>&nbsp;</p>
+<p>Status codes beginning with 2 are SUCCESS status codes, meaning:</p>
+<p>&nbsp;</p>
+<p>The request was handled successfully and a response body will follow the response header. The &lt;META&gt; line is a MIME media type which applies to the response body.</p>
+<p>&nbsp;</p>
+<h3>3.2.3 3x (REDIRECT)</h3>
+<p>&nbsp;</p>
+<p>Status codes beginning with 3 are REDIRECT status codes, meaning:</p>
+<p>&nbsp;</p>
+<p>The server is redirecting the client to a new location for the requested resource. There is no response body. &lt;META&gt; is a new URL for the requested resource. The URL may be absolute or relative. The redirect should be considered temporary, i.e. clients should continue to request the resource at the original address and should not perform convenience actions like automatically updating bookmarks.</p>
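+<p>&nbsp;</p>
+<p>Because the redirect URL may be relative, a client has to resolve it against the URL it requested. The sketch below does this with Python&apos;s urljoin. One caveat that is an implementation detail of urllib, not of Gemini: urljoin only resolves relative references for schemes listed in its registries, so the sketch registers &quot;gemini&quot; first. The name redirect_target is hypothetical.</p>
+<p>&nbsp;</p>

```python
import urllib.parse

# urllib resolves relative references only for schemes it knows about,
# so "gemini" is added to both registries before urljoin is used.
for registry in (urllib.parse.uses_relative, urllib.parse.uses_netloc):
    if "gemini" not in registry:
        registry.append("gemini")


def redirect_target(request_url: str, meta: str) -> str:
    """Resolve the META of a 3x response against the requested URL."""
    return urllib.parse.urljoin(request_url, meta)
```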
+<p>&nbsp;</p>
+<h3>3.2.4 4x (TEMPORARY FAILURE)</h3>
+<p>&nbsp;</p>
+<p>Status codes beginning with 4 are TEMPORARY FAILURE status codes, meaning:</p>
+<p>&nbsp;</p>
+<p>The request has failed. There is no response body. The nature of the failure is temporary, i.e. an identical request MAY succeed in the future. The contents of &lt;META&gt; may provide additional information on the failure, and should be displayed to human users.</p>
+<p>&nbsp;</p>
+<h3>3.2.5 5x (PERMANENT FAILURE)</h3>
+<p>&nbsp;</p>
+<p>Status codes beginning with 5 are PERMANENT FAILURE status codes, meaning:</p>
+<p>&nbsp;</p>
+<p>The request has failed. There is no response body. The nature of the failure is permanent, i.e. identical future requests will reliably fail for the same reason. The contents of &lt;META&gt; may provide additional information on the failure, and should be displayed to human users. Automatic clients such as aggregators or indexing crawlers should not repeat this request.</p>
+<p>&nbsp;</p>
+<h3>3.2.6 6x (CLIENT CERTIFICATE REQUIRED)</h3>
+<p>&nbsp;</p>
+<p>Status codes beginning with 6 are CLIENT CERTIFICATE REQUIRED status codes, meaning:</p>
+<p>&nbsp;</p>
+<p>The requested resource requires a client certificate to access. If the request was made without a certificate, it should be repeated with one. If the request was made with a certificate, the server did not accept it and the request should be repeated with a different certificate. The contents of &lt;META&gt; (and/or the specific 6x code) may provide additional information on certificate requirements or the reason a certificate was rejected.</p>
+<p>&nbsp;</p>
+<h3>3.2.7 Notes</h3>
+<p>&nbsp;</p>
+<p>Note that for basic interactive clients for human use, errors 4 and 5 may be effectively handled identically, by simply displaying the contents of &lt;META&gt; under a heading of &quot;ERROR&quot;. The temporary/permanent error distinction is primarily relevant to well-behaving automated clients. Basic clients may also choose not to support client-certificate authentication, in which case only four distinct status handling routines are required (for statuses beginning with 1, 2, 3 or a combined 4-or-5).</p>
+<p>&nbsp;</p>
+<p>The full two-digit system is detailed in Appendix 1. Note that for each of the six valid first digits, a code with a second digit of zero corresponds to a generic status of that kind with no special semantics. This means that basic servers without any advanced functionality need only be able to return codes of 10, 20, 30, 40 or 50.</p>
+<p>&nbsp;</p>
+<p>The Gemini status code system has been carefully designed so that the increased power (and correspondingly increased complexity) of the second digits is entirely &quot;opt-in&quot; on the part of both servers and clients.</p>
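+<p>&nbsp;</p>
+<p>A first-digit dispatcher for a basic client might look like the sketch below. The action names are illustrative labels, not part of the protocol, and action_for is a hypothetical helper. Codes beginning with 4 and 5 share one handler, as suggested above.</p>
+<p>&nbsp;</p>

```python
def action_for(status: int) -> str:
    """Map a two-digit status code to a handler name by first digit only."""
    actions = {
        1: "prompt",       # INPUT
        2: "render",       # SUCCESS
        3: "redirect",     # REDIRECT
        4: "error",        # TEMPORARY FAILURE
        5: "error",        # PERMANENT FAILURE
        6: "certificate",  # CLIENT CERTIFICATE REQUIRED
    }
    first_digit = status // 10
    if first_digit not in actions:
        raise ValueError(f"invalid status code: {status}")
    return actions[first_digit]
```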
+<p>&nbsp;</p>
+<h2>3.3 Response bodies</h2>
+<p>&nbsp;</p>
+<p>Response bodies are just raw content, text or binary, à la gopher. There is no support for compression, chunking or any other kind of content or transfer encoding. The server closes the connection after the final byte; there is no &quot;end of response&quot; signal like gopher&apos;s lonely dot.</p>
+<p>&nbsp;</p>
+<p>Response bodies only accompany responses whose header indicates a SUCCESS status (i.e. a status code whose first digit is 2). For such responses, &lt;META&gt; is a MIME media type as defined in RFC 2046.</p>
+<p>&nbsp;</p>
+<p>Internet media types are registered with a canonical form. Content transferred via Gemini MUST be represented in the appropriate canonical form prior to its transmission except for &quot;text&quot; types, as defined in the next paragraph.</p>
+<p>&nbsp;</p>
+<p>When in canonical form, media subtypes of the &quot;text&quot; type use CRLF as the text line break. Gemini relaxes this requirement and allows the transport of text media with plain LF alone (but NOT a plain CR alone) representing a line break when it is done consistently for an entire response body. Gemini clients MUST accept CRLF and bare LF as being representative of a line break in text media received via Gemini.</p>
+<p>&nbsp;</p>
+<p>If a MIME type begins with &quot;text/&quot; and no charset is explicitly given, the charset should be assumed to be UTF-8. Compliant clients MUST support UTF-8-encoded text/* responses. Clients MAY optionally support other encodings. Clients receiving a response in a charset they cannot decode SHOULD gracefully inform the user what happened instead of displaying garbage.</p>
+<p>&nbsp;</p>
+<p>If &lt;META&gt; is an empty string, the MIME type MUST default to &quot;text/gemini; charset=utf-8&quot;. The text/gemini media type is defined in section 5.</p>
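+<p>&nbsp;</p>
+<p>The defaulting rules for &lt;META&gt; on a SUCCESS response can be sketched as below. The parser is deliberately simplistic (it does not handle quoted parameter values containing semicolons), and parse_meta is a hypothetical name.</p>
+<p>&nbsp;</p>

```python
def parse_meta(meta: str):
    """Return (mime_type, params) for the META of a 2x response."""
    if not meta.strip():
        return "text/gemini", {"charset": "utf-8"}
    mime_type, *raw_params = [part.strip() for part in meta.split(";")]
    params = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing semicolon
        name, _, value = raw.partition("=")
        params[name.strip().lower()] = value.strip().strip('"')
    mime_type = mime_type.lower()
    # text/* without an explicit charset is assumed to be UTF-8.
    if mime_type.startswith("text/") and "charset" not in params:
        params["charset"] = "utf-8"
    return mime_type, params
```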
+<p>&nbsp;</p>
+<h2>3.4 Response body handling</h2>
+<p>&nbsp;</p>
+<p>Response handling by clients should be informed by the provided MIME type information. Gemini defines one MIME type of its own (text/gemini) whose handling is discussed below in section 5. In all other cases, clients should do &quot;something sensible&quot; based on the MIME type. Minimalistic clients might adopt a strategy of printing all other text/* responses to the screen without formatting and saving all non-text responses to the disk. Clients for unix systems may consult /etc/mailcap to find installed programs for handling non-text types.</p>
+<p>&nbsp;</p>
+<h1>4 TLS</h1>
+<p>&nbsp;</p>
+<p>Use of TLS for Gemini transactions is mandatory.</p>
+<p>&nbsp;</p>
+<p>Use of the Server Name Indication (SNI) extension to TLS is also mandatory, to facilitate name-based virtual hosting.</p>
+<p>&nbsp;</p>
+<h2>4.1 Version requirements</h2>
+<p>&nbsp;</p>
+<p>Servers MUST use TLS version 1.2 or higher and SHOULD use TLS version 1.3 or higher. TLS 1.2 is reluctantly permitted for now to avoid drastically reducing the range of available implementation libraries. Hopefully TLS 1.3 or higher can be specced in the near future. Clients who wish to be &quot;ahead of the curve&quot; MAY refuse to connect to servers using TLS version 1.2 or lower.</p>
+<p>&nbsp;</p>
+<h2>4.2 Server certificate validation</h2>
+<p>&nbsp;</p>
+<p>Clients can validate TLS connections however they like (including not at all) but the strongly RECOMMENDED approach is to implement a lightweight &quot;TOFU&quot; certificate-pinning system which treats self-signed certificates as first-class citizens. This greatly reduces TLS overhead on the network (only one cert needs to be sent, not a whole chain) and lowers the barrier to entry for setting up a Gemini site (no need to pay a CA or set up a Let&apos;s Encrypt cron job, just make a cert and go).</p>
+<p>&nbsp;</p>
+<p>TOFU stands for &quot;Trust On First Use&quot; and is a public-key security model similar to that used by OpenSSH. The first time a Gemini client connects to a server, it accepts whatever certificate it is presented. That certificate&apos;s fingerprint and expiry date are saved in a persistent database (like the known_hosts file for SSH), associated with the server&apos;s hostname. On all subsequent connections to that hostname, the received certificate&apos;s fingerprint is computed and compared to the one in the database. If the certificate is not the one previously received, but the previous certificate&apos;s expiry date has not passed, the user is shown a warning, analogous to the one web browser users are shown when receiving a certificate without a signature chain leading to a trusted CA.</p>
+<p>&nbsp;</p>
+<p>This model is by no means perfect, but it is not awful and is vastly superior to just accepting self-signed certificates unconditionally.</p>
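+<p>&nbsp;</p>
+<p>The TOFU decision logic can be sketched with an in-memory store. Several things here are assumptions rather than requirements of the text above: the class name TofuStore, the use of SHA-256 over the DER-encoded certificate as the fingerprint, the result labels, and the choice to silently accept a new certificate once the previous one has expired. A real client would persist the table to disk.</p>
+<p>&nbsp;</p>

```python
import hashlib
import time


class TofuStore:
    """In-memory stand-in for a persistent known-hosts database."""

    def __init__(self):
        self._known = {}  # hostname -> (fingerprint, expiry as Unix time)

    def check(self, hostname, cert_der, expiry, now=None):
        """Return 'new', 'match', 'renewed' or 'warn' for a certificate."""
        now = time.time() if now is None else now
        fingerprint = hashlib.sha256(cert_der).hexdigest()
        known = self._known.get(hostname)
        if known is None:
            # First use: trust and remember whatever was presented.
            self._known[hostname] = (fingerprint, expiry)
            return "new"
        old_fingerprint, old_expiry = known
        if fingerprint == old_fingerprint:
            return "match"
        if old_expiry <= now:
            # Assumption: an expired pin may be replaced without a warning.
            self._known[hostname] = (fingerprint, expiry)
            return "renewed"
        return "warn"  # different certificate while the old one is still valid
```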
+<p>&nbsp;</p>
+<h2>4.3 Client certificates</h2>
+<p>&nbsp;</p>
+<p>Although rarely seen on the web, TLS permits clients to identify themselves to servers using certificates, in exactly the same way that servers traditionally identify themselves to the client. Gemini includes the ability for servers to request in-band that a client repeats a request with a client certificate. This is a very flexible, highly secure but also very simple notion of client identity with several applications:</p>
+<p>&nbsp;</p>
+<ul>
+<li>Short-lived client certificates which are generated on demand and deleted immediately after use can be used as &quot;session identifiers&quot; to maintain server-side state for applications. In this role, client certificates act as a substitute for HTTP cookies, but unlike cookies they are generated voluntarily by the client, and once the client deletes a certificate and its matching key, the server cannot possibly &quot;resurrect&quot; the same value later (unlike so-called &quot;super cookies&quot;).
+<li>Long-lived client certificates can reliably identify a user to a multi-user application without the need for passwords which may be brute-forced. Even a stolen database table mapping certificate hashes to user identities is not a security risk, as rainbow tables for certificates are not feasible.
+<li>Self-hosted, single-user applications can be easily and reliably secured in a manner familiar from OpenSSH: the user generates a self-signed certificate and adds its hash to a server-side list of permitted certificates (analogous to the authorized_keys file for SSH).
+</ul>
+<p>&nbsp;</p>
+<p>Gemini requests will typically be made without a client certificate. If a requested resource requires a client certificate and one is not included in a request, the server can respond with a status code of 60, 61 or 62 (see Appendix 1 below for a description of all status codes related to client certificates). A client certificate which is generated or loaded in response to such a status code has its scope bound to the same hostname as the request URL and to all paths below the path of the request URL. E.g. if a request for gemini://example.com/foo returns status 60 and the user chooses to generate a new client certificate in response to this, that same certificate should be used for subsequent requests to gemini://example.com/foo, gemini://example.com/foo/bar/, gemini://example.com/foo/bar/baz, etc., until such time as the user decides to delete the certificate or to temporarily deactivate it. Interactive clients for human users are strongly recommended to make such actions easy and to generally give users full control over the use of client certificates.</p>
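+<p>&nbsp;</p>
+<p>The scoping rule can be expressed as a small predicate. This is a sketch: cert_applies is a hypothetical name, and comparing whole path segments (so that /foobar does not fall under /foo) is an interpretation of &quot;paths below&quot;.</p>
+<p>&nbsp;</p>

```python
from urllib.parse import urlsplit


def cert_applies(scope_url: str, request_url: str) -> bool:
    """True if a certificate bound to scope_url covers request_url."""
    scope, request = urlsplit(scope_url), urlsplit(request_url)
    if scope.hostname != request.hostname:
        return False
    scope_path = scope.path.rstrip("/")
    # Same path, or a path strictly below it at a segment boundary.
    return request.path == scope_path or request.path.startswith(scope_path + "/")
```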
+<p>&nbsp;</p>
+<h1>5 The text/gemini media type</h1>
+<p>&nbsp;</p>
+<h2>5.1 Overview</h2>
+<p>&nbsp;</p>
+<p>In the same sense that HTML is the &quot;native&quot; response format of HTTP and plain text is the native response format of gopher, Gemini defines its own native response format - though of course, thanks to the inclusion of a MIME type in the response header Gemini can be used to serve plain text, rich text, HTML, Markdown, LaTeX, etc.</p>
+<p>&nbsp;</p>
+<p>Response bodies of type &quot;text/gemini&quot; are a kind of lightweight hypertext format, which takes inspiration from gophermaps and from Markdown. The format permits richer typographic possibilities than the plain text of Gopher, but remains extremely easy to parse. The format is line-oriented, and a satisfactory rendering can be achieved with a single pass of a document, processing each line independently. As per gopher, links can only be displayed one per line, encouraging neat, list-like structure.</p>
+<p>&nbsp;</p>
+<p>Similar to how the two-digit Gemini status codes were designed so that simple clients can function correctly while ignoring the second digit, the text/gemini format has been designed so that simple clients can ignore the more advanced features and still remain very usable.</p>
+<p>&nbsp;</p>
+<h2>5.2 Parameters</h2>
+<p>&nbsp;</p>
+<p>As a subtype of the top-level media type &quot;text&quot;, &quot;text/gemini&quot; inherits the &quot;charset&quot; parameter defined in RFC 2046. However, as noted in 3.3, the default value of &quot;charset&quot; is &quot;UTF-8&quot; for &quot;text&quot; content transferred via Gemini.</p>
+<p>&nbsp;</p>
+<p>A single additional parameter specific to the &quot;text/gemini&quot; subtype is defined: the &quot;lang&quot; parameter. The value of &quot;lang&quot; denotes the natural language or language(s) in which the textual content of a &quot;text/gemini&quot; document is written. The presence of the &quot;lang&quot; parameter is optional. When the &quot;lang&quot; parameter is present, its interpretation is defined entirely by the client. For example, clients which use text-to-speech technology to make Gemini content accessible to visually impaired users may use the value of &quot;lang&quot; to improve pronunciation of content. Clients which render text to a screen may use the value of &quot;lang&quot; to determine whether text should be displayed left-to-right or right-to-left. Simple clients for users who only read languages written left-to-right may simply ignore the value of &quot;lang&quot;. When the &quot;lang&quot; parameter is not present, no default value should be assumed and clients which require some notion of a language in order to process the content (such as text-to-speech screen readers) should rely on user-input to determine how to proceed in the absence of a &quot;lang&quot; parameter.</p>
+<p>&nbsp;</p>
+<p>Valid values for the &quot;lang&quot; parameter are comma-separated lists of one or more language tags as defined in RFC4646. For example:</p>
+<p>&nbsp;</p>
+<ul>
+<li>&quot;text/gemini; lang=en&quot; Denotes a text/gemini document written in English
+<li>&quot;text/gemini; lang=fr&quot; Denotes a text/gemini document written in French
+<li>&quot;text/gemini; lang=en,fr&quot; Denotes a text/gemini document written in a mixture of English and French
+<li>&quot;text/gemini; lang=de-CH&quot; Denotes a text/gemini document written in Swiss German
+<li>&quot;text/gemini; lang=sr-Cyrl&quot; Denotes a text/gemini document written in Serbian using the Cyrillic script
+<li>&quot;text/gemini; lang=zh-Hans-CN&quot; Denotes a text/gemini document written in Chinese using the Simplified script as used in mainland China
+</ul>
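+<p>&nbsp;</p>
+<p>Values like those listed above could be extracted from &lt;META&gt; as follows. This is a sketch with a hypothetical helper name, language_tags; it splits the comma-separated list but does not validate the individual tags.</p>
+<p>&nbsp;</p>

```python
def language_tags(meta: str):
    """Return the list of language tags in a META string, or [] if absent."""
    for part in meta.split(";")[1:]:
        name, _, value = part.partition("=")
        if name.strip().lower() == "lang":
            tags = value.strip().strip('"').split(",")
            return [tag.strip() for tag in tags if tag.strip()]
    return []  # no default language is assumed
```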
+<p>&nbsp;</p>
+<h2>5.3 Line-orientation</h2>
+<p>&nbsp;</p>
+<p>As mentioned, the text/gemini format is line-oriented. Each line of a text/gemini document has a single &quot;line type&quot;. It is possible to unambiguously determine a line&apos;s type purely by inspecting its first three characters. A line&apos;s type determines the manner in which it should be presented to the user. Any details of presentation or rendering associated with a particular line type are strictly limited in scope to that individual line.</p>
+<p>&nbsp;</p>
+<p>There are 7 different line types in total. However, a fully functional and specification compliant Gemini client need only recognise and handle 4 of them - these are the &quot;core line types&quot;, (see 5.4). Advanced clients can also handle the additional &quot;advanced line types&quot; (see 5.5). Simple clients can treat all advanced line types as equivalent to one of the core line types and still offer an adequate user experience.</p>
+<p>&nbsp;</p>
+<h2>5.4 Core line types</h2>
+<p>&nbsp;</p>
+<p>The four core line types are:</p>
+<p>&nbsp;</p>
+<h3>5.4.1 Text lines</h3>
+<p>&nbsp;</p>
+<p>Text lines are the most fundamental line type - any line which does not match the definition of another line type defined below defaults to being a text line. The majority of lines in a typical text/gemini document will be text lines.</p>
+<p>&nbsp;</p>
+<p>Text lines should be presented to the user, after being wrapped to the appropriate width for the client&apos;s viewport (see below). Text lines may be presented to the user in a visually pleasing manner for general reading, the precise meaning of which is at the client&apos;s discretion. For example, variable width fonts may be used, spacing may be normalised, with spaces between sentences being made wider than spacing between words, and other such typographical niceties may be applied. Clients may permit users to customise the appearance of text lines by altering the font, font size, text and background colour, etc. Authors should not expect to exercise any control over the precise rendering of their text lines, only of their actual textual content. Content such as ASCII art, computer source code, etc. which may appear incorrectly when treated as such should be enclosed between preformatting toggle lines (see 5.4.3).</p>
+<p>&nbsp;</p>
+<p>Blank lines are instances of text lines and have no special meaning. They should be rendered individually as vertical blank space each time they occur. In this way they are analogous to &lt;br/&gt; tags in HTML. Consecutive blank lines should NOT be collapsed into fewer blank lines. Note also that consecutive non-blank text lines do not form any kind of coherent unit or block such as a &quot;paragraph&quot;: all text lines are independent entities.</p>
+<p>&nbsp;</p>
+<p>Text lines which are longer than can fit on a client&apos;s display device SHOULD be &quot;wrapped&quot; to fit, i.e. long lines should be split (ideally at whitespace or at hyphens) into multiple consecutive lines of a device-appropriate width. This wrapping is applied to each line of text independently. Multiple consecutive lines which are shorter than the client&apos;s display device MUST NOT be combined into fewer, longer lines.</p>
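+<p>&nbsp;</p>
+<p>Per-line wrapping could be done with Python&apos;s textwrap module, as in this sketch (wrap_text_lines is a hypothetical name). Each source line is wrapped on its own, short lines are never joined, and blank lines are kept.</p>
+<p>&nbsp;</p>

```python
import textwrap


def wrap_text_lines(lines, width):
    """Wrap each text line independently to the given display width."""
    wrapped = []
    for line in lines:
        # wrap() returns [] for a blank line, so keep the blank line itself.
        wrapped.extend(textwrap.wrap(line, width=width) or [""])
    return wrapped
```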
+<p>&nbsp;</p>
+<p>In order to take full advantage of this method of text formatting, authors of text/gemini content SHOULD avoid hard-wrapping to a specific fixed width, in contrast to the convention in Gopherspace where text is typically wrapped at 80 characters or fewer. Instead, text which should be displayed as a contiguous block should be written as a single long line. Most text editors can be configured to &quot;soft-wrap&quot;, i.e. to write this kind of file while displaying the long lines wrapped at word boundaries to fit the author&apos;s display device.</p>
+<p>&nbsp;</p>
+<p>Authors who insist on hard-wrapping their content MUST be aware that the content will display neatly on clients whose display device is as wide as the hard-wrapped length or wider, but will appear with irregular line widths on narrower clients.</p>
+<p>&nbsp;</p>
+<h3>5.4.2 Link lines</h3>
+<p>&nbsp;</p>
+<p>Lines beginning with the two characters &quot;=&gt;&quot; are link lines, which have the following syntax:</p>
+<p>&nbsp;</p>
+<pre alt=''>
+=&gt;[&lt;whitespace&gt;]&lt;URL&gt;[&lt;whitespace&gt;&lt;USER-FRIENDLY LINK NAME&gt;]
+</pre>
+<p>&nbsp;</p>
+<p>where:</p>
+<p>&nbsp;</p>
+<ul>
+<li>&lt;whitespace&gt; is any non-zero number of consecutive spaces or tabs
+<li>Square brackets indicate that the enclosed content is optional.
+<li>&lt;URL&gt; is a URL, which may be absolute or relative.
+</ul>
+<p>&nbsp;</p>
+<p>All the following examples are valid link lines:</p>
+<p>&nbsp;</p>
+<pre alt=''>
+=&gt; gemini://example.org/
+=&gt; gemini://example.org/ An example link
+=&gt; gemini://example.org/foo	Another example link at the same host
+=&gt; foo/bar/baz.txt	A relative link
+=&gt; 	gopher://example.org:70/1 A gopher link
+</pre>
+<p>&nbsp;</p>
+<p>URLs in link lines must have reserved characters and spaces percent-encoded as per RFC 3986.</p>
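+<p>&nbsp;</p>
+<p>A link line can be taken apart with a few string operations. In this sketch, parse_link_line is a hypothetical name, and str.split is used for brevity even though it splits on any whitespace rather than only spaces and tabs.</p>
+<p>&nbsp;</p>

```python
def parse_link_line(line: str):
    """Return (url, label) for a link line, or None for any other line."""
    if not line.startswith("=>"):
        return None
    fields = line[2:].split(maxsplit=1)
    if not fields:
        return None  # "=>" with no URL after it
    url = fields[0]
    label = fields[1].strip() if len(fields) > 1 else None
    return url, label
```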
+<p>&nbsp;</p>
+<p>Note that link URLs may have schemes other than gemini. This means that Gemini documents can simply and elegantly link to documents hosted via other protocols, unlike gophermaps which can only link to non-gopher content via a non-standard adaptation of the `h` item-type.</p>
+<p>&nbsp;</p>
+<p>Clients can present links to users in whatever fashion the client author wishes, however clients MUST NOT automatically make any network connections as part of displaying links whose scheme corresponds to a network protocol (e.g. links beginning with gemini://, gopher://, https://, ftp://, etc.).</p>
+<p>&nbsp;</p>
+<h3>5.4.3 Preformatting toggle lines</h3>
+<p>&nbsp;</p>
+<p>Any line whose first three characters are &quot;```&quot; (i.e. three consecutive back ticks with no leading whitespace) is a preformatting toggle line. These lines should NOT be included in the rendered output shown to the user. Instead, these lines toggle the parser between preformatted mode being &quot;on&quot; or &quot;off&quot;. Preformatted mode should be &quot;off&quot; at the beginning of a document. The current status of preformatted mode is the only internal state a parser is required to maintain. When preformatted mode is &quot;on&quot;, the usual rules for identifying line types are suspended, and all lines should be identified as preformatted text lines (see 5.4.4).</p>
+<p>&nbsp;</p>
+<p>Preformatting toggle lines can be thought of as analogous to &lt;pre&gt; and &lt;/pre&gt; tags in HTML.</p>
+<p>&nbsp;</p>
+<p>Any text following the leading &quot;```&quot; of a preformat toggle line which toggles preformatted mode on MAY be interpreted by the client as &quot;alt text&quot; pertaining to the preformatted text lines which follow the toggle line. Use of alt text is at the client&apos;s discretion, and simple clients may ignore it. Alt text is recommended for ASCII art or similar non-textual content which, for example, cannot be meaningfully understood when rendered through a screen reader or usefully indexed by a search engine. Alt text may also be used for computer source code to identify the programming language which advanced clients may use for syntax highlighting.</p>
+<p>&nbsp;</p>
+<p>Any text following the leading &quot;```&quot; of a preformat toggle line which toggles preformatted mode off MUST be ignored by clients.</p>
+<p>&nbsp;</p>
+<h3>5.4.4 Preformatted text lines</h3>
+<p>&nbsp;</p>
+<p>Preformatted text lines should be presented to the user in a &quot;neutral&quot;, monowidth font without any alteration to whitespace or stylistic enhancements. Graphical clients should use scrolling mechanisms to present preformatted text lines which are longer than the client viewport, in preference to wrapping. In displaying preformatted text lines, clients should keep in mind applications like ASCII art and computer source code: in particular, source code in languages with significant whitespace (e.g. Python) should be able to be copied and pasted from the client into a file and interpreted/compiled without any problems arising from the client&apos;s manner of displaying them.</p>
+<p>&nbsp;</p>
+<h2>5.5 Advanced line types</h2>
+<p>&nbsp;</p>
+<p>The following advanced line types MAY be recognised by advanced clients. Simple clients may treat them all as text lines as per 5.4.1 without any loss of essential function.</p>
+<p>&nbsp;</p>
+<h3>5.5.1 Heading lines</h3>
+<p>&nbsp;</p>
+<p>Lines beginning with &quot;#&quot; are heading lines. Heading lines consist of one, two or three consecutive &quot;#&quot; characters, followed by optional whitespace, followed by heading text. The number of # characters indicates the &quot;level&quot; of header; #, ## and ### can be thought of as analogous to &lt;h1&gt;, &lt;h2&gt; and &lt;h3&gt; in HTML.</p>
+<p>&nbsp;</p>
+<p>Heading text should be presented to the user, and clients MAY use special formatting, e.g. a larger or bold font, to indicate its status as a header (simple clients may simply print the line, including its leading #s, without any styling at all). However, the main motivation for the definition of heading lines is not stylistic but to provide a machine-readable representation of the internal structure of the document. Advanced clients can use this information to, e.g. display an automatically generated and hierarchically formatted &quot;table of contents&quot; for a long document in a side-pane, allowing users to easily jump to specific sections without excessive scrolling. CMS-style tools automatically generating menus or Atom/RSS feeds for a directory of text/gemini files can use the first heading in the file as a human-friendly title.</p>
+<p>&nbsp;</p>
+<h3>5.5.2 Unordered list items</h3>
+<p>&nbsp;</p>
+<p>Lines beginning with &quot;* &quot; are unordered list items. This line type exists purely for stylistic reasons. The * may be replaced in advanced clients by a bullet symbol. Any text after the &quot;* &quot; should be presented to the user as if it were a text line, i.e. wrapped to fit the viewport and formatted &quot;nicely&quot;. Advanced clients can take the space of the bullet symbol into account when wrapping long list items to ensure that all lines of text corresponding to the item are offset an equal distance from the left of the screen.</p>
+<p>&nbsp;</p>
+<h3>5.5.3 Quote lines</h3>
+<p>&nbsp;</p>
+<p>Lines beginning with &quot;&gt;&quot; are quote lines. This line type exists so that advanced clients may use distinct styling to convey to readers the important semantic information that certain text is being quoted from an external source. For example, when wrapping long lines to the viewport, each resultant line may have a &quot;&gt;&quot; symbol placed at the front.</p>
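+<p>&nbsp;</p>
+<p>Putting the core and advanced line types together, a single-pass classifier could look like the sketch below. The type labels and the name classify_lines are illustrative, and the only state kept is the preformatted flag, as section 5.4.3 requires. Both CRLF and bare LF are accepted as line breaks.</p>
+<p>&nbsp;</p>

```python
TOGGLE = "`" * 3  # the three back ticks that open or close preformatted mode


def classify_lines(document: str):
    """Yield (line_type, text) pairs for a text/gemini document."""
    lines = document.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a final line break does not start another line
    preformatted = False
    for raw in lines:
        line = raw[:-1] if raw.endswith("\r") else raw  # accept CRLF or LF
        if line.startswith(TOGGLE):
            preformatted = not preformatted  # toggle lines are not rendered
        elif preformatted:
            yield "preformatted", line
        elif line.startswith("=>"):
            yield "link", line
        elif line.startswith("#"):
            yield "heading", line
        elif line.startswith("* "):
            yield "list_item", line
        elif line.startswith(">"):
            yield "quote", line
        else:
            yield "text", line
```

+<p>&nbsp;</p>
+<p>A simple client could collapse the heading, list_item and quote labels into text and still satisfy the core requirements.</p>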
+<p>&nbsp;</p>
+<h1>Appendix 1. Full two digit status codes</h1>
+<p>&nbsp;</p>
+<h2>10 INPUT</h2>
+<p>&nbsp;</p>
+<p>As per definition of single-digit code 1 in 3.2.</p>
+<p>&nbsp;</p>
+<h2>11 SENSITIVE INPUT</h2>
+<p>&nbsp;</p>
+<p>As per status code 10, but for use with sensitive input such as passwords. Clients should present the prompt as per status code 10, but the user&apos;s input should not be echoed to the screen to prevent it being read by &quot;shoulder surfers&quot;.</p>
+<p>&nbsp;</p>
+<h2>20 SUCCESS</h2>
+<p>&nbsp;</p>
+<p>As per definition of single-digit code 2 in 3.2.</p>
+<p>&nbsp;</p>
+<h2>30 REDIRECT - TEMPORARY</h2>
+<p>&nbsp;</p>
+<p>As per definition of single-digit code 3 in 3.2.</p>
+<p>&nbsp;</p>
+<h2>31 REDIRECT - PERMANENT</h2>
+<p>&nbsp;</p>
+<p>The requested resource should be consistently requested from the new URL provided in future. Tools like search engine indexers or content aggregators should update their configurations to avoid requesting the old URL, and end-user clients may automatically update bookmarks, etc. Note that clients which only pay attention to the initial digit of status codes will treat this as a temporary redirect. They will still end up at the right place, they just won&apos;t be able to make use of the knowledge that this redirect is permanent, so they&apos;ll pay a small performance penalty by having to follow the redirect each time.</p>
+<p>&nbsp;</p>
+<h2>40 TEMPORARY FAILURE</h2>
+<p>&nbsp;</p>
+<p>As per definition of single-digit code 4 in 3.2.</p>
+<p>&nbsp;</p>
+<h2>41 SERVER UNAVAILABLE</h2>
+<p>&nbsp;</p>
+<p>The server is unavailable due to overload or maintenance. (cf HTTP 503)</p>
+<p>&nbsp;</p>
+<h2>42 CGI ERROR</h2>
+<p>&nbsp;</p>
+<p>A CGI process, or similar system for generating dynamic content, died unexpectedly or timed out.</p>
+<p>&nbsp;</p>
+<h2>43 PROXY ERROR</h2>
+<p>&nbsp;</p>
+<p>A proxy request failed because the server was unable to successfully complete a transaction with the remote host. (cf HTTP 502, 504)</p>
+<p>&nbsp;</p>
+<h2>44 SLOW DOWN</h2>
+<p>&nbsp;</p>
+<p>Rate limiting is in effect. &lt;META&gt; is an integer number of seconds which the client must wait before another request is made to this server. (cf HTTP 429)</p>
+<p>&nbsp;</p>
+<h2>50 PERMANENT FAILURE</h2>
+<p>&nbsp;</p>
+<p>As per definition of single-digit code 5 in 3.2.</p>
+<p>&nbsp;</p>
+<h2>51 NOT FOUND</h2>
+<p>&nbsp;</p>
+<p>The requested resource could not be found but may be available in the future. (cf HTTP 404) (struggling to remember this important status code? Easy: you can&apos;t find things hidden at Area 51!)</p>
+<p>&nbsp;</p>
+<h2>52 GONE</h2>
+<p>&nbsp;</p>
+<p>The resource requested is no longer available and will not be available again. Search engines and similar tools should remove this resource from their indices. Content aggregators should stop requesting the resource and convey to their human users that the subscribed resource is gone. (cf HTTP 410)</p>
+<p>&nbsp;</p>
+<h2>53 PROXY REQUEST REFUSED</h2>
+<p>&nbsp;</p>
+<p>The request was for a resource at a domain not served by the server and the server does not accept proxy requests.</p>
+<p>&nbsp;</p>
+<h2>59 BAD REQUEST</h2>
+<p>&nbsp;</p>
+<p>The server was unable to parse the client&apos;s request, presumably due to a malformed request. (cf HTTP 400)</p>
+<p>&nbsp;</p>
+<h2>60 CLIENT CERTIFICATE REQUIRED</h2>
+<p>&nbsp;</p>
+<p>As per definition of single-digit code 6 in 3.2.</p>
+<p>&nbsp;</p>
+<h2>61 CERTIFICATE NOT AUTHORISED</h2>
+<p>&nbsp;</p>
+<p>The supplied client certificate is not authorised for accessing the particular requested resource. The problem is not with the certificate itself, which may be authorised for other resources.</p>
+<p>&nbsp;</p>
+<h2>62 CERTIFICATE NOT VALID</h2>
+<p>&nbsp;</p>
+<p>The supplied client certificate was not accepted because it is not valid. This indicates a problem with the certificate in and of itself, with no consideration of the particular requested resource. The most likely cause is that the certificate&apos;s validity start date is in the future or its expiry date has passed, but this code may also indicate an invalid signature, or a violation of a X509 standard requirements. The &lt;META&gt; should provide more information about the exact error.</p>
--- a/tests/files_with_html/titles-utf-16.gmi
+++ b/tests/files_with_html/titles-utf-16.gmi
--- a/tests/files_with_html/titles-utf-16.gmi.html
+++ b/tests/files_with_html/titles-utf-16.gmi.html
@ -0,0 +1,5 @@
+<h1>First level title</h1>
+<p>&nbsp;</p>
+<h2>Second level title</h2>
+<h3>Third level title</h3>
+<h3># Fourth but wrong level title</h3>
--- a/tests/files_with_html/titles.gmi
+++ b/tests/files_with_html/titles.gmi
@ -0,0 +1,5 @@
+# First level title
+
+## Second level title
+### Third level title
+### # Fourth but wrong level title
--- a/tests/files_with_html/titles.gmi.html
+++ b/tests/files_with_html/titles.gmi.html
@ -0,0 +1,5 @@
+<h1>First level title</h1>
+<p>&nbsp;</p>
+<h2>Second level title</h2>
+<h3>Third level title</h3>
+<h3># Fourth but wrong level title</h3>
--- a/tests/index.gmi
+++ b/tests/index.gmi
@ -0,0 +1,22 @@
+# Tests
+
+This test suite aims to check the quality of the translation of the fulltext source pages to HTML pages.
+
+In the HtmGem root directory, just type **phpunit tests** to check.
+
+### parserTest.php
+The parser takes the gemtext contained in the **.gmi** file and returns the internal format.
+
+### translateToHtmlTest.php
+The translator takes the gemtext, parses it and returns an HTML representation.
+
+### translateToGemtextTest.php
+This translator takes the gemtext, parses it and rebuilds the gemtext. The output must be identical to the input. When there are optional spaces, they are trimmed. In such cases, the translation returns a normalised gemtext, that is, one without the optional spaces.
+
+### cli
+The directory contains files to play around with the API:
+* parse_gemtext: outputs a //print_r()// of the internal representation
+* translate_to_gemtext: outputs the regenerated gemtext
+* translate_to_html: outputs the html-ified gemtext
+
+//pre-commit.git// is to be moved to //./.git/hooks/pre-commit// and //chmod +x//-ed to run this test suite before committing.
--- a/tests/miscTest.php
+++ b/tests/miscTest.php
@ -0,0 +1,140 @@
+<?php declare(strict_types=1);
+use PHPUnit\Framework\TestCase;
+
+require_once dirname(__FILE__)."/../lib-htmgem.inc.php";
+
+final class miscTest extends TestCase {
+
+    public function test_split_path_links(): void {
+        $this->assertSame(
+            array(),
+            \htmgem\split_path_links(""),
+            "empty link"
+        );
+        $this->assertSame(
+            array(
+                "noslash" => "noslash",
+            ),
+            \htmgem\split_path_links("noslash"),
+            "no slash"
+        );
+        $this->assertSame(
+            array(),
+            \htmgem\split_path_links("/"),
+            "only a slash"
+        );
+        $this->assertSame(
+            array(
+                "one" => "/one",
+            ),
+            \htmgem\split_path_links("/one"),
+            "/one"
+        );
+        $this->assertSame(
+            array(
+                "one" => "one",
+                "two" => "one/two",
+            ),
+            \htmgem\split_path_links("one/two"),
+            "one/two"
+        );
+        $this->assertSame(
+            array(
+                "one" => "/one",
+                "two" => "/one/two",
+                "file.ext" => "/one/two/file.ext",
+            ),
+            \htmgem\split_path_links("/one/two/file.ext"),
+            "/one/two/file.ext"
+        );
+    }
+
+    public function test_resolve_path(): void {
+        $this->assertSame(
+            \htmgem\resolve_path(""),
+            "",
+            "empty link"
+        );
+        $this->assertSame(
+            \htmgem\resolve_path("test"),
+            "test",
+            "single word"
+        );
+        $this->assertSame(
+            \htmgem\resolve_path(" "),
+            " ",
+            "single space"
+        );
+        $this->assertSame(
+            \htmgem\resolve_path(" A B "),
+            " A B ",
+            "several spaces"
+        );
+        $this->assertSame(
+            \htmgem\resolve_path("/"),
+            "/",
+            "one slash"
+        );
+        $this->assertSame(
+            \htmgem\resolve_path("//"),
+            "/",
+            "two slashes"
+        );
+        $this->assertSame(
+            \htmgem\resolve_path("/////"),
+            "/",
+            "five slashes"
+        );
+        $this->assertSame(
+            \htmgem\resolve_path("one/"),
+            "one",
+            "strip the last slash"
+        );
+        $this->assertSame(
+            \htmgem\resolve_path("/two"),
+            "/two",
+            "slash at the beginning"
+        );
+        $this->assertSame(
+            \htmgem\resolve_path("/two/"),
+            "/two",
+            "slash at the beginning and the end"
+        );
+        $this->assertSame(
+            \htmgem\resolve_path("one/two/"),
+            "one/two",
+            "only the inner slash remains"
+        );
+        $this->assertSame(
+            \htmgem\resolve_path("one/two/three//"),
+            "one/two/three",
+            "strip the last slashes"
+        );
+        $this->assertSame(
+            \htmgem\resolve_path("one/../"),
+            "",
+            "empty one"
+        );
+        $this->assertSame(
+            \htmgem\resolve_path("one/two/../"),
+            "one",
+            "empty one two"
+        );
+        $this->assertSame(
+            \htmgem\resolve_path("one/two/../.."),
+            "",
+            "empty one two twice"
+        );
+        $this->assertSame(
+            \htmgem\resolve_path("one/../two/./../three"),
+            "three",
+            "waltz"
+        );
+        $this->assertSame(
+            \htmgem\resolve_path("one/../.."),
+            "/",
+            "directory traversal"
+        );
+    }
+
+}
--- a/tests/parserTest.php
+++ b/tests/parserTest.php
@ -0,0 +1,455 @@
+<?php declare(strict_types=1);
+use PHPUnit\Framework\TestCase;
+
+require_once dirname(__FILE__)."/../lib-htmgem.inc.php";
+
+function parse($text): array {
+    return iterator_to_array(\htmgem\gemtextParser($text));
+}
+
+
+final class parserTest extends TestCase {
+
+    public function test_gemtextParser_textLines(): void {
+        $this->assertSame(
+            array(),
+            parse(null),
+            "  Null line"
+        );
+        $this->assertSame(
+            array(),
+            parse(""),
+            "  Empty line"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"", "text"=>"  foo"),
+            ),
+            parse("  foo    "),
+            "  A text surrounded by spaces"
+        );
+        $this->assertSame(
+            array(
+                array(
+                    "mode"=>"",
+                    "text"=>"  bar"
+               ),
+            ),
+            parse("  bar  \n"),
+            "  A text with a line feed after"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"", "text"=>"foo"),
+                array("mode"=>"", "text"=>"bar"),
+            ),
+            parse("foo\nbar"),
+            "  Two lines of text"
+        );
+    }
+
+    public function test_gemtextParser_link(): void {
+        $this->assertSame(
+            array(
+                array("mode"=>"=>", "link"=>"", "text"=>"")
+            ),
+            parse("=>"),
+            "=> A single equal-greaterthan"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"=>", "link"=>"https://www.test.com", "text"=>"")
+            ),
+            parse("=> https://www.test.com"),
+            "=> A normal link with no text"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"=>", "link"=>"https://www.test.com", "text"=>"")
+            ),
+            parse("=>   https://www.test.com"),
+            "=> A link with spaces between the equal-greaterthan and the link"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"=>", "link"=>"https://www.test.com", "text"=>"text of the link")
+            ),
+            parse("=> https://www.test.com text of the link      "),
+            "=> A link with a text with spaces at the end"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"", "text"=>" => https://www.test.com text of the link")
+            ),
+            parse(" => https://www.test.com text of the link"),
+            "=> A link with a space before, seen like a text line"
+        );
+    }
+
+    public function test_gemtextParser_titles(): void {
+        $this->assertSame(
+            array(
+                array("mode"=>"#", "title"=>"")
+            ),
+            parse("#"),
+            "# Single sharp with no data"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"#", "title"=>"")
+            ),
+            parse("# "),
+            "# A sharp with one space after"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"#", "title"=>"the title")
+            ),
+            parse("# the title"),
+            "# A normal title"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"#", "title"=>"the  title")
+            ),
+            parse("#    the  title"),
+            "# A title with spaces right after the sharp"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"##", "title"=>"Second level title")
+            ),
+            parse("## Second level title"),
+            "## Level two"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"###", "title"=>"Third level title")
+            ),
+            parse("### Third level title"),
+            "### Level three"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"###", "title"=>"# Fourth level title")
+            ),
+            parse("#### Fourth level title"),
+            "### Level four"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"###", "title"=>"#")
+            ),
+            parse("####"),
+            "### Level four without space"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"###", "title"=>"##")
+            ),
+            parse("#####"),
+            "### Level five without space"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"###", "title"=>"##")
+            ),
+            parse("#####        "),
+            "### Level five with spaces and a tabulation at the end"
+        );
+    }
+
+    public function test_gemtextParser_lists(): void {
+        $this->assertSame(
+            array(
+                array("mode"=>"*", "texts"=>
+                    array("")
+                )
+            ),
+            parse("*"),
+            "* An item with no data, just the star on a line"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"*", "texts"=>
+                    array("")
+                )
+            ),
+            parse("*   "),
+            "* An item with no content but spaces"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"*", "texts"=>
+                    array("hello")
+                )
+            ),
+            parse("* hello"),
+            "* A single item"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"*", "texts"=>
+                    array("hello")
+                )
+            ),
+            parse("*hello"),
+            "* An item with no space right after the star"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"*", "texts"=>
+                    array("hello")
+                )
+            ),
+            parse("*   hello"),
+            "* An item with spaces before"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"*", "texts"=>
+                    array(
+                        "hello",
+                        "how",
+                        "are you?"
+                    )
+                )
+            ),
+            parse("* hello\n" . "*  how\n" . "*are you?"),
+            "* Several list items"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"*", "texts"=>
+                    array(
+                        "hello",
+                        "how",
+                        "are"
+                    )
+                ),
+                array("mode"=>"", "text"=>"you?")
+            ),
+            parse("* hello\n" . "*  how\n" . "*are\n" . "you?"),
+            "* Several list items, then a text line"
+        );
+    }
+
+    public function test_gemtextParser_preformated(): void {
+        $this->assertSame(
+            array(
+                array("mode"=>"```", "alt"=>"", "texts"=>
+                    array(
+                    )
+                )
+            ),
+            parse("```"),
+            "``` A preformated text with just the block, nothing after"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"```", "alt"=>"alternative text", "texts"=>
+                    array(
+                    )
+                )
+            ),
+            parse("```alternative text"),
+            "``` A preformated text with just the alternative text"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"```", "alt"=>"alternative text", "texts"=>
+                    array(
+                    )
+                )
+            ),
+            parse("```   alternative text  "),
+            "``` A preformated text with just the alternative text surrounded by spaces"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"```", "alt"=>"", "texts"=>
+                    array("  first line")
+                )
+            ),
+            parse("```   \n" . "  first line   "),
+            "``` A preformated text with one line"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"```", "alt"=>"", "texts"=>
+                    array(
+                        "  first line"
+                    )
+                )
+            ),
+            parse("```   \n" . "  first line  \n"),
+            "``` A preformated text with one line and a line feed"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"```", "alt"=>"", "texts"=>
+                    array(
+                        "  first line",
+                        "second line"
+                    )
+                )
+            ),
+            parse("```   \n" . "  first line  \n" . "second line"),
+            "``` A preformated text with two lines separated by one line feed"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"```", "alt"=>"", "texts"=>
+                    array(
+                    )
+                )
+            ),
+            parse("```   \n" . "``` text to ignore"),
+            "``` A preformated text made of two backticks lines one after the other"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"```", "alt"=>"", "texts"=>
+                    array(
+                        ""
+                    )
+                )
+            ),
+            parse("```   \n" . "   \n" ."``` text to ignore"),
+            "``` A preformated text made of two backticks lines separated by a blank line"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"```", "alt"=>"", "texts"=>
+                    array(
+                    )
+                )
+            ),
+            parse("```   \n" . "``` text to ignore\n"),
+            "``` A preformated text made of two backticks lines followed by a line feed"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"```", "alt"=>"", "texts"=>
+                    array(
+                        " one line  of    preformated text"
+                    )
+                ),
+            ),
+            parse("```   \n" . " one line  of    preformated text  \n" . "``` text to ignore"),
+            "``` A preformated text with one line"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"```", "alt"=>"", "texts"=>
+                    array(
+                        " two lines  of",
+                        "         preformated text"
+                    )
+                ),
+            ),
+            parse("```\n" . " two lines  of  \n" . "         preformated text\n" . "``` text to ignore"),
+            "``` A preformated text with two lines"
+        );
+    }
+
+    public function test_gemtextParser_quoted(): void {
+        $this->assertSame(
+            array(
+                array("mode"=>">", "texts"=>
+                    array(
+                        ""
+                    )
+                )
+            ),
+            parse(">"),
+            "> Quotation text with just greaterthan, nothing after"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>">", "texts"=>
+                    array(
+                        "the quoted text"
+                    )
+                )
+            ),
+            parse(">   the quoted text   "),
+            "> Normal quotation text"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>">", "texts"=>
+                    array(
+                        "the quoted text"
+                    )
+                )
+            ),
+            parse(">the quoted text   "),
+            "> Normal quotation text, no space after the greaterthan"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>">", "texts"=>
+                    array(
+                        "",
+                        "",
+                        ""
+                    )
+                )
+            ),
+            parse(">   \n> \n>   "),
+            "> Quotation text three times spaces"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>">", "texts"=>
+                    array(
+                        "A",
+                        "B",
+                        "C"
+                    )
+                )
+            ),
+            parse("> A  \n>B \n>    C "),
+            "> Quotation text three times letters with spaces"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>">", "texts"=>
+                    array(
+                        "A",
+                        "",
+                        "C"
+                    )
+                )
+            ),
+            parse(">A  \n>\n>    C "),
+            "> One group of three with an empty one in the middle"
+        );
+    }
+
+    public function test_gemtextParser_textdecoration(): void {
+        $this->assertSame(
+            array(
+                array("mode"=>"^^^")
+            ),
+            parse("^^^"),
+            "^^^ Normal text decoration toggle"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"", "text"=>" ^^^ Text decoration toggle with a space before")
+            ),
+            parse(" ^^^ Text decoration toggle with a space before"),
+            "^^^ Text decoration toggle with a space before"
+        );
+        $this->assertSame(
+            array(
+                array("mode"=>"^^^"),
+                array("mode"=>"^^^")
+            ),
+            parse("^^^ Text decoration toggle without a space before\n" . "^^^"),
+            "^^^ Text decoration toggle without a space before"
+        );
+    }
+}
--- a/tests/test.gmi
+++ b/tests/test.gmi
@ -1,74 +0,0 @@
-# Test page
-
-## h2
-### h3
-#### h4 (Should be read as h3)
-
-
-Below, headers with spaces at the beginning of the line:
- # Missed h1, there's a space as a first character.
-  # Missed h2, there are two spaces as first characters.
-
-# Below this line, three empty lines:
-
-
-
-# Two long lines, without empty line in-between:
-Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
-Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?
-
-# Some links
-=> Link_without_label
-=> Link link with label
-=>Link_without_label
-=>Link link with label
-
-
-### Links made of only dots, and spaces after
-=> . 
-=> ..  
-=> ...   
-
-# Preformatted text
-``` Preformatted text
-At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum deleniti atque corrupti quos dolores et quas molestias excepturi sint occaecati cupiditate non provident, similique sunt in culpa qui officia deserunt mollitia animi, id est laborum et dolorum fuga. Et harum quidem rerum facilis est et expedita distinctio. Nam libero tempore, cum soluta nobis est eligendi optio cumque nihil impedit quo minus id quod maxime placeat facere possimus, omnis voluptas assumenda est, omnis dolor repellendus. Temporibus autem quibusdam et aut officiis debitis aut rerum necessitatibus saepe eveniet ut et voluptates repudiandae sint et molestiae non recusandae. Itaque earum rerum hic tenetur a sapiente delectus, ut aut reiciendis voluptatibus maiores alias consequatur aut perferendis doloribus asperiores repellat.
-
-< apt install cowsay >
- --------------------
-        \   ^__^
-         \  (oo)\_______
-            (__)\       )\/\
-                ||----w |
-                ||     ||
-
-
-```
-# Quotations
-
-## Two consecutives lines of quotations
-> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
-> Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
-
-## Two separated lines of quotations
-> Quotations: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
-
-> Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
-
-##One line of empty quotation:
->
-
-##Two lines of empty quotations:
->
->
-
-# Unordered list
-
-* line one
-*line two
-
-* another first line
-*another second line
-
-*single line
-
-
--- a/tests/translateToGemtextTest.php
+++ b/tests/translateToGemtextTest.php
@ -0,0 +1,65 @@
+<?php declare(strict_types=1);
+use PHPUnit\Framework\TestCase;
+
+$dirname_file = dirname(__FILE__);
+require_once "$dirname_file/../lib-htmgem.inc.php";
+require_once "$dirname_file/utils.inc.php";
+require_once "$dirname_file/../lib-io.inc.php";
+
+function translate($text): string {
+    return strval(new htmgem\GemtextTranslate_gemtext($text));
+}
+
+final class translateToGemtextTest extends TestCase {
+
+    public function test_translate_gemtext_smallTextSets(): void {
+        $line1 = "  Hello, how are you? ";
+        $line2 = " Nice to meet you!    ";
+        $rline1 = rtrim($line1);
+        $rline2 = rtrim($line2);
+        $this->assertSame(
+            "",
+            translate(null),
+            "Null line"
+        );
+        $this->assertSame(
+            "",
+            translate(""),
+            "Empty line"
+        );
+        $this->assertSame(
+            "$rline1\n",
+            translate($line1),
+            "Only one line"
+        );
+        $this->assertSame(
+            "$rline1\n",
+            translate("$line1\n"),
+            "Only one line with a line feed"
+        );
+        $this->assertSame(
+            "$rline1\n$rline2\n",
+            translate("$line1\n$line2"),
+            "Two lines, one line feed in between"
+        );
+        $this->assertSame(
+            "$rline1\n$rline2\n",
+            translate("$line1\n$line2\n"),
+            "Two lines, one line feed after each"
+        );
+    }
+
+    #TODO: don't stop when problems are found, list all the faulty files
+    public function test_translate_gemtext_files(): void {
+        foreach(getFiles(dirname(__FILE__)."/..", "gmi") as $filePathname) {
+            $fileContent = file_get_contents($filePathname);
+            \htmgem\io\convertToUTF8($fileContent);
+            $this->assertSame(
+                $fileContent,
+                translate($fileContent),
+                "The same file without extra space translated into itself: $filePathname"
+            );
+        }
+    }
+
+}
--- a/tests/translateToHtmlTest.php
+++ b/tests/translateToHtmlTest.php
@ -0,0 +1,125 @@
+<?php declare(strict_types=1);
+use PHPUnit\Framework\TestCase;
+
+$dirname_file = dirname(__FILE__);
+require_once "$dirname_file/../lib-htmgem.inc.php";
+require_once "$dirname_file/utils.inc.php";
+require_once "$dirname_file/../lib-io.inc.php";
+
+function translateHtml($text): string {
+    $gt_html = new htmgem\GemtextTranslate_html($text);
+    return strval($gt_html->translatedGemtext);
+}
+
+final class translateToHtmlTest extends TestCase {
+
+    protected static function noSeveralSpaces($text): string {
+        # Replaces several spaces (0x20) by only one
+        return preg_replace("/  +/", " ", $text);
+    }
+
+    public function test_translate_gemtext_smallTextSets(): void {
+        $line1 = "  Hello, how are you? ";
+        $line2 = " Nice to meet you!    ";
+        $rline1 = self::noSeveralSpaces(rtrim($line1));
+        $rline2 = self::noSeveralSpaces(rtrim($line2));
+        $this->assertSame(
+            "",
+            translateHtml(null),
+            "Null line"
+        );
+        $this->assertSame(
+            "",
+            translateHtml(""),
+            "Empty line"
+        );
+        $this->assertSame(
+            "<p>$rline1</p>\n",
+            translateHtml($line1),
+            "Only one line"
+        );
+        $this->assertSame(
+            "<p>$rline1</p>\n",
+            translateHtml($line1."\n"),
+            "Only one line with a line feed"
+        );
+        $this->assertSame(
+            "<p>$rline1</p>\n<p>$rline2</p>\n",
+            translateHtml($line1."\n".$line2),
+            "Two lines, one line feed in between"
+        );
+        $this->assertSame(
+            "<p>$rline1</p>\n<p>$rline2</p>\n",
+            translateHtml("$line1\n$line2\n"),
+            "Two lines, one line feed after each"
+        );
+    }
+
+    public function test_decoration(): void {
+        $this->assertSame(
+            "<p>:<strong></strong></p>\n",
+            translateHtml(":**"),
+            "** Empty strong: the strongness goes until the end"
+        );
+        $this->assertSame(
+            "<p>:<strong>**</strong></p>\n",
+            translateHtml(":****"),
+            "** Two stars"
+        );
+        $this->assertSame(
+            "<p>:<strong>ok</strong></p>\n",
+            translateHtml(":**ok**"),
+            "** normal case with a word"
+        );
+        $this->assertSame(
+            "<p>:<strong>nice</strong></p>\n",
+            translateHtml(":**nice"),
+            "** a word with no end stars"
+        );
+        $this->assertSame(
+            "<p>:<strong>nice</strong></p>\n",
+            translateHtml(":**nice"),
+            "** a word with no end stars"
+        );
+        $this->assertSame(
+            "<p>:<strong>**one two three</strong></p>\n",
+            translateHtml(":****one two three"),
+            "** four stars, then several words with no end stars"
+        );
+    }
+
+
+    #TODO: don't stop when problems are found, list all the faulty files
+    public function test_translate_html_files_with_html(): void {
+        /** NOTE: the UTF-16 files must result in the same content as UTF-8 ones.
+         * command to convert from UTF-8 to UTF-16: iconv -f UTF-8 -t UTF-16 text.gmi
+         */
+        foreach(getFiles(dirname(__FILE__)."/files_with_html", "gmi") as $filePathname) {
+            $fileContentGmi = file_get_contents($filePathname);
+            \htmgem\io\convertToUTF8($fileContentGmi);
+            $fileContentHtml = file_get_contents($filePathname.".html");
+            $this->assertSame(
+                $fileContentHtml,
+                translateHtml($fileContentGmi),
+                "Translation to HTML: $filePathname"
+            );
+        }
+    }
+
+    public function test_line_feeds(): void {
+        /** NOTE: the UTF-16 files must result in the same content as UTF-8 ones.
+         * command to convert from UTF-8 to UTF-16: iconv -f UTF-8 -t UTF-16 text.gmi
+         */
+        foreach(getFiles(dirname(__FILE__)."/files_with_html", "txt") as $filePathname) {
+            $fileContentGmi = file_get_contents($filePathname);
+            \htmgem\io\convertToUTF8($fileContentGmi);
+            $fileContentHtml = file_get_contents($filePathname.".html");
+            $this->assertSame(
+                $fileContentHtml,
+                translateHtml($fileContentGmi),
+                "Line feeds, translation to HTML: $filePathname"
+            );
+        }
+    }
+
+}
--- a/tests/utils.inc.php
+++ b/tests/utils.inc.php
@ -0,0 +1,22 @@
+<?php declare(strict_types=1);
+
+function getFiles($directory, $targetExtension): Generator {
+    $flags =
+        FilesystemIterator::KEY_AS_PATHNAME
+      | FilesystemIterator::CURRENT_AS_FILEINFO
+      | FilesystemIterator::SKIP_DOTS
+    ;
+    #TODO: Prevent following symlinks. Otherwise the iterator keeps descending
+    # into them instead of skipping them.
+    #TODO: Prevent going into .git/ by browsing "manually" instead of RecursiveIteratorIterator
+    $dir = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($directory, $flags));
+    foreach ($dir as $fileinfo) {
+        $filename = $fileinfo->getFilename();
+        $filePathname = $fileinfo->getPathname();
+        $extension = $fileinfo->getExtension();
+        if ($targetExtension === $extension) {
+            yield $filePathname;
+        }
+    }
+}
+
Author	SHA1	Message	Date
Christophe HENRY	5cfe0fabee	Adds Lazarus/omarpolo to the style sample	2023-08-01 22:31:03 +02:00
Christophe HENRY	a73614ffcc	Merge branch 'dev'	2023-08-01 22:24:46 +02:00
Christophe HENRY	61ba2a9502	Merge branch 'master' into dev	2023-08-01 22:21:49 +02:00
Christophe HENRY	511abab032	Fixes a bug on CSS when hosted on filesystem root	2023-08-01 22:14:29 +02:00
Lazarus	4eb97465a0	Add a cool arrow to gmi links!	2023-08-01 22:04:12 +02:00
Christophe HENRY	8626aaf3e5	Moves omarpolo.css into Lazarus' directory	2023-08-01 22:02:03 +02:00
Lazarus	fa8538cd92	Add style from Omar Polo's site	2023-08-01 21:57:01 +02:00
Lazarus	6ba8cceb3d	Fix getCss() calls Make ->getCss a method call instead of a property access so that it won't return null always.	2023-08-01 21:56:28 +02:00
Christophe HENRY	fb8134d3e0	Fix typo in the English documentation about Apache	2022-10-27 23:13:43 +02:00
Christophe HENRY	536c8101c5	Updates CHANGELOG and index.gmi	2022-08-23 10:50:17 +02:00
Christophe HENRY	41d1e3b289	v1.5.0 * Adds the Lagrange style, thanks to Eric <ortie10 at gmx.fr>. * Adds circumlunar css. * Removes the page-specific CSS. * Rewrites the documentation.	2022-08-23 10:46:41 +02:00
Christophe HENRY	12f823fe5a	Updates doc: can use other styles in the config	2022-08-02 23:31:11 +02:00
Christophe HENRY	86f07b2f2f	Updates the Lagrange style Thanks to Eric <ortie10 at gmx.fr>	2022-08-02 23:13:39 +02:00
Christophe HENRY	c49e214591	Merge branch 'master' into dev	2022-08-02 12:24:42 +02:00
Christophe HENRY	e479114cf0	v1.4.1 * Adds link to /htmgem on the icon of the menu * Fixes bug about CSS not applied correctly * Fixes a bug about null by ref variable	2022-08-02 11:43:00 +02:00
Christophe HENRY	ddac76e5fa	Adds link to /htmgem on the icon of the menu	2022-08-02 11:41:47 +02:00
Christophe HENRY	0daecbd624	Updates CHANGELOG	2022-08-02 08:43:36 +02:00
Christophe HENRY	5075e7395d	Adds base & circumlunar css, specification v0.16.1	2022-08-02 08:43:36 +02:00
Christophe HENRY	a79347d339	Adds the CSS of https://gemini.circumlunar.space	2022-08-02 08:43:36 +02:00
Christophe HENRY	de1e2c0664	Updates CHANGELOG	2022-08-02 08:43:36 +02:00
Christophe HENRY	8207df2b2e	Rewrites the documentation	2022-08-02 08:43:36 +02:00
Christophe HENRY	babf80b5ab	Fixes a margin bug in terminal.css	2022-08-02 08:43:36 +02:00
Christophe HENRY	a4879c929c	Fixes an error in include path for a test command	2022-08-02 08:43:36 +02:00
Christophe HENRY	9bb14c4e03	Fixes a bug in git post-commit test validation	2022-08-02 08:43:36 +02:00
Christophe HENRY	852e57d800	Removes the page-specific CSS	2022-08-02 08:43:36 +02:00
Christophe HENRY	ef149dcad8	Adds favicon and uses define() for a constant	2022-08-02 08:43:00 +02:00
Christophe HENRY	c36e9347a6	Adapts the Lagrange style * #entete comes back to .menu-line * the <hr> is removed by the style	2022-08-02 08:41:18 +02:00
Christophe HENRY	0834fd3bc9	Adds the Lagrange style Thanks to Eric <ortie10 at gmx.fr>	2022-08-02 08:41:18 +02:00
Christophe HENRY	678778ba2c	Make room for vendor CSS * Creates a default directory for the default css. * Changes the way to address CSS: "," used as "/" (see htmgem/css/index.gmi). * "src" style replaces the "pre": display the source code onscreen. * Removes the "None" style, useless after all. * Removes the absolute stylesheet path, now always in /htmgem/css.	2022-08-02 08:41:18 +02:00
Christophe HENRY	8e74403bc4	Fixes bug about CSS not applied correctly The default CSS was loaded anyway, in addition to the custom css.	2022-08-02 08:41:18 +02:00
Christophe HENRY	211518e098	Fixes a bug about null by ref variable	2022-08-02 08:41:18 +02:00
Christophe HENRY	d68b403f33	Fixes bug about CSS not applied correctly The default CSS was loaded anyway, in addition to the custom css.	2022-07-29 15:17:52 +02:00
Christophe HENRY	b5083aaabd	Fixes a bug about null by ref variable	2022-07-29 15:17:36 +02:00
Christophe HENRY	0079c31e14	Updates changelog	2022-07-26 00:03:21 +02:00
Christophe HENRY	1bc2e3b2bc	v1.4.0 * Adds the breadcrumbs at the top and the bottom of the page. * Adds the text icon H͜͡m. * Opens the external addresses in a new window/tab. * Changes details in the 404 page. * Manages UTF-8, UTF-16 and UTF-32 entry format. * FIX: adds alt text of preformated texts. * Enables to move and rename /htmgem. * Allows to always run without the URL rewriting. * Many code refactorings.	2021-04-12 09:53:00 +02:00
Christophe HENRY	76ed024bbd	Updates the Changelog	2021-04-11 22:32:58 +02:00
Christophe HENRY	c11af12551	Adds the url path * The /htmgem directory can be renamed and moved. * Everything works without URL rewriting enabled.	2021-04-11 21:37:50 +02:00
Christophe HENRY	a9fb49802a	FIX adds alt property to preformated text in HTML	2021-04-05 11:46:22 +02:00
Christophe HENRY	8cf174ecb3	Handles line feeds: Unix, Mac, Windows	2021-04-05 11:46:22 +02:00
Christophe HENRY	cf54d98c61	Converts files from UTF-16 and UTF-32 to UTF-8 The resulting file (HTML) is in UTF-8.	2021-04-05 11:46:22 +02:00
Christophe HENRY	6a8c220716	Opens external links in new windows	2021-04-03 20:40:59 +02:00
Christophe HENRY	9471816b1e	Removes useless statement	2021-04-02 23:37:49 +02:00
Christophe HENRY	26ad4f7d98	Adds strict_types=1	2021-04-02 21:29:46 +02:00
Christophe HENRY	f4e71cb2b6	Refactores and add tests for url resolving	2021-04-02 21:00:09 +02:00
Christophe HENRY	ba5e465e17	Adds lang="" to raise no error on HTML validation	2021-03-30 10:01:50 +02:00
Christophe HENRY	86f6ae918e	v1.3.0 * Enables browsing without URL Rewriting * Unit testing * Adds the BNF definition * Rewriting of the French documentation * Translation to English * Adds debug.css * Adds index.htm in case of Php not activated	2021-03-29 11:36:54 +02:00
Christophe HENRY	a4a647d8e8	Updates changelog	2021-03-29 11:32:19 +02:00
Christophe HENRY	1f38941e7f	Adds index.htm in case of Php not activated	2021-03-28 22:44:48 +02:00
Christophe HENRY	a24eb12df6	Ends BNF translation	2021-03-28 22:44:05 +02:00
Christophe HENRY	9471c43674	Enables browsing without URL Rewriting	2021-03-28 22:41:45 +02:00
Christophe HENRY	733450cb35	Translation to English	2021-03-27 23:31:47 +01:00
Christophe HENRY	98e7098f29	refactoring, adds debug.css	2021-03-27 21:30:19 +01:00
Christophe HENRY	76b27065ac	Refactores the French documentation	2021-03-25 23:47:12 +01:00
Christophe HENRY	731aa3bda2	Don't compress spaces for preformated text	2021-03-25 23:33:48 +01:00
Christophe HENRY	5d74578efc	typo	2021-03-25 22:39:52 +01:00
Christophe HENRY	2261dac656	FIX: translate_to_gemtext.php	2021-03-25 19:08:57 +01:00
Christophe HENRY	c858b733f4	Replaces several spaces to only one	2021-03-25 15:24:51 +01:00
Christophe HENRY	91abafdce0	FIX: each blockquote lines are in <p>	2021-03-25 12:22:39 +01:00
Christophe HENRY	c37549d50c	Updates text.gmi.html to validate the unit testing The file 'text.gmi.html' has to be changed for unit testing because the quotes have been escaped since then.	2021-03-24 11:22:23 +01:00
Christophe HENRY	7b58fc1a26	Merge remote-tracking branch master into dev	2021-03-24 11:18:57 +01:00
Christophe HENRY	64b5c1506a	Updates changelog	2021-03-24 11:09:37 +01:00
Christophe HENRY	750632eaff	Corrects the engine after unit testing gemTextParser * Removes the last empty lines of text files. * Removes the last spaces (rtrim) even on preformated texts. * Lines like "# " are possible, meaning an empty title. * The line "=>" means an empty link. GemtextTranslate_gemtext * FIX: writes the alt text of preformated blocks * Adds a space when quotation lines have value, otherwise lets ">". * Adds a space when the link has values, otherwise not. * A bit of reformating	2021-03-24 10:58:54 +01:00
Christophe HENRY	46d239b8b3	Adds unit tests	2021-03-24 10:53:12 +01:00
Christophe HENRY	4e42fed6b4	Adds null space between two slashes	2021-03-21 00:21:41 +01:00
Christophe HENRY	c1299a2bdb	FIX: adds line feed inside a blockquote Example: > line_1 > line_2 Before : line_1 line_2 After : Line_1 Line_2	2021-03-21 00:14:55 +01:00
Christophe HENRY	b7276e12e3	Removes unuseful case	2021-03-20 23:58:43 +01:00
Christophe HENRY	710bd1bf07	Corrects blockquote size on mobile	2021-03-20 22:41:04 +00:00
Christophe HENRY	731a8eef6a	Escapes the quotes and enable double encode	2021-03-20 15:21:43 +01:00
Christophe HENRY	0e508a7d57	Removes ob_* functions	2021-03-19 17:54:24 +01:00
Christophe HENRY	66720ed63f	Changes some items in CHANGELOG	2021-03-19 17:54:21 +01:00
Christophe HENRY	5df9d5ff15	FIX: textDecoration switch not updated	2021-03-19 17:49:21 +01:00
Christophe HENRY	cef2417f91	v1.2.0 * Removes "^" to disable text decoration line-wise. * CSS is no longer incorporated in the HTML page. * Perform sanity checks against unauthorized file access. * Properly close tags when the page exists in a non-null mode. * Split HTML generation in two: parsing and translating. * Create classes to handle gemtext parsing and translating. * Create class to generate back gemtext (for future test cases). * Fix: 404 doesn't occur for an empty file. * Page 404 fully generated by HtmGem itself.	2021-03-19 10:41:25 +01:00
Christophe HENRY	f29cf3a476	Updates changelog	2021-03-19 09:54:02 +01:00
Christophe HENRY	232cecc398	Enable security on what gmi file the client asks. * checks realPath() against no-existent files, * checks the file suffix '.gmi' * checks the directory belongs to that of the site.	2021-03-18 22:06:00 +01:00
Christophe HENRY	365c855c00	Adds icons in the 404 page	2021-03-18 22:05:48 +01:00
Christophe HENRY	d280ceff94	WIP	2021-03-18 22:04:31 +01:00
Christophe HENRY	859b0aad81	Deep refactoring: parsing, translating, classes * Removes "^" to disable text decoration line-wise. * Split HTML generation in two: parsing and translating. * Create class to handle gemtext parsing. * Create class to translate to HTML. * Create class to generate back gemtext (for future test cases). * Uses generators to parse then translate. * Fix: 404 doesn't occur for an empty file. * Page 404 fully generated by HtmGem itself. * CSS is no longer incorporated in the HTML page. * Handle CSS inclusion by addCss() calls.	2021-03-18 22:03:44 +01:00
Christophe HENRY	b2e09c54f5	FIX empty 404 and source file access An empty existing file triggered 404 error. It was possible to get the source of any file (including .php).	2021-03-18 15:55:49 +01:00
Christophe HENRY	087c2b5e6c	Changes the repository from Framasoft to Tildegit The framasoft Git service will soon disappear.	2021-03-17 22:53:12 +01:00
Christophe HENRY	9726203d07	Split index.php into two files to isolate the lib	2021-03-16 15:00:02 +01:00
Christophe HENRY	01efc79930	Roadmap for v1	2021-03-16 01:08:22 +01:00
Christophe HENRY	8710552d27	Removes margin of PRE	2021-03-14 10:48:25 +00:00
Christophe HENRY	d12192bb1f	Merge tag 'v1.1.0' into dev v1.1.0 * File download when using "source" as a style. * Improves the regex. * Fixes 404 page text decoration, adds reload message. * Links to download htmgem-master.zip. * Links CHANGELOG and COPYING into index.gmi. * Styles improvement, creation of raw.css. * Rewording of texts	2021-03-14 11:44:52 +01:00
Christophe HENRY	97e1bd9ee5	Updates CHANGELOG for v2 and v3 items	2021-03-14 11:28:27 +01:00
Christophe HENRY	1e4c4f8f15	v1.1.0 * File download when using "source" as a style. * Improves the regex. * Fixes 404 page text decoration, adds reload message. * Links to download htmgem-master.zip. * Links CHANGELOG and COPYING into index.gmi. * Styles improvement, creation of raw.css. * Rewording of texts.	2021-03-14 11:10:38 +01:00
Christophe HENRY	b1ac194e59	Removes padding of pre	2021-03-13 20:09:56 +00:00
Christophe HENRY	d36f56c4ef	Lowers h2 size and removes background color of pre	2021-03-13 19:17:17 +01:00
Christophe HENRY	bf1c16922b	Fixes 404 page text decoration, adds reload message	2021-03-13 10:45:54 +00:00
Christophe HENRY	732fd9e15a	Adds idea in CHANGELOG	2021-03-13 10:49:32 +01:00
Christophe HENRY	46019a5187	Adds a warning about using styles with Gemini client	2021-03-13 00:16:04 +00:00
Christophe HENRY	212c503bfb	Forces download when using "source" as a style	2021-03-13 00:33:46 +01:00
Christophe HENRY	64ed0a56b3	Adds idea in CHANGELOG	2021-03-12 20:31:03 +01:00
Christophe HENRY	cdbc9614de	Adds link to download htmgem-master.zip	2021-03-12 20:25:22 +01:00
Christophe HENRY	ef4b3b8693	Adds idea in CHANGELOG	2021-03-12 00:57:25 +01:00
Christophe HENRY	c45101732b	Links CHANGELOG and COPYING into index.gmi & edits	2021-03-12 00:48:57 +01:00
Christophe HENRY	9b357ce3d8	Removes unuseful .gitignore	2021-03-12 00:02:16 +01:00
Christophe HENRY	9fb9206b7a	Adds idea about CSS file setting	2021-03-11 23:55:41 +01:00
Christophe HENRY	c040ba14a3	Removes ^ at the beginning of lines in tutogemtext	2021-03-11 21:01:18 +01:00
Christophe HENRY	ba1c51259c	Adds Unreleased section to CHANGELOG.gmi	2021-03-11 18:11:56 +01:00
Christophe HENRY	5d791a16d6	Renames CHANGELOG.md to .gmi and adapts content	2021-03-11 18:06:19 +01:00
Christophe HENRY	d3cd419b5a	Rewords the tutorial	2021-03-11 11:56:49 +01:00
Christophe HENRY	c218d37728	Improves the text a little bit	2021-03-11 11:07:01 +01:00
Christophe HENRY	f5b1a30a2c	Enable spaces around \| to set a style	2021-03-11 11:04:27 +01:00
Christophe HENRY	a380fadde9	Rewording about text formatting / decoration	2021-03-11 10:59:58 +01:00
Christophe HENRY	9ee5a8744a	Improves the regex	2021-03-11 10:55:49 +01:00
Christophe HENRY	b418c6080f	fix missing \ in regex	2021-03-11 09:19:13 +01:00
Christophe HENRY	aaa12c0ee0	Adds an index selection page	2021-03-11 00:28:53 +01:00
Christophe HENRY	26c660f514	Removes unuseful part of black_wide.css	2021-03-11 00:21:31 +01:00
Christophe HENRY	a2a27498f8	Adds style raw.css	2021-03-11 00:20:56 +01:00
Christophe HENRY	5832532582	Changes a bit of wording	2021-03-10 13:26:29 +00:00
Christophe HENRY	814dba4078	Ready for v1.0.0	2021-03-10 11:49:53 +01:00
Christophe HENRY	9b4f33f42f	Improves general presentation	2021-03-10 11:28:52 +01:00
Christophe HENRY	db12ae5856	Improves the presentation of HtmGem	2021-03-10 10:47:40 +01:00
Christophe HENRY	799b34c536	Modify the simple style	2021-03-09 20:07:28 +00:00
Christophe HENRY	8f06bc7e01	Corrects CSS terminal on mobile	2021-03-09 19:59:08 +00:00
Christophe HENRY	b6be4afb97	Updates the TODO	2021-03-09 20:34:11 +01:00
Christophe HENRY	7222306779	Don't output a last empty line	2021-03-09 20:27:42 +01:00
Christophe HENRY	29e7d6fc14	Improves the installation tutorial	2021-03-09 20:20:01 +01:00
Christophe HENRY	b162501f0c	Allows disabling the text decoration	2021-03-09 20:19:00 +01:00
Christophe HENRY	805522cdd5	Adds black wide CSS	2021-03-09 15:48:10 +01:00
Christophe HENRY	67890863c8	Adds a greeny terminal CSS	2021-03-09 15:40:30 +01:00
Christophe HENRY	3032d7e9af	Explains about testing style on the URL itself	2021-03-09 15:23:58 +01:00
Christophe HENRY	0f6e3467eb	Adds the style "pre"	2021-03-09 11:11:28 +00:00
Christophe HENRY	1a01e8bd81	Manages the styles. WIP.	2021-03-09 11:26:55 +01:00
Christophe HENRY	11ced8627b	Shows nothing if index.gmi is not present	2021-03-09 02:11:29 +01:00
Christophe HENRY	944dce364a	Improves the installation instructions	2021-03-09 01:51:45 +01:00
Christophe HENRY	e74beaba67	Adds 404 page	2021-03-09 01:48:10 +01:00
Christophe HENRY	dcc36b1d66	Setup easy installation on shared host	2021-03-09 00:48:07 +01:00
Christophe HENRY	545ac4963d	Adds emojis according to links type	2021-03-08 16:59:17 +01:00
Christophe HENRY	18d5b7d518	Adds alt/title attribute to preformatted text	2021-03-08 15:45:45 +01:00
Christophe HENRY	7f19c7e963	WIP	2021-03-08 10:45:51 +01:00
Christophe HENRY	ac0de467ab	Improves processing of ^^^ and ^	2021-03-08 10:41:59 +01:00
Christophe HENRY	6a37ecebcf	Passes on next line in case of error, and log error When there are too many loops, indicating that something went wrong, sets the loop count to zero, the mode to default and go to the next line.	2021-03-08 10:14:57 +01:00
Christophe HENRY	d5f569baa9	Manages more dashes	2021-03-08 10:06:07 +01:00
Christophe HENRY	47182904db	Improvement of CSS	2021-03-08 00:13:04 +01:00
Christophe HENRY	c305faf8e4	Updates TODO	2021-03-06 16:40:41 +01:00
Christophe HENRY	f64db8c7ca	Manages non-break spaces for em-dash and en-dash	2021-03-06 15:36:56 +00:00
Christophe HENRY	c66bb76b00	WIP	2021-03-06 00:34:16 +01:00
Christophe HENRY	bf7a2ec379	wip	2021-03-05 22:38:43 +01:00
Christophe HENRY	4de4f45b84	Prevent <A> from text-justify	2021-03-05 15:57:31 +01:00
Christophe HENRY	376c4ed977	WIP	2021-03-05 15:55:57 +01:00
Christophe HENRY	f4500fbc4f	Sets mb encoding	2021-03-05 09:32:54 +01:00
Christophe HENRY	ab5bba5ccd	WIP	2021-03-04 21:59:57 +01:00
Christophe HENRY	e723f268fc	Replaces preg_replace by mb_ereg_replace * No more use of separators. * \$1 becomes \\1	2021-03-04 21:22:41 +01:00
Christophe HENRY	011f38c36f	WIP	2021-03-04 20:39:54 +01:00
Christophe HENRY	fcfa98f5d6	WIP	2021-03-04 14:17:02 +01:00
Christophe HENRY	1900f3c03b	Prevents attributes, HTML escape applied on links When the link come without label: => gemini://thesite.gmi The A markup would be: <a href="gemini://thesite.gmi">gemini:<i>thesite.gmi</a></i>	2021-03-04 14:14:26 +01:00
Christophe HENRY	28ddc04ba2	Updates CSS	2021-03-04 11:24:50 +01:00
Christophe HENRY	307d31d1f5	WIP	2021-03-03 23:37:34 +01:00
Christophe HENRY	3af022aee9	Merge branch 'master' into dev	2021-03-03 15:45:44 +00:00
Christophe HENRY	04a7fe7484	Text markups work	2021-03-03 16:40:22 +01:00
Christophe HENRY	96c1303663	WIP	2021-03-03 16:07:56 +01:00
Christophe HENRY	b2d5ecf990	WIP	2021-03-02 23:35:53 +01:00
Christophe HENRY	affe677035	Removes local path on 404	2021-03-02 22:05:30 +00:00
Christophe HENRY	1da3f6520b	Fixes a bug with unordered list	2021-03-02 22:48:06 +01:00
Christophe HENRY	a359f16d06	Updates TODO	2021-03-02 20:51:59 +01:00
Christophe HENRY	2032f6037e	Improves the CSS	2021-03-01 20:30:09 +00:00
Christophe HENRY	db5fd02737	Updates TODO	2021-03-01 20:36:53 +01:00