Initial code commit

This commit is contained in:
Robert Miles 2020-11-15 03:38:24 +00:00
commit 469335d73a
10 changed files with 555 additions and 0 deletions

148
.gitignore vendored Normal file
View File

@ -0,0 +1,148 @@
# Created by https://www.toptal.com/developers/gitignore/api/python
# Edit at https://www.toptal.com/developers/gitignore?templates=python
### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
pytestdebug.log
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
doc/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
pythonenv*
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# profiling data
.prof
# End of https://www.toptal.com/developers/gitignore/api/python
# orbit.json file (contains the URLs in the orbit)
orbit.json

44
README.md Normal file
View File

@ -0,0 +1,44 @@
# molniya - Gemini orbit software
An orbit is like a webring but for gemini. This code was originally written to service LEO[1], the first (to my knowledge) orbit ever made.
A Molniya orbit is a type of satellite orbit designed to provide communications and remote sensing coverage over high latitudes (thanks Wikipedia[2]). It was invented by the Russians for their spy satellites. I feel like it's a fitting name.
[1]: gemini://tilde.team/~khuxkm/leo/
[2]: https://en.wikipedia.org/wiki/Molniya_orbit
## How to use Molniya
In order to use Molniya, do the following:
1. Clone this repository into a place on your Gemini root directory. (If you're using a userdir for this, that's fine too.)
2. Modify `config.py`. Namely, update `MAIN_PAGE` to be the link to the index.gmi file, and update `REQUIRED_LINKS` to be the links to `next.cgi`, `prev.cgi`, and `rand.cgi`, wherever they are accessible via Gemini. `BACKLINKS` and `determine_capsule` shouldn't need to be changed (unless GUS is down or you want to change how Zenit decides what makes up a capsule.
3. Modify `index.gmi` to link to your files instead of mine.
4. Get people to link to your main page and one of `next.cgi`, `prev.cgi`, and `rand.cgi`.
5. Set a cronjob to run Zenit every now and again. (If I knew how often GUS indexed, I'd give a specific frequency, but I don't, so I won't.)
## What is the orbit.json file?
The `orbit.json` file contains all of the URLs in the orbit. It's how Molniya and Zenit know what URLs are there, and in what order.
## You keep mentioning Zenit. What is it?
Zenit is the Molniya indexer. It uses GUS's backlinks feature to get a list of pages that link to the orbit, and then checks them for having a link to allow people to continue to traverse the webring.
## Can you add links to orbit.json manually?
Of course you can! Just make sure to keep `orbit.json` valid JSON, or it'll break the Molniya library and the entire ring will break.
## Why does Molniya want the next and previous links to contain the URL of the page as a query?
Because Gemini lacks referrers (a good choice), there's no way to tell where a client came from just by studying the request. As such, it needs some way of indicating where in the orbit the client is.
`rand.cgi` lacks this requirement because it just gets a random page anyways. That being said, it's best if the client also puts their URL in the `rand.cgi` link, just so that users selecting the random link option aren't sent back to the very site they just came from.
## Why does Molniya abuse redirects to send the user on to the next page? Why not have a landing page?
Because I don't feel like implementing a landing page to just have the user click off of it. The redirect *should* replace the orbit link in the client's history, so the user shouldn't be adversely affected by this.

31
config.py Normal file
View File

@ -0,0 +1,31 @@
# ------------------------------------------------------------------------------
# The configuration for the Molniya orbit software.
# Implemented as a Python library to allow for determine_capsule logic.
# ------------------------------------------------------------------------------
# The main page of the orbit.
# Is used to seed the orbit when it's first created, as well as to find new pages to include in the orbit.
MAIN_PAGE = "gemini://tilde.team/~khuxkm/leo/"
# The backlinks base page.
# Takes a URL as a query and returns a list of pages that backlink to it.
# In case GUS ever goes down, is replaced, or changes URL, this will allow for easy fixing of Zenit.
BACKLINKS = "gemini://gus.guru/backlinks"
# Determine a "capsule".
# Used to prevent one person from flooding the orbit with pages.
# One page is allowed in the list per "capsule".
# This function is passed a urllib.parse.ParseResult object, and returns a string that identifies which capsule the URL belongs to.
def determine_capsule(parsed):
capsule = parsed.netloc
if parsed.path.startswith("/~"): # allow for each userdir to be its own capsule
capsule+="/"+parsed.path.split("/")[1]
return capsule
# Required links.
# A site must have one of these links to be included in the orbit.
REQUIRED_LINKS = [
"gemini://tilde.team/~khuxkm/leo/next.cgi",
"gemini://tilde.team/~khuxkm/leo/prev.cgi",
"gemini://tilde.team/~khuxkm/leo/rand.cgi"
]

51
gemini.py Normal file
View File

@ -0,0 +1,51 @@
import os,subprocess
from urllib.parse import urljoin
def start_response(meta,code="20"):
print(f"{code} {meta}",end="\r\n")
def get_input():
return os.environ.get("QUERY_STRING","")
# utility func
def has_input():
return bool(get_input())
def link(text,url):
print(f"=> {url} {text}")
def header(text,level=1):
print("#"*level + " " + text)
# named for convenience; in practice just use header func with different levels
def h1(text):
header(text,1)
def h2(text):
header(text,2)
def h3(text):
header(text,3)
def text(text=""):
print(text)
def figlet(tex,**opts):
if len(opts.keys())==0:
opts = {"f":"standard"}
ct = ["/usr/bin/figlet"]
for k in opts:
ct.append("-"+k)
ct.append(opts[k])
ct.append(tex)
text("```")
command_out(ct)
text("```")
def command_lines(ct):
for l in subprocess.check_output(ct).decode("ascii").split("\n"):
yield l
def command_out(ct,f=text):
for l in command_lines(ct):
f(l)

33
index.gmi Normal file
View File

@ -0,0 +1,33 @@
#LEO
LEO is an orbit, a term used here to mean "a webring but for Gemini instead of the web". The name is both a play on Low Earth Orbit, as well as the constellation Leo, the Lion.
##How to join LEO
To join LEO, here's what you need to do:
1. Link to `gemini://tilde.team/~khuxkm/leo/` on your page. This is what will get the indexer to notice your page.
2. Make sure your page is indexed by GUS (gemini://gus.guru/). GUS's backlink page is what drives the indexer.
3. Also include a link to one of the following:
```
Next Page - gemini://tilde.team/~khuxkm/leo/next.cgi?<your url>
Last Page - gemini://tilde.team/~khuxkm/leo/prev.cgi?<your url>
Random Page - gemini://tilde.team/~khuxkm/leo/rand.cgi
```
If your page links to this page and at least one (1) of the links above, your page will be added to the orbit the next time the indexer runs.
Also, only one page can be added to the orbit per capsule (capsule being defined as the host:port, as well as optional userdir). This is to prevent any one person from flooding LEO with their own pages.
##How do I start my own orbit?
To start your own orbit, download Molniya at the link below. Place it in a gemini root, and modify config.py as directed in Molniya's README.
=> https://tildegit.org/khuxkm/molniya Molniya (will take you to HTTP land, tildegit.org)
##Can I go to the next page now?
Yep!
=> next.cgi Beam me up, Scotty!

41
molniya.py Normal file
View File

@ -0,0 +1,41 @@
"""The utility library for the Molniya CGI pages.
Handles loading the orbit, as well as next/previous/random links."""
import json, random, os, urllib.parse
# stolen from AV-98
urllib.parse.uses_relative.append("gemini")
urllib.parse.uses_netloc.append("gemini")
# The URL of the main page of the orbit
MAIN_PAGE = "gemini://tilde.team/~khuxkm/leo/"
URLS = [MAIN_PAGE]
try:
with open("orbit.json") as f:
URLS = json.load(f)["urls"]
except:
pass
CURRENT_URL = urllib.parse.unquote(os.environ.get("QUERY_STRING",MAIN_PAGE))
CURRENT_URL_PARSED = urllib.parse.urlparse(CURRENT_URL)
try:
CURRENT_URL_INDEX = URLS.index(CURRENT_URL)
except:
CURRENT_URL = MAIN_PAGE
try:
CURRENT_URL_INDEX = URLS.index(CURRENT_URL)
except:
# give up and just start at 0
CURRENT_URL_INDEX = 0
def next_url():
return URLS[(CURRENT_URL_INDEX+1)%len(URLS)]
def prev_url():
return URLS[(CURRENT_URL_INDEX-1)%len(URLS)]
def rand_url():
ret = random.choice(URLS)
while ret==CURRENT_URL or ret==MAIN_PAGE:
ret = random.choice(URLS)
return ret

4
next.cgi Executable file
View File

@ -0,0 +1,4 @@
#!/usr/bin/env python3
import molniya, gemini
gemini.start_response(molniya.next_url(),"30")

4
prev.cgi Executable file
View File

@ -0,0 +1,4 @@
#!/usr/bin/env python3
import molniya, gemini
gemini.start_response(molniya.next_url(),"30")

4
rand.cgi Executable file
View File

@ -0,0 +1,4 @@
#!/usr/bin/env python3
import molniya, gemini
gemini.start_response(molniya.next_url(),"30")

195
zenit.py Normal file
View File

@ -0,0 +1,195 @@
"""Zenit - the Molniya indexer.
Zenit was a series of military photoreconnaissance satellites launched by the Soviet Union between 1961 and 1994. In keeping with the Soviet spy satellite theme, I chose this name for the indexer."""
import json, urllib.parse, traceback, sys, ssl, socket, string
from config import *
# stolen from AV-98
urllib.parse.uses_relative.append("gemini")
urllib.parse.uses_netloc.append("gemini")
# Load URL list
URLS = [MAIN_PAGE]
try:
with open("orbit.json") as f:
URLS = json.load(f)["urls"]
except IOError as e: # we can be a bit more outgoing about our errors here
print(f"Error loading orbit.json: {e!r}")
print("Continuing on anyways with a list containing only the URL of the main page.")
except KeyError as e:
print("Malformed orbit.json: no urls list")
print("Continuing on anyways with a list containing only the URL of the main page.")
except:
print("Error loading orbit.json (not IOError or KeyError):")
traceback.print_exc()
print("Exiting.")
sys.exit(1)
# Utility function to parse a MIME type
def parse_mime(mimetype):
mimetype = mimetype.strip()
index = 0
type = ""
# type is everything before the /
while index<len(mimetype) and mimetype[index]!="/":
type+=mimetype[index]
index+=1
index+=1
subtype = ""
# subtype is everything after the slash and before the semicolon (if the latter exists)
while index<len(mimetype) and mimetype[index]!=";":
subtype+=mimetype[index]
index+=1
index+=1
# if there's no semicolon, there are no params
if index>=len(mimetype): return [type,subtype], dict()
params = dict()
while index<len(mimetype):
# skip whitespace
while index<len(mimetype) and mimetype[index] in string.whitespace:
index+=1
paramName = ""
# the parameter name is everything before the = or ;
while index<len(mimetype) and mimetype[index] not in "=;":
paramName+=mimetype[index]
index+=1
# if the string is over or there isn't an equals sign, there's no param value
if index>=len(mimetype) or mimetype[index]==";":
index+=1
params[paramName]=None
continue
# otherwise, grab the param value
index+=1
paramValue = ""
if mimetype[index]=='"':
index+=1
while True:
while index<len(mimetype) and mimetype[index] not in '\\"':
paramValue+=mimetype[index]
index+=1
if index>=len(mimetype): break
c = mimetype[index]
index+=1
if c=="\\":
if index>=len(mimetype):
paramValue+=c
break
paramValue+=mimetype[index]
index+=1
else:
break
# skip until next ;
while index<len(mimetype) and mimetype[index]!=";": index+=1
else:
while index<len(mimetype) and mimetype[index]!=";":
paramValue+=mimetype[index]
index+=1
if paramName: params[paramName]=paramValue
return [type, subtype], params
# Utility function to grab content from a URL
# Context setup courtesy of my own half-baked spartan client
def grab_content(url,redirect_num=0):
if redirect_num>=5:
return "Too many redirects!","text/plain"
parsed = urllib.parse.urlparse(url)
if "ctx" not in globals():
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
globals()["ctx"]=ctx
else:
ctx = globals()["ctx"]
with socket.socket(socket.AF_INET,socket.SOCK_STREAM) as s:
ss = ctx.wrap_socket(s,server_hostname=parsed.hostname)
ss.connect((parsed.hostname,parsed.port or 1965))
ss.send((url.strip()+"\r\n").encode("UTF-8"))
out = b""
while (data:=ss.recv(2048)):
out+=data
header, content = out.split(b"\r\n",1)
status, meta = header.decode("utf-8").split(None,1)
assert len(meta)<1024
if status[0]=="2":
types, params = parse_mime(meta)
if types[0]=="text":
# assume UTF-8
charset = "utf-8"
# ...but if another charset is given accept it
if "charset" in params:
charset = params["charset"]
# decode and return
return content.decode(charset), meta
else:
# if it's not a text result, just return the content
return content, meta
elif status[0]=="3":
# if it's a redirect, then let's follow it
return grab_content(meta,redirect_num+1)
else:
# Either:
# 1x - it wants an input, which we have no agency to give
# 6x - it wants a client cert, which we have no agency to give
# 4x or 5x - there's an error
# Return the header with a mimetype of text/plain. If this were a real library I might throw an error here, but this is just to make Zenit work.
return header.decode("utf-8"), "text/plain"
CAPSULES_IN_ORBIT = set(determine_capsule(urllib.parse.urlparse(url)) for url in URLS)
modified_orbit = False
backlinks_url = "?".join([BACKLINKS, urllib.parse.quote(MAIN_PAGE)])
response, mime = grab_content(backlinks_url)
# Backlinks page should return a text/gemini doc
assert mime.startswith("text/gemini"),f"Backlinks URL returned a response that wasn't text/gemini! ({response},{mime})"
links = []
stage = 1
for line in response.splitlines():
if stage==1:
if line.startswith("=>"):
stage=2
if stage==2:
if line.startswith("=>"):
parts = line.split(None,2)
links.append(parts[1])
else:
stage=3
break
# stage 3 is to ignore the rest of the lines.
# if we're in stage 3 and somehow miss breaking out of the loop, nothing will happen
# filter out just the new links
links = [link for link in links if link not in URLS]
print("Found {} new link{} to index{}".format(len(links),"" if len(links)==1 else "s","..." if len(links)>0 else "."))
for link in links:
# Things to consider for a new link:
# Does its capsule already have representation in the orbit?
capsule = determine_capsule(urllib.parse.urlparse(link))
if capsule in CAPSULES_IN_ORBIT:
# skip
print(f"Skipping {link} (capsule already in orbit)...")
continue
# Does it link to any of the required links?
response, mime = grab_content(link)
try:
assert mime.startswith("text/gemini"), f"{mime} response isn't text/gemini and therefore can't link back"
links_to_orbit = False
for line in response.splitlines():
if line.startswith("=>"):
parts = line.split(None,2)
for reqlink in REQUIRED_LINKS:
links_to_orbit=links_to_orbit or parts[1].startswith(reqlink)
assert links_to_orbit, "doesn't link back to orbit"
except AssertionError as e:
print(f"Skipping {link} ({e.args[0]})...")
continue
# If we haven't continue'd by now, the link meets all of the criteria
print(f"Adding {link} to the orbit...")
URLS.append(link)
modified_orbit = True
if modified_orbit:
print("Saving modified orbit...")
with open("orbit.json","w") as f:
json.dump(dict(urls=URLS),f)