A new eBook format based on Gemini Protocol's Gemtext. Gempub can also serve as a Gemini capsule archive format.

GEMPUB

1 Overview
     1.1 Rationale
     1.2 Goals
     1.3 Non-Goals
2 Format
     2.1 File Extension and MIME
     2.2 Directory Structure
     2.3 Metadata
     2.4 Content
     2.5 Images
3 Accessibility
4 Recommendations
     4.1 Charsets
     4.2 External Links
     4.3 Unfamiliar File Formats
5 Tools

[Image: A Gempub book opened in Lagrange]

1 Overview

The application/gpub+zip media type ("gempub" or "GPUB") is a proposed new e-book file format that uses the ".gpub" file extension.

It is primarily intended to serve as a container for e-books containing text files in the "text/gemini" format, allowing the author to avoid the complexity of using web technologies for e-books that do not require them. It has a secondary purpose of functioning as an archival format for Gemini capsules.

Questions, comments, help: oppen@fastmail.com

1.1 Rationale

While implementing an .epub reader, it became apparent that it is practically impossible to separate an ebook's data from its presentation. EPUB archives are zipped HTML, CSS, and metadata, and even Google with its infinite resources is unable to render pages correctly for all titles in Play Books, where mangled and unusable index pages are common. It's a lot of work and effort to attempt to convert HTML markup into another format that can be rendered natively.

1.2 Goals

Simplicity. Gempub follows the same original aims as the Gemini Protocol.

  • It should be possible for somebody who had no part in designing the protocol to accurately hold the entire protocol spec in their head after reading a well-written description of it once or twice.
  • A basic but usable (not ultra-spartan) client should fit comfortably within 50 or so lines of code in a modern high-level language. Certainly not more than 100. (In the case of gempub, the lines-of-code target may be ambitious but the spirit is the same.)
  • A client comfortable for daily use which implements every single protocol feature should be a feasible weekend programming project for a single developer.

1.3 Non-Goals

There are lots of use-cases where Gempub isn't appropriate. For example, it is not intended for complex layouts or scientific notation. There are other formats that serve those use-cases better.

2 Format

2.1 File Extension and MIME

Gempub files end with the extension ".gpub" and their MIME type is application/gpub+zip.

2.2 Directory Structure

Gempub files are zipped directories of Gemtext ".gmi" files plus an optional metadata file:

• metadata.txt - a file containing the title, author and any other optional fields. See "Metadata", below.

This file enables Gempub to act as a full eBook format. Gemini capsules can also be simply zip compressed without the metadata file to act as a Gemini archive/offline format - when operating as an archive there must be an index.gmi in the root directory.

Example:

//Example with index.gmi in a sub-directory, specified by the index value in metadata.txt

//.gpub contents:
book_title.gpub/
   metadata.txt
   cover.jpg
   book/
      index.gmi
      chapter1.gmi
      chapter2.gmi
      chapter3.gmi
      images/
         illustration.png

//metadata.txt:
title: book title
gpubVersion: 1.0.0
cover: cover.jpg
index: book/index.gmi
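
As a rough illustration of the layout above, the following Python sketch locates the starting index inside an archive. The helper name find_index is hypothetical, and stripping a leading "./" from the index path is an assumption made here, not something the format mandates.

```python
import io
import zipfile


def find_index(archive: zipfile.ZipFile) -> str:
    """Return the archive path of the starting index (hypothetical helper)."""
    names = set(archive.namelist())
    index = "index.gmi"  # root index is used when metadata gives no index path
    if "metadata.txt" in names:
        text = archive.read("metadata.txt").decode("utf-8")
        for line in text.splitlines():
            key, separator, value = line.partition(":")
            if separator and key.strip() == "index":
                index = value.strip()
    # Assumption: treat "./book/index.gmi" and "book/index.gmi" as the same path.
    if index.startswith("./"):
        index = index[2:]
    if index not in names:
        raise ValueError(f"index file not found in archive: {index}")
    return index


# Build a tiny in-memory .gpub mirroring the example layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as gpub:
    gpub.writestr(
        "metadata.txt",
        "title: book title\ngpubVersion: 1.0.0\nindex: book/index.gmi\n",
    )
    gpub.writestr("book/index.gmi", "# Book title\n=> chapter1.gmi Chapter 1\n")
    gpub.writestr("book/chapter1.gmi", "# Chapter 1\n")

with zipfile.ZipFile(buffer) as gpub:
    print(find_index(gpub))
```

An archive without metadata.txt falls through to the root index.gmi, which matches the capsule archive case.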

2.3 Metadata

The metadata.txt file contains key-value pairs separated by line. Values start after the first colon and are trimmed (e.g., author: Olaf Stapledon and author:Olaf Stapledon are equivalent). All values are optional apart from title and gpubVersion. Order does not matter. If no index path is specified there must be an index.gmi in the directory root.

  • title - a mandatory title of the work
  • gpubVersion - mandatory Gempub format version: 1.0.0
  • index - path to start index.gmi
  • author
  • language - BCP 47
  • charset - Default is UTF-8, see below for other charsets
  • description
  • published - Format YYYY for when precise date is unknown
  • publishDate - Format: YYYY-MM-DD, e.g. 1981-02-01
  • revisionDate - Format: YYYY-MM-DD
  • copyright
  • license
  • version - human readable only, not meant to be parsed
  • cover - a JPG or PNG image which can be anywhere in the directory structure. Do NOT use text in the image, both for accessibility and because clipping will occur when a reader maintains the aspect ratio of the image.

This metadata is intended so readers can display a useful catalogue of multiple .gpub files and display a cover for individual books. Metadata must never be used to specify flags for content rendering. Content should always be simple Gemtext. Reader applications must ignore custom parameters.

Example:

title: Star Maker
author: Olaf Stapledon
index: ./capsule/index.gmi
gpubVersion: 1.0.0
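
The parsing rules above are small enough to sketch in a few lines of Python. This is one possible reading, not a reference implementation: the function name parse_metadata is hypothetical, and silently skipping lines without a colon is an assumption.

```python
# Keys listed in this section; anything else is a custom parameter.
KNOWN_KEYS = {
    "title", "gpubVersion", "index", "author", "language", "charset",
    "description", "published", "publishDate", "revisionDate",
    "copyright", "license", "version", "cover",
}
REQUIRED_KEYS = ("title", "gpubVersion")


def parse_metadata(text: str) -> dict:
    """Parse metadata.txt content into a dict (hypothetical helper)."""
    metadata = {}
    for line in text.splitlines():
        # The value starts after the FIRST colon, so later colons are kept.
        key, separator, value = line.partition(":")
        key = key.strip()
        # Assumption: lines without a colon are skipped rather than rejected.
        # Custom parameters are ignored, as the format requires.
        if not separator or key not in KNOWN_KEYS:
            continue
        metadata[key] = value.strip()
    missing = [key for key in REQUIRED_KEYS if not metadata.get(key)]
    if missing:
        raise ValueError("missing mandatory metadata: " + ", ".join(missing))
    return metadata


example = """title: Star Maker
author:Olaf Stapledon
index: ./capsule/index.gmi
gpubVersion: 1.0.0
theme: dark
"""
print(parse_metadata(example))
```

The "theme" line stands in for a custom parameter and is dropped; "author:Olaf Stapledon" parses the same as the spaced form.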

2.4 Content

All content must follow the gemtext specification.

Reader implementations should use the index.gmi to determine what to display next when the user reaches the end of a chapter.
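
How a reader derives "what comes next" is left open. One plausible approach, sketched below in Python, treats the order of local .gmi links in index.gmi as the reading order. The function names are hypothetical, and link targets are kept exactly as written (relative to the index file) rather than resolved to archive paths.

```python
FENCE = "`" * 3  # Gemtext preformatted toggle marker


def reading_order(index_text: str) -> list:
    """Collect local .gmi link targets in the order they appear."""
    order = []
    preformatted = False
    for line in index_text.splitlines():
        if line.startswith(FENCE):
            preformatted = not preformatted
            continue
        if preformatted or not line.startswith("=>"):
            continue
        parts = line[2:].split(maxsplit=1)
        if not parts:
            continue
        target = parts[0]
        # Assumption: only local Gemtext files count as chapters.
        if "://" in target or not target.endswith(".gmi"):
            continue
        if target not in order:
            order.append(target)
    return order


def next_chapter(order: list, current: str):
    """Return the target after `current`, or None at the end or if unknown."""
    if current not in order:
        return None
    position = order.index(current)
    return order[position + 1] if position + 1 < len(order) else None


index_text = """# Star Maker
=> chapter1.gmi Chapter 1
=> chapter2.gmi Chapter 2
=> gemini://example.org/page.gmi Author's capsule
=> images/illustration.png A nebula
=> chapter3.gmi Chapter 3
"""
order = reading_order(index_text)
print(next_chapter(order, "chapter1.gmi"))
```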

2.5 Images

The Gemini Protocol doesn't allow auto-loading of images for various reasons, none of which are applicable in an eBook. Gempub implementations can choose to handle images:

  • Inline: any links that end in an image extension can be automatically inlined, retaining aspect-ratio based on available screen width.
  • Linked: for implementation simplicity a clicked image link could take the user to a separate in-app image viewer (or even pass to the OS to display).
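
For the inline option, retaining the aspect ratio comes down to one scale factor. A minimal Python sketch follows; the function name fit_to_width is hypothetical, and the choice never to upscale small images is an assumption, not part of the format.

```python
def fit_to_width(width: int, height: int, available_width: int) -> tuple:
    """Scale (width, height) to fit available_width, keeping aspect ratio."""
    if width <= 0 or height <= 0 or available_width <= 0:
        raise ValueError("dimensions must be positive")
    # Assumption: images narrower than the screen are shown at native size.
    if width <= available_width:
        return width, height
    scale = available_width / width
    return available_width, max(1, round(height * scale))


print(fit_to_width(1600, 900, 800))
```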

Supported formats are PNG and JPEG as they're common and included on most/all platforms.

Images must always include a description for accessibility:

//Invalid Gempub image syntax:
=> ./header.jpg

//Correct image syntax:
=> ./header.jpg A man floating through space
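
A validator could flag image links that lack a description. The Python sketch below is one way to do that under simple assumptions: an image link is any link whose target ends in a PNG or JPEG extension, and the function name is hypothetical.

```python
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def image_links_missing_description(gemtext: str) -> list:
    """Return (line number, target) pairs for image links with no description."""
    problems = []
    for number, line in enumerate(gemtext.splitlines(), start=1):
        if not line.startswith("=>"):
            continue
        # A link line is "=>", the target, then an optional description.
        parts = line[2:].split(maxsplit=1)
        if not parts:
            continue
        target = parts[0]
        if target.lower().endswith(IMAGE_EXTENSIONS) and len(parts) < 2:
            problems.append((number, target))
    return problems


sample = """# Chapter 1
=> ./header.jpg
=> ./header.jpg A man floating through space
"""
print(image_links_missing_description(sample))
```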

3 Accessibility

As well as including appropriate alt-text for images, make sure screen readers are able to correctly interpret Gemtext. Gemtext has syntax for three heading levels, which may be handled differently by screen readers. E.g., on Android, a text view can be marked as a heading with setAccessibilityHeading(boolean).

Images should never include text unless it's repeated as text content immediately above or below.

4 Recommendations

4.1 Charsets

From the Gemini Protocol specification:

If a MIME type begins with "text/" and no charset is explicitly given, the charset should be assumed to be UTF-8. Compliant clients MUST support UTF-8-encoded text/* responses. Clients MAY optionally support other encodings. Clients receiving a response in a charset they cannot decode SHOULD gracefully inform the user what happened instead of displaying garbage.

Gempub readers should use a similar approach.
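
In a reader, "gracefully inform the user" might look like the Python sketch below: decode with the charset from the metadata (UTF-8 when none is given) and return a message instead of garbage on failure. The function name and the wording of the messages are assumptions.

```python
def decode_content(raw: bytes, charset: str = "utf-8") -> tuple:
    """Return (text, None) on success or (None, message) on failure."""
    try:
        return raw.decode(charset), None
    except LookupError:
        # Python does not know this charset name at all.
        return None, f"This book uses an unsupported charset: {charset}"
    except UnicodeDecodeError:
        # The charset is known but the bytes are not valid in it.
        return None, f"This file could not be decoded as {charset}"


text, error = decode_content("Olaf Stapledon".encode("utf-8"))
print(text if error is None else error)
```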

4.2 External Links

Readers should handle external links for both https:// and gemini:// by passing the URI to another application or the OS to render. Only local media files (i.e., files contained within the gempub archive) should be handled in the reader. The reader must not inline remote images. URLs are ephemeral, but linking to external capsules and websites would be useful for zines and articles. Novels or stories should obviously never do this.
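
The local versus external decision can be sketched with Python's standard URL parser. The function name classify_link is hypothetical, and treating scheme-relative targets ("//host/path") as external is an assumption.

```python
from urllib.parse import urlsplit


def classify_link(target: str) -> str:
    """Return "external" for links to hand to the OS, "local" otherwise."""
    parts = urlsplit(target)
    # Any scheme (https, gemini, mailto, ...) or network location means
    # the target lives outside the archive.
    if parts.scheme or parts.netloc:
        return "external"
    return "local"


for link in ("gemini://example.org/", "./images/illustration.png"):
    print(link, classify_link(link))
```

A reader would then inline or open only "local" targets and pass everything else to another application.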

4.3 Unfamiliar File Formats

Readers should expect to encounter unfamiliar file formats bundled in the .gpub file. Links to this content should be displayed, as the surrounding text might not make sense without the link text in place. Simple readers could just display a label with the filename and an 'unrecognised filetype:' prefix, whereas more advanced readers might pass the file to the OS to handle. The reader application must never omit the link text entirely if the filetype can't be handled.
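
A simple reader's fallback label might be built as in the Python sketch below. The set of known extensions and the exact label layout are assumptions; the only behaviour taken from the text is the 'unrecognised filetype:' prefix and never dropping the link.

```python
import posixpath

# Assumption: the file types this hypothetical reader can display itself.
KNOWN_EXTENSIONS = {".gmi", ".png", ".jpg", ".jpeg"}


def link_label(target: str, label: str = "") -> str:
    """Return the text to display for a link, never an empty string."""
    filename = posixpath.basename(target)
    extension = posixpath.splitext(filename)[1].lower()
    if extension in KNOWN_EXTENSIONS:
        return label or filename
    fallback = f"unrecognised filetype: {filename}"
    # Keep the author's link text so the surrounding prose still makes sense.
    return f"{fallback} ({label})" if label else fallback


print(link_label("./notes/data.csv", "Raw survey data"))
```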

5 Tools

  • todo: write ePub to gPub converter
  • todo: gPub validator (check image alt text, check all links are local/relative, check metadata)
  • Capsule Scraper to gPub Archive: cget - in progress Gemini capsule scraper