# Exploring the Pigeon Message Format
In the text that follows, we will explore a Pigeon message line by line.
The example message is shown in its entirety below:
```
author @MF312A76JV8S1XWCHV1XR6ANRDMPAT2G5K8PZTGKWV354PR82CD0.ed25519
kind weather_report
prev %ZV85NQS8B1BWQN7YAME1GB0G6XS2AVN610RQTME507DN5ASP2S6G.sha256
depth 3
lipmaa 2

temperature:"22.0C"
webcam_photo:&FV0FJ0YZADY7C5JTTFYPKDBHTZJ5JVVP5TCKP0605WWXYJG4VMRG.sha256
weather_reported_by:@0DC253VW8RP4KGTZP8K5G2TAPMDRNA6RX1VHCWX1S8VJ67A213FM.ed25519

signature JSPJJQJRVBVGV52K2058AR2KFQCWSZ8M8W6Q6PB93R2T3SJ031AYX1X74KCW06HHVQ9Y6NDATGE6NH3W59QY35M58YDQC5WEA1ASW08.sig.ed25519
```
### Line 1: `Author`
EXAMPLE:
```
author @MF312A76JV8S1XWCHV1XR6ANRDMPAT2G5K8PZTGKWV354PR82CD0.ed25519
```
The first line of a Pigeon message header is the `author` entry.
Every Pigeon database has an "identity". An identity is an ED25519 key pair that prevents tampering by parties other than the database owner. An identity is publicly referenced using a "multihash". In the example above, the identity multihash is `@MF312A76JV8S1XWCHV1XR6ANRDMPAT2G5K8PZTGKWV354PR82CD0.ed25519`.
The steps to generate a valid identity are:
1. Perform [Crockford Base32 encoding](https://www.crockford.com/base32.html) on an ED25519 public key.
2. Add an `@` symbol to the beginning of the string from step 1.
3. Add a `.ed25519` string to the end of the string from step 2.
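The three steps above can be sketched in Ruby as follows. This is a minimal sketch rather than a reference implementation: `crockford_base32` and `identity_multihash` are hypothetical helper names, the public key is assumed to be available as a raw 32-byte string, and the encoder is assumed to pad the final 5-bit group with zero bits and to emit no check symbol.

```ruby
# Crockford Base32 alphabet (omits I, L, O and U).
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ".freeze

# Hypothetical helper: encode a binary string as Crockford Base32.
# Assumption: the last 5-bit group is padded on the right with zero bits.
def crockford_base32(bytes)
  bits = bytes.unpack1("B*")                  # e.g. "\xFF" => "11111111"
  bits += "0" * ((5 - bits.length % 5) % 5)   # pad to a multiple of 5 bits
  bits.scan(/.{5}/).map { |chunk| CROCKFORD_ALPHABET[chunk.to_i(2)] }.join
end

# Hypothetical helper: build an identity multihash from a raw
# 32-byte Ed25519 public key.
def identity_multihash(public_key_bytes)
  encoded = crockford_base32(public_key_bytes)  # step 1
  "@#{encoded}.ed25519"                         # steps 2 and 3
end
```

A 32-byte key is 256 bits, so under these assumptions the encoded part is 52 characters long, which matches the length seen in the example above.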
### Line 2: `Kind`
EXAMPLE:
```
kind weather_report
```
The second line of the header is the `kind` entry. This entry is user-definable. It signals intent to applications that will consume the message.
It must meet the following criteria:
* Must be 1-90 characters in length
* Cannot contain whitespace or control characters
* May contain any of the following characters:
  * alphanumeric characters
  * dashes (`-`)
  * underscores (`_`)
  * Symbols used for multihashes, such as `@`, `&` and `%` (covered later).
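The criteria above can be expressed as a single pattern. The sketch below is illustrative only: `valid_kind?` is a hypothetical helper name, and the set of "symbols used for multihashes" is assumed to be `@`, `&`, `%` and `.` (the dot is included because multihashes contain it, although the list above does not name it).

```ruby
# Assumption: the multihash symbols allowed in a `kind` are `@`, `&`,
# `%` and `.`; the text only says "such as", so the real set may differ.
KIND_PATTERN = /\A[A-Za-z0-9_@&%.-]{1,90}\z/

# Hypothetical helper: true if `kind` satisfies the criteria listed above.
def valid_kind?(kind)
  KIND_PATTERN.match?(kind)
end
```

Because whitespace and control characters are not in the allowed set, a value such as `weather report` is rejected without a separate check.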
### Line 3: `Prev`
EXAMPLE:
```
prev %ZV85NQS8B1BWQN7YAME1GB0G6XS2AVN610RQTME507DN5ASP2S6G.sha256
```
A Pigeon message feed is a unidirectional chain of documents where the newest document points back to the document that came before it in the chain ([example diagram](diagram1.png)).
To create this chain, a Pigeon message uses the `prev` field. The `prev` field contains a message multihash. In this case, the multihash is `%ZV85NQS8B1BWQN7YAME1GB0G6XS2AVN610RQTME507DN5ASP2S6G.sha256`.
Messages are content addressed. This is in contrast to protocols such as HTTP which use names to identify resources. Because Pigeon messages are addressed by content rather than by name, changing a message's content, even by just one character, has the effect of completely changing the message's multihash.
**For the first message of a feed, this value is set to `NONE`.**
Message multihashes are calculated as follows:
1. The first character is a `%` symbol, indicating that it is a `message` rather than an `identity`, `blob` or `string`.
2. The next 52 characters are a [Crockford base 32](https://www.crockford.com/base32.html) encoded SHA256 hash of the previous message's content.
3. The message multihash ends in `.sha256`.
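The calculation can be sketched as follows. The sketch assumes SHA256, which is consistent with the `.sha256` suffix and the 52-character length in the example `prev` line. It also assumes that the hash covers the complete serialized message as a byte string, which this document does not spell out. `crockford_base32` and `message_multihash` are hypothetical helper names, and the encoder pads the last 5-bit group with zero bits.

```ruby
require "digest"

CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ".freeze

# Hypothetical helper: Crockford Base32 without check symbol.
# Assumption: the last 5-bit group is padded with zero bits.
def crockford_base32(bytes)
  bits = bytes.unpack1("B*")
  bits += "0" * ((5 - bits.length % 5) % 5)
  bits.scan(/.{5}/).map { |chunk| CROCKFORD_ALPHABET[chunk.to_i(2)] }.join
end

# Hypothetical helper: compute the multihash of a serialized message.
# Assumption: the digest is taken over the whole message text.
def message_multihash(message_text)
  digest = Digest::SHA256.digest(message_text)  # 32 raw bytes
  "%#{crockford_base32(digest)}.sha256"
end
```

Since the multihash is derived from the content, calling `message_multihash` on two texts that differ by a single character yields two unrelated multihashes.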
### Line 4: `Depth`
EXAMPLE:
```
depth 3
```
Pigeon messages exist in a linear sequence which only moves forward and never "forks".
Every message has a `depth` field to indicate its "place in line".
Because every message has an ever-increasing integer that never duplicates, every message in a Pigeon feed will have a unique hash. This is true even if messages have identical body content.
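To illustrate the point about unique hashes, the sketch below hashes two cut-down messages that differ only in `depth`. The helper name `digest_for` and the reduced layout (a depth line, an empty line and a body) are assumptions made for illustration; a real message carries the full header and footer.

```ruby
require "digest"

# Hypothetical helper: hash a simplified message made of a depth line,
# an empty line and a body. Real messages have more header fields.
def digest_for(depth, body)
  Digest::SHA256.hexdigest("depth #{depth}\n\n#{body}")
end

body = "temperature:\"22.0C\"\n"

# Same body, different depth: the two digests are not equal.
puts digest_for(3, body) != digest_for(4, body)
```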
### Line 5: `Lipmaa`
**THIS FIELD WAS WRITTEN INCORRECTLY. THIS WILL CHANGE SOON. YOU CAN SAFELY MOVE TO THE NEXT SECTION OF THE DOCS**
This concept was borrowed from the [Bamboo protocol](https://github.com/AljoschaMeyer/bamboo#links-and-entry-verification) and [Helger Lipmaa's thesis](https://kodu.ut.ee/~lipmaa/papers/thesis/thesis.pdf).
The `lipmaa` field (often called a "Lipmaa Link") is a special kind of `prev` field that allows partial verification of feeds. This field makes it possible to verify a single message (or subset of messages) without downloading the entire chain of messages.
![](lipmaa.png)
The `lipmaa` field is calculated as follows:
```ruby
def lipmaa(n)
  # The original lipmaa function returns -1 for 0
  # but that does not mesh well with our serialization
  # scheme. Comments welcome on this one.
  return 0 if n < 1 # Prevent -1, division by zero etc.
  m, po3, x = 1, 3, n
  # find k such that (3^k - 1)/2 >= n
  while (m < n)
    po3 *= 3
    m = (po3 - 1) / 2
  end
  po3 /= 3
  # find longest possible back-jump
  if (m != n)
    while x != 0
      m = (po3 - 1) / 2
      po3 /= 3
      x %= m
    end
    if (m != po3)
      po3 = m
    end
  end
  return n - po3
end
```
### Line 6: Body Start (Empty Line)
Once all headers are added, a client must place an empty line (`\n`) after the header.
The empty line signifies the start of the message body.
Some notes about body entries:
* The body of a message starts and ends with an empty line (`\n`).
* Every body entry is a key value pair. Keys and values are separated by a `:` character (no spaces).
* A key must be 1-90 characters in length
* A key cannot contain whitespace or control characters
* A key may contain any of the following characters:
  * alphanumeric characters (a-z, A-Z, 0-9)
  * dashes (`-`)
  * underscores (`_`)
  * Symbols used for multihashes, such as `@`, `&` and `%` (covered later).
* A value may be:
  * A string (128 characters or less)
  * A multihash referencing an identity (`@`), a message (`%`) or a blob (`&`).
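The notes above can be turned into a small parser for a single body line. This is a sketch under several assumptions that the text does not confirm: strings are wrapped in double quotes (as in the example message), the 128-character limit excludes the quotes, the key may contain `.` in addition to the listed symbols (multihashes contain it), and multihashes use upper-case Crockford characters. `parse_body_entry` and the pattern names are hypothetical.

```ruby
# Assumption: allowed key symbols are `@`, `&`, `%`, `.`, `-` and `_`.
KEY_PATTERN = /\A[A-Za-z0-9_@&%.-]{1,90}\z/

# 52 Crockford Base32 characters (no I, L, O or U).
B32 = "[0-9A-HJKMNP-TV-Z]{52}".freeze
MULTIHASH_PATTERN = /\A(?:@#{B32}\.ed25519|[%&]#{B32}\.sha256)\z/

SIGIL_TYPES = { "@" => :identity, "%" => :message, "&" => :blob }.freeze

# Hypothetical helper: split a body line into [key, type, value].
# Returns nil when the line does not follow the rules above.
def parse_body_entry(line)
  key, value = line.chomp.split(":", 2)   # split on the first ":" only
  return nil unless key && value && KEY_PATTERN.match?(key)

  if MULTIHASH_PATTERN.match?(value)
    [key, SIGIL_TYPES[value[0]], value]
  elsif value.length >= 2 && value.start_with?('"') && value.end_with?('"') &&
        value.length - 2 <= 128
    [key, :string, value[1..-2]]          # strip the surrounding quotes
  end
end
```

For the example entry `temperature:"22.0C"`, the helper returns the key `temperature`, the type `:string` and the unquoted value `22.0C`.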
### Line 7: Entry Containing a String
EXAMPLE:
```
temperature:"22.0C"
```
Body entries are defined by the user and contain key/value pairs of application-specific data.
When a key/value pair represents something other than an identity, blob or message ID, a string is used.
Strings can be used for any type of data that does not fit into the other three categories.
Strings must be less than or equal to 128 characters in length.
The example above is the simplest kind of body entry. It specifies an arbitrary string representing the current temperature.
### Line 8: Entry Referencing a Blob
EXAMPLE:
```
webcam_photo:&FV0FJ0YZADY7C5JTTFYPKDBHTZJ5JVVP5TCKP0605WWXYJG4VMRG.sha256
```
Applications may attach files to messages in the form of blobs. Blobs are referenced using a blob multihash. A blob multihash:
* Starts with a `&` character.
* Ends with `.sha256`.
* Contains exactly 52 characters between the `&` and `.sha256` parts. This is a SHA256 hash of the blob's content, represented in Crockford Base 32 encoding.
A blob is referenced in a message's key or value. A client will include a blob's content in a "bundle" (explained later).
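The shape of a blob multihash can be checked with one pattern. The sketch below is illustrative: `blob_multihash?` is a hypothetical helper name, and it assumes the 52 characters are upper-case Crockford Base32 symbols, as in the example above.

```ruby
# `&`, then 52 Crockford Base32 characters (no I, L, O or U),
# then the `.sha256` suffix.
# Assumption: only upper-case characters are accepted.
BLOB_PATTERN = /\A&[0-9A-HJKMNP-TV-Z]{52}\.sha256\z/

# Hypothetical helper: true if `text` looks like a blob multihash.
def blob_multihash?(text)
  BLOB_PATTERN.match?(text)
end
```

This only validates the format. It does not confirm that the hash matches any particular blob content.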
### Line 9: Entry Referencing a Peer's Identity
EXAMPLE:
```
weather_reported_by:@0DC253VW8RP4KGTZP8K5G2TAPMDRNA6RX1VHCWX1S8VJ67A213FM.ed25519
```
A message may reference other identities (or its own identity) by using an identity sigil either in the key or value portion of the entry.
This is analogous to "social tagging" seen in many social networks.
### Line 10: Empty Carriage Return (Footer Start)
The last part of a message is the footer. Like a message body, a message footer starts and ends with an empty line.
The footer is essential for ensuring the tamper-resistant properties of a Pigeon message.
### Line 11: Signature Line
EXAMPLE:
```
signature JSPJJQJRVBVGV52K2058AR2KFQCWSZ8M8W6Q6PB93R2T3SJ031AYX1X74KCW06HHVQ9Y6NDATGE6NH3W59QY35M58YDQC5WEA1ASW08.sig.ed25519
```
A signature starts with the word `signature` followed by a space.
The rest of the line is the signature itself, produced by signing the body (including the trailing `\n`) with the author's ED25519 key.
The signature is encoded with Crockford base 32.
The signature ends with `.sig.ed25519`.
An empty carriage return is added after the signature line.
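Putting the pieces together, the overall layout of a message (header, empty line, body, empty line, signature line, empty line) can be sketched as below. `assemble_message` is a hypothetical helper name. The sketch assumes that every line is terminated by `\n` and that each empty line is a lone `\n`, including the one after the signature line. It takes an already computed signature line and does not perform any signing.

```ruby
# Hypothetical helper: join header, body and footer into one message.
# Assumptions: every line ends with "\n", and each "empty line" is a
# lone "\n" (including the one after the signature line).
def assemble_message(header_lines, body_lines, signature_line)
  lines = header_lines + [""] + body_lines + [""] + [signature_line, ""]
  lines.map { |line| "#{line}\n" }.join
end
```

With the five header lines and three body entries from the example message, line 6 and line 10 of the result are the empty lines that mark the start of the body and the footer.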