protocol_spec/message_format.md

7.7 KiB

Exploring the Pigeon Message Format

In the test that follows, we will explore a pigeon message line-by-line.

The example message is shown in its entirety below:

author @MF312A76JV8S1XWCHV1XR6ANRDMPAT2G5K8PZTGKWV354PR82CD0.ed25519
kind weather_report
prev %ZV85NQS8B1BWQN7YAME1GB0G6XS2AVN610RQTME507DN5ASP2S6G.sha256
depth 3
lipmaa 2

temperature:"22.0C"
webcam_photo:&FV0FJ0YZADY7C5JTTFYPKDBHTZJ5JVVP5TCKP0605WWXYJG4VMRG.sha256
weather_reported_by:@0DC253VW8RP4KGTZP8K5G2TAPMDRNA6RX1VHCWX1S8VJ67A213FM.ed25519

signature JSPJJQJRVBVGV52K2058AR2KFQCWSZ8M8W6Q6PB93R2T3SJ031AYX1X74KCW06HHVQ9Y6NDATGE6NH3W59QY35M58YDQC5WEA1ASW08.sig.ed25519

Line 1: Author

EXAMPLE:

author @MF312A76JV8S1XWCHV1XR6ANRDMPAT2G5K8PZTGKWV354PR82CD0.ed25519

The first line of a Pigeon message header is the author entry.

Every Pigeon database has an "identity". An identity is an ED25519 key pair that prevents tampering by parties other than the database owner. An identity is publicly referenced using a "multihash". In the example above, the identity multihash was @MF312A76JV8S1XWCHV1XR6ANRDMPAT2G5K8PZTGKWV354PR82CD0.ed25519.

The steps to generate a valid identity are:

  1. Perform Crockford Base32 encoding on an ED25519 public key.
  2. Add an @ symbol to the beginning of the string from step 1.
  3. Add a .ed25519 string to the end of the string from step 2.

Line 2: Kind

EXAMPLE:

kind weather_report

The second line of the header is the kind entry. This entry is user definable. The kind entry is used as a means of signalling intent to applications that will consume the message.

It must meet the following criteria:

  • Must be 1-90 characters in length
  • Cannot contain whitespace or control characters
  • May contain any of the following characters:
    • alphanumeric characters
    • dashes (-)
    • underscores (_)
    • Symbols used for multihashes, such as @, & and % (covered later).

Line 3: Prev

EXAMPLE:

prev %ZV85NQS8B1BWQN7YAME1GB0G6XS2AVN610RQTME507DN5ASP2S6G.sha256

A Pigeon message feed is a unidirectional chain of documents where the newest document points back to the document that came before it in the chain (example diagram).

To create this chain, a Pigeon message uses the prev field. The prev field contains a message multihash. In this case, the multihash is %ZV85NQS8B1BWQN7YAME1GB0G6XS2AVN610RQTME507DN5ASP2S6G.sha256.

Messages are content addressed. This is in contrast to protocols such as HTTP which use names to identify resources. Because Pigeon messages are addressed by content rather than by name, changing a message's content, even by just one character, has the effect of completely changing the message's multihash.

For the first message of a feed, this value is set to NONE.

Message multihashes are calculated as follows:

  1. The first character is a % symbol, indicating that it is a message rather than an identity, blob or string.
  2. The next 52 characters are a Crockford base 32 SHA512 hash of the previous message's content.
  3. The message multihash ends in .sha512.

Line 4: Depth

EXAMPLE:

depth 3

Pigeon messages exist in a linear sequence which only moves forward and never "forks". Every message has a depth field to indicate its "place in line". Because every message has an ever-increasing integer that never duplicates, every message in a Pigeon feed will have a unique hash. This is true even if messages have identical body content.

Line 5: Lipmaa

THIS FIELD WAS WRITTEN INCORRECTLY. THIS WILL CHANGE SOON. YOU CAN SAFELY MOVE TO THE NEXT SECTION OF THE DOCS

This concept was borrowed from the Bamboo protocol and Helger Lipmaa's thesis.

The lipmaa field (often called a "Lipmaa Link") is a special kind of prev field that allows partial verification of feeds. This field makes it possible to verify a single message (or subset of messages) without downloading the entire chain of messages.

The lipmaa field is calculated as follows:

def lipmaa(n)
  # The original lipmaa function returns -1 for 0
  # but that does not mesh well with our serialization
  # scheme. Comments welcome on this one.
  return 0 if n < 1 # Prevent -1, division by zero etc..

  m, po3, x = 1, 3, n
  # find k such that (3^k - 1)/2 >= n
  while (m < n)
    po3 *= 3
    m = (po3 - 1) / 2
  end
  po3 /= 3
  # find longest possible back-jump
  if (m != n)
    while x != 0
      m = (po3 - 1) / 2
      po3 /= 3
      x %= m
    end
    if (m != po3)
      po3 = m
    end
  end
  return n - po3
end

Line 6: Body Start (Empty Line)

Once all headers are added, a client must place an empty line (\n) after the header. The empty line signifies the start of the message body.

Some notes about body entries:

  • The body of a message starts and ends with an empty line (\n).
  • Every body entry is a key value pair. Keys and values are separated by a : character (no spaces).
  • A key must be 1-90 characters in length
  • A key cannot contain whitespace or control characters
  • A key may contain any of the following characters:
    • alphanumeric characters (a-z, A-Z, 0-9)
    • dashes (-)
    • underscores (_)
    • Symbols used for multihashes, such as @, & and % (covered later).
  • A value may be a:
    • A string (128 characters or less)
    • A multihash referencing an identity (@), a message (%) or a blob (&).

Lines 7: Entry Containing a String

EXAMPLE:

temperature:"22.0C"

Body entries are defined by user and contain key/value pairs of application-specific data. When a key/value pair represents something other than an identity, blob or message ID, a string is used. Strings can be used for any type of data that does not fit into the other three categories. Strings must be less than or equal to 128 characters in length. The example above is the most simple kind of body entry. It specifies an arbitrary string representing the current temperature.

Lines 8: Entry Referencing a Blob

EXAMPLE:

webcam_photo:&FV0FJ0YZADY7C5JTTFYPKDBHTZJ5JVVP5TCKP0605WWXYJG4VMRG.sha256

Applications may attach files to messages in the form of blobs. Blobs are referenced using a blob multihash.

  • Starts with a & character.
  • Ends with .sha256
  • Contains exactly 52 characters between the & and .sha256 parts. This is a SHA256 hash of the blob's content, represented in Crockford Base 32 encoding.

A blob is referenced in a message's key or value. A client will include a blob's content in a "bundle" (explained later).

Lines 9: Entry Referencing a Peer's Identity

EXAMPLE:

weather_reported_by:@0DC253VW8RP4KGTZP8K5G2TAPMDRNA6RX1VHCWX1S8VJ67A213FM.ed25519

A message may reference other identities (or its own identity) by using an identity sigil either in the key or value portion of the entry. This is analogous to "social tagging" seen in many social networks.

The last part of a message is the footer. Like a message body, a message footer starts and ends with an empty line. The footer is essential for ensuring the tamper resistant properties of a Pigeon message.

Lines 11: Signature Line

EXAMPLE:

signature JSPJJQJRVBVGV52K2058AR2KFQCWSZ8M8W6Q6PB93R2T3SJ031AYX1X74KCW06HHVQ9Y6NDATGE6NH3W59QY35M58YDQC5WEA1ASW08.sig.ed25519

A signature starts with the word signature followed by a space. After that, the body (including the trailing \n) is signed using the author's ED25519 key. The signature is encoded with Crockford base 32. The signature ends with .sig.ed25519. An empty carraige return is added after the signature line.