Bikeshedding Working Group ~lucidiot, Ed.
Bikeshed-Draft Bikeshedding Microsystems
Intended status: Standards Track June 30, 2021
Expires: December 30, 2021
Asahina Antenna Metadata Format (HINA) 2.2, revision 0.13
draft-hina-01
Abstract
This document is an English translation and clean-up of the Asahina
Antenna Metadata Format (HINA) 2.2, in its latest revision, 0.13, of
July 19, 2002. It aims to preserve historical knowledge about
syndication formats.
Status of This Memo
This Bikeshed-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Bikeshed-Drafts are working documents of the Bikeshedding
Microsystems Working Task Force (BM-WTF). Note that other groups may
also distribute working documents as Bikeshed-Drafts. The list of
current Bikeshed-Drafts does not exist.
Bikeshed-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Bikeshed-Drafts as reference
material or to cite them other than as "work in progress."
This Bikeshed-Draft will expire on December 30, 2021.
Copyright Notice
Copyright (c) 2021 The Bikeshedding Microsystems and the persons
identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the Bikeshedding Microsystems'
Legal Provisions Relating to Bikeshedding Documents in effect on the
date of publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
~lucidiot Expires December 30, 2021 [Page 1]
Bikeshed-Draft HINA 2.2 June 2021
Table of Contents
1. Introduction ...................................................3
1.1. Notational Conventions ..................................3
2. Data Types .....................................................4
3. Structure ......................................................4
3.1. Block ....................................................4
3.2. Header Block .............................................5
3.3. Entity Block .............................................5
4. Fields .........................................................5
4.1. HINA .....................................................5
4.2. User-Agent ...............................................6
4.3. URL ......................................................6
4.4. HINA-Version .............................................6
4.5. Virtual ..................................................6
4.6. Content-Type .............................................7
4.7. Date .....................................................7
4.8. Title ....................................................7
4.9. Author-Name ..............................................7
4.10. Expires ..................................................7
4.11. Expire ...................................................7
4.12. Last-Modified ............................................8
4.13. Last-Modified-Detected ...................................8
4.14. Server ...................................................8
4.15. Authorized ...............................................8
4.16. Authorized-url ...........................................8
4.17. Method ...................................................8
4.17.1. Method Types .....................................9
4.17.2. Example ..........................................9
4.18. Keyword ..................................................9
4.19. Image-Width ..............................................9
4.20. Image-Height ............................................10
4.21. Experimental Fields .....................................10
4.22. Undefined Fields ........................................10
5. Encoding ......................................................10
6. Propagation ...................................................10
7. Security Considerations .......................................11
8. Internationalization Considerations ...........................11
9. Privacy Considerations ........................................11
10. BANANA Considerations .........................................11
11. References ....................................................11
11.1. Normative References ....................................11
11.2. Informative References ..................................12
Appendix A. Warranty Exclusion Statement ..........................13
Appendix B. Glossary ..............................................13
Acknowledgements ..................................................13
Author's Address ..................................................14
1. Introduction
In the early days of RSS, before it and Atom took over the world
of syndication, and before Unicode became common enough to reduce
internationalization issues, several syndication formats were being
developed at smaller scales.
As those formats are now slowly dying, the Bikeshedding Microsystems
Research and Development Division invests in dusting them off and
re-standardizing them to ensure that their specification is not lost
to time, and that future generations can still benefit from past
experience.
Because XML [XML] and Really Simple Syndication [RSS2] had only just
appeared, and because the lack of general Unicode support in software
led to encoding incompatibilities between Western sites in ASCII
[ASCII] and Japanese sites in Shift-JIS [SHIFTJIS] or EUC-JP,
syndication in Japan first took the form of programs called
"last-modified-time detection agents".
The most popular last-modified-time detection agent was Asahina-Antenna,
which led to the term "antenna" being used to refer to this type of
software. Asahina-Antenna used its own syndication format, the
Asahina Antenna Metadata Format (HINA). Some sites still serve
HINA feeds as of 2021.
HINA 1.x, also known as "hina.txt", was a text format used by
Asahina-Antenna 1.x. Its specification has not yet been recovered by
our historians.
HINA 2.x, also known as Hina-Di, was influenced by Document
Information (DI), a project that aimed to develop document metadata
exchange and provided a mailing list. Hiroshi Nakamura intended to
create the Document Information Read Protocol (DIRP) and Document
Information Transfer Protocol (DITP), two standards to form a network
for distributed syndication based on RDF. The specifications were
never published, but HINA builds on this idea of decentralization.
In a way, DI and HINA 2.x have the same ideas as ActivityPub and the
modern concepts of federated software, but were 15 years early.
1.1. Notational Conventions
This document uses the augmented Backus-Naur Form (BNF) of RFC 822
[RFC822] to formally define the format.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
2. Data Types
The basic data types that constitute Hina-Di are listed below.
The US-ASCII character set is defined by ANSI X3.4-1986 [ASCII].
OCTET = <any 8-bit sequence of data>
CHAR = <any US-ASCII character (octets 0 - 127)>
UPALPHA = <any US-ASCII uppercase letter "A".."Z">
LOALPHA = <any US-ASCII lowercase letter "a".."z">
ALPHA = UPALPHA | LOALPHA
DIGIT = <any US-ASCII digit "0".."9">
WORD = 1*(ALPHA|DIGIT)
CTL = <any US-ASCII control character (octets 0 - 31)
and DEL (127)>
CR = <US-ASCII CR, carriage return (13)>
LF = <US-ASCII LF, linefeed (10)>
SP = <US-ASCII SP, space (32)>
HT = <US-ASCII HT, horizontal-tab (9)>
<"> = <US-ASCII double-quote mark (34)>
CRLF = CR LF
TEXT = <any OCTET except CTLs, but including HT>
TOKEN = <any sequence of TEXT that does not start with SP or HT>
SEPARATOR = ":" 1*(SP|HT)
DELIMITER = "," *(SP|HT)
SLASH = "/" *(SP|HT)
3. Structure
A Hina-Di file consists of a series of blocks that summarize the
metadata on a website: a header block, followed by one or more entity
blocks.
hina-di = header-block
1*( entity-block )
3.1. Block
A block is a set of metadata for a document. Each piece of metadata
is represented as a single header line, in a manner similar to RFC 822
message headers [RFC822], with a field name and a field value.
Field names in a block MUST be unique. A block with duplicate field
names MUST be discarded by clients.
Field names are case-insensitive. Unless explicitly stated for a
particular field, a field's value is case-insensitive.
line-format = field-name SEPARATOR field-value CRLF
field-name = WORD *( "-" WORD)
field-value = TOKEN
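
The line format and the duplicate-field rule above can be sketched in
Python as follows. This is only an illustration: the names `LINE_RE`
and `parse_block` are hypothetical, and discarding a block because of
a malformed line is an assumption of this sketch, not a requirement
of this specification. Control characters in values are not rejected
here.

```python
import re

# field-name = WORD *( "-" WORD ), SEPARATOR = ":" 1*(SP|HT),
# field-value = TOKEN (must not start with SP or HT).
LINE_RE = re.compile(r"^([A-Za-z0-9]+(?:-[A-Za-z0-9]+)*):[ \t]+(.*)$")


def parse_block(lines):
    """Parse the lines of one block (CRLF already removed) into a dict.

    Field names are case-insensitive, so they are stored lowercased.
    Returns None when the block has to be discarded because a field
    name appears more than once. Returning None for a malformed line
    is a choice made by this sketch only.
    """
    fields = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            return None
        name, value = match.group(1).lower(), match.group(2)
        if name in fields:
            # Duplicate field names: the whole block is discarded.
            return None
        fields[name] = value
    return fields
```

Storing the lowercased field name keeps lookups simple, since
`Title`, `TITLE` and `title` all designate the same field.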
3.2. Header Block
Exactly one header block MUST appear in a Hina-Di file, and it MUST
be the first block. It holds metadata about the Hina-Di file itself.
header-block = HINA
Hinadi-Header
CRLF
Hinadi-Header = 1*( User-Agent
| Content-Type
| Date )
3.3. Entity Block
One or more entity blocks MUST be present after the header block.
Each entity block defines metadata about a specific document.
entity-block = URL *( HINA-Version
| Virtual
| Content-Type
| Date
| Title
| Author-Name
| Expires
| Expire
| Last-Modified
| Last-Modified-Detected
| Server
| Authorized
| Authorized-url
| Method
| Keyword
| Image-Width
| Image-Height
| Experimental-field
| Undefined-field )
CRLF
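
As a rough sketch of the structure described in this section, the
following Python code splits an already decoded Hina-Di file into its
header block and entity blocks. The function names and the sample
values (`ExampleAntenna/0.1`, the sample title) are hypothetical, and
raising `ValueError` on a malformed file is an assumption of this
sketch.

```python
def split_blocks(text):
    """Split decoded Hina-Di text into blocks, each a list of lines.

    Blocks are terminated by an empty line, that is, a CRLF on its own.
    """
    blocks, current = [], []
    for line in text.split("\r\n"):
        if line:
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def parse_file(text):
    """Return (header_lines, list_of_entity_line_lists)."""
    blocks = split_blocks(text)
    # The HINA field is required as the very first line of the file.
    if not blocks or not blocks[0][0].upper().startswith("HINA/"):
        raise ValueError("missing HINA line at the start of the file")
    header, entities = blocks[0], blocks[1:]
    if not entities:
        raise ValueError("at least one entity block is required")
    return header, entities


# Hypothetical sample file, for illustration only.
SAMPLE = (
    "HINA/2.2beta\r\n"
    "User-Agent: ExampleAntenna/0.1\r\n"
    "Date: Wed, 30 Jun 2021 12:00:00 GMT\r\n"
    "\r\n"
    "URL: http://www.hoge.jp/foo/\r\n"
    "Title: Foo\r\n"
    "\r\n"
)
```

Calling `parse_file(SAMPLE)` yields one header block of three lines
and a single entity block of two lines.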
4. Fields
This section defines the various fields that may be found in blocks.
All fields are OPTIONAL and case-insensitive unless otherwise
specified.
4.1. HINA
Indicates that this is a Hina-Di file, and includes its version.
This field is REQUIRED as the first field of Hina-Di files.
HINA = "HINA" "/" hinadi-version CRLF
hinadi-version = "2.2beta"
4.2. User-Agent
Name of the user agent that created this Hina-Di file.
This field is REQUIRED in header blocks.
The value of this field is case-sensitive.
User-Agent = "User-Agent" SEPARATOR TOKEN CRLF
4.3. URL
URL of the document, compliant with [RFC2396].
This field is REQUIRED in entity blocks.
Making this field the first field of an entity block is RECOMMENDED.
The scheme and domain portions of the URL are not case-sensitive.
If the other portions of the URL are also case-insensitive, they
SHOULD be written using lowercase characters.
URL = "URL" SEPARATOR rfc2396-url CRLF
rfc2396-url = <URI described in section 5.1.2 "Request-URI"
in RFC 2396>
Implementations can use this field as a unique key that distinguishes
the entity block from other blocks. To ensure proper uniqueness of
this field, the following conditions MUST be respected by the
providing Hina-Di user agents or their administrators:
If the URL can end in a slash (`/`), then it SHOULD end in a slash.
Prefer `http://www.hoge.jp/foo/` over `http://www.hoge.jp/foo`.
If the URL includes a file name, but the file name can be omitted,
then it SHOULD be omitted. Prefer `http://www.hoge.jp/foo/` over
`http://www.hoge.jp/foo/index.html`.
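
The uniqueness conditions above could be approximated as in the
following Python sketch. Whether a URL "can" end in a slash or omit
its file name depends on the server, so the heuristics used here are
assumptions, as are the names `INDEX_NAMES` and `normalize_url`: a
final path segment without a dot is treated as a directory, and
`index.html` and `index.htm` are treated as omittable.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: file names that the server serves for a directory URL.
INDEX_NAMES = frozenset({"index.html", "index.htm"})


def normalize_url(url, index_names=INDEX_NAMES):
    """Return a form of `url` better suited for use as a unique key."""
    parts = urlsplit(url)
    # Scheme and domain are case-insensitive, so lowercase them.
    # Simplification: this also lowercases any userinfo in the netloc.
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last in index_names:
        # The file name can be omitted: keep only the directory.
        path = head + "/"
    elif last and "." not in last:
        # Assumption: no extension means the segment is a directory.
        path += "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))
```

With these assumptions, both `http://www.hoge.jp/foo` and
`http://www.hoge.jp/foo/index.html` map to `http://www.hoge.jp/foo/`.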
4.4. HINA-Version
Specifies that the integrity of the entity block was guaranteed
according to the specification of a specific Hina-Di version.
If this field is missing from an entity block, it means the block
might be incomplete.
HINA-Version = "HINA-Version" SEPARATOR version CRLF
version = "HINA" "/" 1*( DIGIT ) "." 1*( DIGIT )
4.5. Virtual
URL of another Hina-Di file that holds the entity block, compliant
with [RFC2396].
If there are fields in the entity block other than `Virtual`, then
it takes the same meaning as the regular `URL` field.
The case-sensitivity and URL uniqueness conditions defined for the
`URL` field MUST be followed for this field.
Virtual = "Virtual" SEPARATOR rfc2396-url CRLF
Note that the original Japanese specification spells the name of the
`Virtual` field as `Vitural`.
4.6. Content-Type
MIME type of the Hina-Di file or the document, as described in
[RFC1521]. The value of this field is case-sensitive to the extent
defined by RFC 1521.
Content-Type = "Content-Type" SEPARATOR rfc1521-type CRLF
4.7. Date
The date and time when the block or the Hina-Di file was generated.
The date MUST comply with [RFC1123]. The value of this field is
case-sensitive.
Date = "Date" SEPARATOR rfc1123-date CRLF
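
For illustration, RFC 1123 dates such as those used by this field,
and by the other date fields of this specification, can be read and
written with the Python standard library. The helper names
`parse_hina_date` and `format_hina_date` are hypothetical; this is a
minimal sketch, not a complete validator.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def parse_hina_date(value):
    """Parse an RFC 1123 date string into a datetime object."""
    return parsedate_to_datetime(value)


def format_hina_date(moment):
    """Format a datetime as an RFC 1123 date expressed in GMT.

    A naive datetime is interpreted as local time by astimezone().
    """
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


example = parse_hina_date("Wed, 30 Jun 2021 12:00:00 GMT")
print(format_hina_date(example))  # prints "Wed, 30 Jun 2021 12:00:00 GMT"
```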
4.8. Title
The title of the document.
Title = "Title" SEPARATOR TOKEN CRLF
4.9. Author-Name
Name of the author of the document.
The value of this field is case-sensitive.
Author-Name = "Author-Name" SEPARATOR TOKEN CRLF
4.10. Expires
Expiration date for the block. The date MUST comply with [RFC1123].
The value of this field is case-sensitive to the extent defined by
RFC 1123.
Expires = "Expires" SEPARATOR rfc1123-date CRLF
4.11. Expire
Alias for the `Expires` field, included for backwards compatibility.
Expire = "Expire" SEPARATOR rfc1123-date CRLF
4.12. Last-Modified
Date and time when the document was last updated. The date MUST
comply with [RFC1123]. The value of this field is case-sensitive to
the extent defined by RFC 1123.
Last-Modified = "Last-Modified" SEPARATOR rfc1123-date CRLF
4.13. Last-Modified-Detected
Date and time representing when the user agent retrieved the
document's metadata. The date MUST comply with [RFC1123].
The value of this field is case-sensitive to the extent defined by
RFC 1123.
Last-Modified-Detected = "Last-Modified-Detected"
SEPARATOR rfc1123-date CRLF
4.14. Server
User agent string of the server used to retrieve the metadata of the
document described by this entity block.
Server = "Server" SEPARATOR TOKEN CRLF
4.15. Authorized
The user agent that retrieved the metadata of the document described
by this entity block.
Authorized = "Authorized" SEPARATOR TOKEN CRLF
4.16. Authorized-url
URL of a page describing the user agent referred to in the
`Authorized` field, compliant with [RFC2396].
The case-sensitivity and URL uniqueness conditions defined for the
`URL` field MUST be followed for this field.
Authorized-url = "Authorized-url" SEPARATOR rfc2396-url CRLF
4.17. Method
Describes the chain of propagation that this entity block went
through.
Method = "Method" SEPARATOR method-type
*(SLASH method-type) (SLASH result-code) CRLF
method-type = "GET" | "HEAD" | "FILE" | "REMOTE"
result-code = <Status code from the IANA HTTP Status codes
registry [STATUS]>
The original Japanese specification defined result-code as follows:
result-code = <URI described on "???????" in RFC 2396>
4.17.1. Method Types
GET Metadata retrieved using an HTTP GET request.
HEAD Metadata retrieved using an HTTP HEAD request.
FILE Metadata retrieved from a local file's timestamp.
REMOTE Metadata retrieved from an entity block generated by another
agent.
4.17.2. Example
Method: REMOTE/REMOTE/GET/200
1. A first user agent retrieved the metadata on the document using
an HTTP GET request and got a 200 response code (`GET/200`).
2. A second user agent retrieved the first user agent's Hina-Di
file, then propagated it to its own file (`REMOTE`).
3. A third user agent retrieved the second user agent's Hina-Di
file, then propagated it to its own file (`REMOTE`).
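
The chain in this example can be handled as in the following Python
sketch. The names `parse_method` and `propagate_method` are
hypothetical. Prepending `REMOTE` on propagation is inferred from the
example above rather than stated as a rule, and the result code is
only checked to be numeric, not looked up in the [STATUS] registry.

```python
METHOD_TYPES = {"GET", "HEAD", "FILE", "REMOTE"}


def parse_method(value):
    """Split a Method value into (list_of_method_types, result_code)."""
    # SLASH allows whitespace after "/", hence the strip().
    parts = [part.strip() for part in value.split("/")]
    *methods, code = parts
    methods = [method.upper() for method in methods]
    if not methods or any(m not in METHOD_TYPES for m in methods):
        raise ValueError("invalid method chain: %r" % value)
    if not (code.isascii() and code.isdigit()):
        raise ValueError("invalid result code: %r" % code)
    return methods, int(code)


def propagate_method(value):
    """Return the Method value to publish after one more propagation."""
    methods, code = parse_method(value)
    return "/".join(["REMOTE", *methods, str(code)])


print(propagate_method("GET/200"))  # prints "REMOTE/GET/200"
```

Applying `propagate_method` twice to `GET/200` gives the value shown
in the example, `REMOTE/REMOTE/GET/200`.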
4.18. Keyword
Words that can be used to give an overview of the document described
by this entity block; tags, categories, etc. The value of this field
is case-sensitive.
Keyword = "Keyword" SEPARATOR keywords CRLF
keywords = TOKEN *(DELIMITER TOKEN)
4.19. Image-Width
Width of an image described by an entity block, in pixels.
This field MUST NOT be used for entity blocks that do not describe
images.
Image-Width = "Image-Width" SEPARATOR width CRLF
width = 1*DIGIT
4.20. Image-Height
Height of an image described by an entity block, in pixels.
This field MUST NOT be used for entity blocks that do not describe
images.
Image-Height = "Image-Height" SEPARATOR height CRLF
height = 1*DIGIT
4.21. Experimental Fields
Implementations MAY define custom fields with an X- prefix to provide
additional metadata not covered in this specification.
Implementations MUST NOT assume that all clients will use each of
those fields. Clients that do not support a given experimental field
SHOULD ignore it.
Experimental fields MAY include data that is not directly related to
the metadata of the document, and SHOULD be used whenever an
implementor creates a field for that purpose.
Experimental-field = x-field-name SEPARATOR TOKEN CRLF
x-field-name = "X-" WORD *("-" WORD)
4.22. Undefined Fields
Any field that is not defined in this specification. Implementations
that encounter such fields and do not support them SHOULD ignore
them.
Undefined-field = undef-field-name SEPARATOR TOKEN CRLF
undef-field-name = WORD *("-" WORD)
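
A client might tell defined, experimental and undefined fields apart
as in the following Python sketch. The names `DEFINED_FIELDS` and
`classify_field` and the three returned labels are hypothetical. The
set lists the field names of sections 4.2 to 4.20 in lowercase, as
field names are case-insensitive.

```python
DEFINED_FIELDS = frozenset({
    "user-agent", "url", "hina-version", "virtual", "content-type",
    "date", "title", "author-name", "expires", "expire",
    "last-modified", "last-modified-detected", "server", "authorized",
    "authorized-url", "method", "keyword", "image-width", "image-height",
})


def classify_field(name):
    """Return "defined", "experimental" or "undefined" for a field name."""
    lowered = name.lower()
    if lowered in DEFINED_FIELDS:
        return "defined"
    if lowered.startswith("x-"):
        return "experimental"
    return "undefined"
```

A client that does not support a field classified as "experimental"
or "undefined" can then simply skip it.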
5. Encoding
The character encoding of the Hina-Di file SHOULD be specified as a
parameter of the Content-Type field of the header block. If it is
not specified, it defaults to EUC-JP.
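
One possible way to apply this rule is sketched below in Python. The
names `detect_charset` and `decode_hina` are hypothetical. The sketch
assumes that field names and the charset parameter are plain ASCII,
so the header block can be scanned byte for byte before the real
encoding is known. An encoding name unknown to Python makes
`bytes.decode` raise `LookupError`.

```python
from email.message import Message

DEFAULT_CHARSET = "euc_jp"  # default encoding when none is specified


def detect_charset(raw):
    """Return the charset declared in the header block of `raw` bytes."""
    # latin-1 maps every byte to a character, so this never fails.
    header_block = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in header_block.split("\r\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-type":
            # Let the standard library parse the MIME parameters.
            message = Message()
            message["Content-Type"] = value.strip()
            return message.get_content_charset() or DEFAULT_CHARSET
    return DEFAULT_CHARSET


def decode_hina(raw):
    """Decode the bytes of a Hina-Di file into text."""
    return raw.decode(detect_charset(raw))
```

For instance, a header block containing
`Content-Type: text/plain; charset=Shift_JIS` causes the whole file
to be decoded as Shift-JIS.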
6. Propagation
In Hina-Di, metadata propagation consists of acquiring metadata from
other agents, then sharing it unchanged in the user agent's own
Hina-Di file. This can be used for aggregation services or a
peer-to-peer network.
The Authorized and Authorized-url fields indicate the user agent from
which the metadata originally came, to help ensure its legitimacy.
Propagation MUST only be performed if both fields are defined and if
the user agent is trusted.
When propagating, all fields of an entity block defined in this
specification, with the exception of experimental and undefined
fields or of fields with empty values, MUST be reproduced without
modification. Propagation of experimental or undefined fields is not
guaranteed. A header block, or any field that is part of it, MUST
NOT be propagated.
The Method field MUST be updated upon propagation according to the
process described in section 4.17.
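
The propagation rules of this section could be sketched in Python as
follows. The names `propagate_entity`, `ENTITY_FIELDS` and
`trusted_agents` are hypothetical, as is the representation of an
entity block as a dict with lowercase field names. This sketch drops
experimental and undefined fields, which is permitted since their
propagation is not guaranteed, and it assumes that updating the
Method field means prepending `REMOTE`, as in the example of section
4.17.2.

```python
# Fields defined for entity blocks by this specification, lowercased.
ENTITY_FIELDS = frozenset({
    "url", "hina-version", "virtual", "content-type", "date", "title",
    "author-name", "expires", "expire", "last-modified",
    "last-modified-detected", "server", "authorized", "authorized-url",
    "method", "keyword", "image-width", "image-height",
})


def propagate_entity(fields, trusted_agents):
    """Return the entity block to republish, or None if not allowed.

    `fields` maps lowercase field names to values; `trusted_agents` is
    the set of Authorized values that the local administrator trusts.
    """
    agent = fields.get("authorized")
    if not agent or not fields.get("authorized-url"):
        return None  # both fields have to be defined
    if agent not in trusted_agents:
        return None  # the originating user agent is not trusted
    # Defined, non-empty fields are reproduced without modification.
    out = {
        name: value
        for name, value in fields.items()
        if name in ENTITY_FIELDS and value
    }
    if "method" in out:
        out["method"] = "REMOTE/" + out["method"]
    return out
```

How trust is established is left to the implementation. A fixed set
of agent names is only the simplest possible choice.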
7. Security Considerations
The HINA format was designed at a time when security was far from the
primary concern for most Internet users. It is however easy, should
a modern implementation of the format be created, to seamlessly use
HTTPS instead of HTTP, both in the URLs of the syndicated content and
to serve the HINA feeds themselves.
8. Internationalization Considerations
While the format defaults to EUC-JP, it is possible to specify other
encodings for the whole file using the Content-Type field in the
Header Block. As most HINA feeds will be served over HTTP, the
Content-Type field of the HTTP response could also include this
encoding.
9. Privacy Considerations
The metadata shared in HINA is already public, and disclosure of this
information is under the full control of the publisher. However, in
a situation of propagation, the removal of already propagated links
or metadata may not be strictly performed by all implementors. This
can lead to the same issues as those seen in present-day federated
social networks, to which the only guaranteed solution is to not
publish what you might have to remove later.
10. BANANA Considerations
This document updates the "HINA" value in the E-mail Signature
Protocol Abbreviations registry.
The Reference field of the value should now point to this document.
11. References
11.1. Normative References
[ASCII] "American National Standard for Information Systems -
Coded Character Sets - 7-Bit American National Standard
Code for Information Interchange (7-Bit ASCII)",
ANSI X3.4-1986, American National Standards Institute,
March 1986.
[RFC1123] Braden, R., "Requirements for Internet Hosts - Application
and Support", RFC 1123, DOI 10.17487/RFC1123,
October 1989, <https://www.rfc-editor.org/info/rfc1123>.
[RFC1521] Borenstein, N., Freed, N., "MIME (Multipurpose Internet
Mail Extensions) Part One: Mechanisms for Specifying and
Describing the Format of Internet Message Bodies",
RFC 1521, DOI 10.17487/RFC1521, September 1993,
<https://www.rfc-editor.org/info/rfc1521>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC2396] Berners-Lee, T., Fielding, R., Masinter, L., "Uniform
Resource Identifiers (URI): Generic Syntax", RFC 2396,
DOI 10.17487/RFC2396, August 1998,
<https://www.rfc-editor.org/info/rfc2396>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[STATUS] "Hypertext Transfer Protocol (HTTP) Status Code Registry",
Internet Assigned Numbers Authority,
<https://www.iana.org/assignments/http-status-codes/
http-status-codes.xhtml>.
11.2. Informative References
[RFC822] Crocker, D., "Standard for the Format of ARPA Internet
Text Messages", RFC 822, DOI 10.17487/RFC0822,
August 1982, <https://www.rfc-editor.org/info/rfc822>.
[RSS2] RSS Advisory Board, "RSS 2.0 Specification", Version
2.0.11, March 2009,
<https://www.rss-board.org/rss-specification>.
[SHIFTJIS] "7-bit and 8-bit double byte coded KANJI sets for
information interchange", JIS X0208:1997, Japanese
Standards Association, January 1997.
[XML] Bray, T., Paoli, J., Sperberg-McQueen, C. M. and E.
Maler, "Extensible Markup Language (XML) 1.0 (Second
Edition)", W3C Recommendation REC-xml-20001006, October
2000, <https://www.w3.org/TR/2000/REC-xml-20001006>.
Appendix A. Warranty Exclusion Statement
This document and the information contained herein is provided on an
"AS IS" basis and TILDE.TOWN DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Appendix B. Glossary
This glossary was part of the original Japanese specification and is
left here to provide historical context to this standard.
Asahina-Antenna
Metadata acquisition agent based on HINA.
metadata
Information about the content, such as the author, title and
update time.
hina-di
Metadata transfer format used by Asahina-Antenna 2.x.
hina.txt
Metadata transfer format used by Asahina-Antenna 1.x, made
obsolete by hina-di.
DI
Document Information. Project that was developing the Document
Information Transfer Protocol (DITP) and Document Information Read
Protocol (DIRP), for decentralized syndication.
Hina-Di has been influenced by DI.
Acknowledgements
The author would like to thank Hiroshi Nakamura for sharing the idea
of the DITP and DIRP and developing decentralized technologies before
they truly came to life.
The author would like to thank Masayoshi Takahashi for providing an
English summary of the Japanese last-modified-time detection agents
in 1999.
Finally, the author would like to thank the Internet Archive and all
the contributors, donors, and volunteers involved, as without them
this research would have never been possible.
Author's Address
~lucidiot (editor)
Bikeshedding Microsystems
m455.casa
138.197.184.222
The Internet
Email: lucidiot@brainshit.fr
URI: https://tilde.town/~lucidiot/