Created Holes (markdown)

Kegsay 2020-05-12 16:29:23 +01:00
parent 169e778f12
commit 46b8d7b88b
1 changed files with 165 additions and 0 deletions

165
Holes.md Normal file

@ -0,0 +1,165 @@
# A whole lot about holes in Federation
Following on from the [state resolution v2](https://matrix.org/docs/guides/implementing-stateres) blog post we did,
the Dendrite team is here to try to explain another mystical aspect of Federation: holes.
### Background context
Events in a room form a Directed Acyclic Graph (DAG), where each event "points" to the previous event before it.
```
A ----- B ----- C
[] [A] [B] prev_events
```
Events can also branch out and form forks, which can then later be merged:
```
.--- D --.
| [B] |
| |
A ----- B -+--- C --+--- E
[] [A] [B] [C,D] prev_events
```
When a server receives an event, it probably has the `prev_events` and therefore it can perform all the calculations
it needs to. However, if it doesn't have them it can request them via [/get_missing_events](https://matrix.org/docs/spec/server_server/latest#post-matrix-federation-v1-get-missing-events-roomid):
```
A ---- B ---- C ---- D ---- E
Server 1 has? ✓ ✓ ✓
Server 2 has? ✓ ✓ ✓ ✓ ✓
Server 1 has A,B and then receives E. It needs to request C,D:
POST /_matrix/federation/v1/get_missing_events/%21SomeRoom%3Amatrix.org HTTP/1.1
Content-Type: application/json
{
"limit": 10,
"min_depth": 0,
"earliest_events": [
"$event_B:example.org"
],
"latest_events": [
"$event_E:example.org"
]
}
```
NB: Be aware of the misleading key names. `/get_missing_events` does *not* return events "between" the two events given, but instead
walks the DAG, backwards from `latest_events` and *ignoring* `earliest_events`. This distinction is important for DAGs like this:
```
.----- D ---- E ---- F Server 2
A --- B --|
`----- C Server 1
Server 1 has A,B,C and then receives F
Server 2 has A,B,D,E,F
Server 1 needs to request D,E
POST /_matrix/federation/v1/get_missing_events/%21SomeRoom%3Amatrix.org HTTP/1.1
Content-Type: application/json
{
"limit": 10,
"min_depth": 0,
"earliest_events": [
"$event_C:example.org"
],
"latest_events": [
"$event_F:example.org"
]
}
```
If `/get_missing_events` returned events "between" the two events, it would not return D,E as they are not between C,F in the DAG.
This API works well when the server is missing a few events, but sometimes the server is missing a *lot* of events. When this happens,
the server may decide to give up trying to call `/get_missing_events` over and over and instead a hole is born.
#### What's a hole and why should I care?
A hole refers to the "hole" in the DAG that is formed when a server has missed many events from Federation and decides to give up
backfilling via `/get_missing_events` e.g. the server has been offline for a long time.
These DAGs can have many different shapes: (Server previously had A,B,C and wants to now store X,Y,Z) - we'll refer to these events throughout.
```
A A A
| / \ |
B B D B
| | | |
C C E C
| | / \
... ... ... ...
| | \ /
X X X
| | |
Y Y Y
| | |
Z Z Z
Linear Early fork Fork
```
Servers need to care about holes so they can perform auth checks and verify the room state for all the messages it wants to persist. This means it needs to know the state
of the room before X to work out if X is allowed, and so on for Y and Z. Clients need to care about holes so they can show the gap on their UI and backfill via `/messages` requests.
### How servers should handle holes
Servers need to perform auth checks to work out if the event being processed is "allowed" according to the rules of the room. These checks are done on a snapshot of the room state at a particular point in time: just before the event being processed is added to the DAG.
#### A bit on state terminology
A frequent gotcha is the distinction between before/after/at when referring to the "state of the room" for a particular event. The state "at" an event is an ambigous term as it could refer to before or after the event has been applied. Ideally, no code or specification should refer to the state "at" an event. The terms "before" and "after" are also somewhat suggestive and unfortunately the intuition is wrong: the state *after* B is not **always** the state *before* C.
Consider the following graphs, one linear and one fork:
```
Before A ___ ________ Before B
After A ____A A B ________ After B - no state from A involved, B applied.
Before B ___| \ /__________ Before C - state resolution over A,B applied, C not involved.
After B ____B C___________ After C - state resolution over A,B applied, C applied.
Before C ___|
After C ____C
```
For the linear DAG, the state after B is the same as the state before C. However, for the fork DAG, the state after B has no knowledge of A and hence isn't included. The state before C is the result of the state resolution algorithm being applied over the two branches A,B and hence will result in different state.
#### How to not work out the state before X
- We want to persist X,Y,Z
- You need to know the state before X.
- Get the state before X (e.g via `/state_ids`)
This is intuitive and simple but unfortunately is wrong because it means that we are entirely trusting the room state in `/state_ids`. Consider the example:
```
TODO: Explain an attack.
```
We need to perform state resolution to resolve this. So instead, server implementations should do the following...
#### How to work out state before X
- We want to persist X,Y,Z
- You need to know the state before X.
Steps:
- Get the state AFTER X's prev_events
- Apply your current known state after event C.
- Apply state resolution over it.
- You now have the state before X.
##### Woah there, what?!
Yep, we mix in what we previously knew about the room into the state resolution algorithm, and rely on the algorithm to come to a sensible conclusion.
Effectively, we're pretending that C is a prev_event to X, even when it isn't! This fixes the attack whereby:
```
TODO: Explain how this fixes the attack
```
#### Rejected events
TODO
- It can still be referenced in the DAG by other servers
- But also, just because the event is rejected doesn't mean its useless
- (eg the prev events might still be useful for ordering purposes, and for figuring out state of any events pointing at it)