Commit Graph

41 Commits

Author SHA1 Message Date
Till 232aef016c
Add basic runtime tracing (#2996)
This allows us in almost all places to use regions to further trace down
long running tasks.
Also removes an unused function.
2023-03-13 16:45:14 +01:00
Till 9bcd0a2105
Make redaction check easier to read (#2995)
We need to check the redaction PL in Dendrite, if we do it in GMSL, we
end up not sending the event to the output stream because it will be
rejected.

---------

Co-authored-by: kegsay <kegan@matrix.org>
2023-03-03 14:03:17 +01:00
Till 6c20f8f742
Refactor `StoreEvent`, add `MaybeRedactEvent`, create an `EventDatabase` (#2989)
This PR changes the following:
- `StoreEvent` now only stores an event (and possibly prev event),
instead of also doing redactions
- Adds a `MaybeRedactEvent` (pulled out from `StoreEvent`), which should
be called after storing events
- a few other things
2023-03-01 17:06:47 +01:00
Till ad07b169b8
Refactor `StoreEvent` and create a new `RoomDatabase` interface (#2985)
This PR changes a few things:
- It pulls out the creation of several NIDs from the `StoreEvent`
function to make the functions more reusable
- Uses more caching when using those NIDs to avoid DB round trips
2023-02-24 09:40:20 +01:00
Till 2acc1d65fb
Optimize history visibility checks (#2848)
This optimizes history visibility checks by (mostly) avoiding database
hits.
Possibly solves https://github.com/matrix-org/dendrite/issues/2777

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-11-01 15:07:17 +00:00
Neil Alexander 7bd6631935
Move code for calculating auth difference into GMSL 2022-11-01 10:12:11 +00:00
Neil Alexander b05e028f7d
Tweak `LoadMembershipAtEvent` behaviour when state not known (#2716)
Previously `LoadMembershipAtEvent` would fail if the state before one of
the events was not known, i.e. because it was an outlier. This modifies
it so that it gracefully handles not knowing the state and returns no
memberships instead, so that history visibility doesn't freak out and
kill `/sync` requests dead.
2022-09-13 12:52:09 +01:00
Till 05cafbd197
Implement history visibility on `/messages`, `/context`, `/sync` (#2511)
* Add possibility to set history_visibility and user AccountType

* Add new DB queries

* Add actual history_visibility changes for /messages

* Add passing tests

* Extract check function

* Cleanup

* Cleanup

* Fix build on 386

* Move ApplyHistoryVisibilityFilter to internal

* Move queries to topology table

* Add filtering to /sync and /context
Some cleanup

* Add passing tests; Remove failing tests :(

* Re-add passing tests

* Move filtering to own function to avoid duplication

* Re-add passing test

* Use newly added GMSL HistoryVisibility

* Update gomatrixserverlib

* Set the visibility when creating events

* Default to shared history visibility

* Remove unused query

* Update history visibility checks to use gmsl
Update tests

* Remove unused statement

* Update migrations to set "correct" history visibility

* Add method to fetch the membership at a given event

* Tweaks and logging

* Use actual internal rsAPI, default to shared visibility in tests

* Revert "Move queries to topology table"

This reverts commit 4f0d41be9c.

* Remove noise/unneeded code

* More cleanup

* Try to optimize database requests

* Fix imports

* PR peview fixes/changes

* Move setting history visibility to own migration, be more restrictive

* Fix unit tests

* Lint

* Fix missing entries

* Tweaks for incremental syncs

* Adapt generic changes

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Co-authored-by: kegsay <kegan@matrix.org>
2022-08-11 18:23:35 +02:00
Neil Alexander 05c83923e3
Optimise checking other servers allowed to see events (#2596)
* Try optimising checking if server is allowed to see event

* Fix error

* Handle case where snapshot NID is 0

* Fix query

* Update SQL

* Clean up `CheckServerAllowedToSeeEvent`

* Not supported on SQLite

* Maybe placate the unit tests

* Review comments
2022-08-01 14:11:00 +01:00
Neil Alexander 3ea21273bc
Ristretto cache (#2563)
* Try Ristretto cache

* Tweak

* It's beautiful

* Update GMSL

* More strict keyable interface

* Fix that some more

* Make less panicky

* Don't enforce mutability checks for now

* Determine mutability using deep equality

* Tweaks

* Namespace keys

* Make federation caches mutable

* Update cost estimation, add metric

* Update GMSL

* Estimate cost for metrics better

* Reduce counters a bit

* Try caching events

* Some guards

* Try again

* Try this

* Use separate caches for hopefully better hash distribution

* Fix bug with admitting events into cache

* Try to fix bugs

* Check nil

* Try that again

* Preserve order jeezo this is messy

* thanks VS Code for doing exactly the wrong thing

* Try this again

* Be more specific

* aaaaargh

* One more time

* That might be better

* Stronger sorting

* Cache expiries, async publishing of EDUs

* Put it back

* Use a shared cache again

* Cost estimation fixes

* Update ristretto

* Reduce counters a bit

* Clean up a bit

* Update GMSL

* 1GB

* Configurable cache sizees

* Tweaks

* Add `config.DataUnit` for specifying friendly cache sizes

* Various tweaks

* Update GMSL

* Add back some lazy loading caching

* Include key in cost

* Include key in cost

* Tweak max age handling, config key name

* Only register prometheus metrics if requested

* Review comments @S7evinK

* Don't return errors when creating caches (it is better just to crash since otherwise we'll `nil`-pointer exception everywhere)

* Review comments

* Update sample configs

* Update GHA Workflow

* Update Complement images to Go 1.18

* Remove the cache test from the federation API as we no longer guarantee immediate cache admission

* Don't check the caches in the renewal test

* Possibly fix the upgrade tests

* Update to matrix-org/gomatrixserverlib#322

* Update documentation to refer to Go 1.18
2022-07-11 14:31:31 +01:00
Neil Alexander c0f824d437
Wrap error from `SnapshotNIDFromEventID` 2022-07-05 15:06:10 +01:00
Neil Alexander 27948fb304
Optimise `loadAuthEvents`, add roomserver tracing 2022-06-07 14:23:26 +01:00
Neil Alexander 3d9fe20748
Fix bugs related to state resolution (#2507)
* Fix bugs related to state resolution

* Clean up `resolve-state`

* Don't panic when entries can't be found

* Ensure we have state entries for the auth events

* Revert "Ensure we have state entries for the auth events"

This reverts commit 9b13b7ed37.

* Revert "Revert "Ensure we have state entries for the auth events""

This reverts commit d86db197e3.

* Fix bug

* Try that again

* Update gomatrixserverlib

* Remove recursion from `loadAuthEvents`
2022-06-01 09:46:21 +01:00
Neil Alexander 4b01f1cd12
State resolution v2 micro-optimisations (#2226)
* Don't populate duplicates into auth events

* Only append the single event

* Potentially reduce number of iterations in `isInAllAuthLists
2022-02-24 11:09:01 +00:00
Neil Alexander eb352a5f6b
Full roomserver input transactional isolation (#2141)
* Add transaction to all database tables in roomserver, rename latest events updater to room updater, use room updater for all RS input

* Better transaction management

* Tweak order

* Handle cases where the room does not exist

* Other fixes

* More tweaks

* Fill some gaps

* Fill in the gaps

* good lord it gets worse

* Don't roll back transactions when events rejected

* Pass through errors properly

* Fix bugs

* Fix incorrect error check

* Don't panic on nil txns

* Tweaks

* Hopefully fix panics for good in SQLite this time

* Fix rollback

* Minor bug fixes with latest event updater

* Some review comments

* Revert "Some review comments"

This reverts commit 0caf8cf53e.

* Fix a couple of bugs

* Clearer commit and rollback results

* Remove unnecessary prepares
2022-02-04 10:39:34 +00:00
Neil Alexander a763cbb0e1
Roomserver/federation input refactor (#2104)
* Put federation client functions into their own file

* Look for missing auth events in RS input

* Remove retrieveMissingAuthEvents from federation API

* Logging

* Sorta transplanted the code over

* Use event origin failing all else

* Don't get stuck on mutexes:

* Add verifier

* Don't mark state events with zero snapshot NID as not existing

* Check missing state if not an outlier before storing the event

* Reject instead of soft-fail, don't copy roominfo so much

* Use synchronous contexts, limit time to fetch missing events

* Clean up some commented out bits

* Simplify `/send` endpoint significantly

* Submit async

* Report errors on sending to RS input

* Set max payload in NATS to 16MB

* Tweak metrics

* Add `workerForRoom` for tidiness

* Try skipping unmarshalling errors for RespMissingEvents

* Track missing prev events separately to avoid calculating state when not possible

* Tweak logic around checking missing state

* Care about state when checking missing prev events

* Don't check missing state for create events

* Try that again

* Handle create events better

* Send create room events as new

* Use given event kind when sending auth/state events

* Revert "Use given event kind when sending auth/state events"

This reverts commit 089d64d271.

* Only search for missing prev events or state for new events

* Tweaks

* We only have missing prev if we don't supply state

* Room version tweaks

* Allow async inputs again

* Apply backpressure to consumers/synchronous requests to hopefully stop things being overwhelmed

* Set timeouts on roomserver input tasks (need to decide what timeout makes sense)

* Use work queue policy, deliver all on restart

* Reduce chance of duplicates being sent by NATS

* Limit the number of servers we attempt to reduce backpressure

* Some review comment fixes

* Tidy up a couple things

* Don't limit servers, randomise order using map

* Some context refactoring

* Update gmsl

* Don't resend create events

* Set stateIDs length correctly or else the roomserver thinks there are missing events when there aren't

* Exclude our own servername

* Try backing off servers

* Make excluding self behaviour optional

* Exclude self from g_m_e

* Update sytest-whitelist

* Update consumers for the roomserver output stream

* Remember to send outliers for state returned from /gme

* Make full HTTP tests less upsetti

* Remove 'If a device list update goes missing, the server resyncs on the next one' from the sytest blacklist

* Remove debugging test

* Fix blacklist again, remove unnecessary duplicate context

* Clearer contexts, don't use background in case there's something happening there

* Don't queue up events more than once in memory

* Correctly identify create events when checking for state

* Fill in gaps again in /gme code

* Remove `AuthEventIDs` from `InputRoomEvent`

* Remove stray field

Co-authored-by: Kegan Dougal <kegan@matrix.org>
2022-01-27 14:29:14 +00:00
Neil Alexander b9a575919a
Try to reduce re-allocations a bit in resolveConflictsV2 2021-11-04 10:55:07 +00:00
Ryan W a624eab309
- Removed double imports (#1989)
- Lower cased error messages

Signed-off-by: Ryan Whittington <twentybitdev@gmail.com>

Co-authored-by: kegsay <kegan@matrix.org>
2021-09-08 17:31:03 +01:00
Neil Alexander 09d3bab838
Metric fixes
Squashed commit of the following:

commit c6eb4d8bbf80320ec2b6d416c77659b0343e5e47
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jul 19 16:52:57 2021 +0100

    Fix bug

commit d420966d9ac44936728960a8d38602662b58f1c3
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jul 19 16:46:12 2021 +0100

    Update metric

commit 0ad6e37846e2ebbbd0e33a38274094bd15b8f11b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jul 19 16:30:14 2021 +0100

    Fix observe for calculateStateDurations
2021-07-19 17:20:55 +01:00
Neil Alexander eb2a8e4c0b
Set buckets for dendrite_roomserver_calculate_state_duration_microseconds 2021-07-19 16:07:06 +01:00
Neil Alexander b20d402f39
dendrite_roomserver_calculate_state_duration_microseconds as histogram rather than summary 2021-07-19 15:34:12 +01:00
Neil Alexander d15836e260
Increase gocyclo complexity to 25 (and remove all but 2 golint directives related to it) (#1783) 2021-03-03 14:35:57 +00:00
Neil Alexander 5d74a1757f
Don't query for servers so often in /send (#1766)
* Look up servers less often, don't hit API for missing auth events unless there are actually missing auth events

* Remove ResolveConflictsAdhoc (since it is already in GMSL), other tweaks

* Update gomatrixserverlib to matrix-org/gomatrixserverlib#254

* Fix resolve-state

* Initialise t.servers on first use
2021-02-16 17:12:17 +00:00
Neil Alexander 64fb6de6d4
Don't retrieve same state events over and over again (#1737) 2021-01-26 09:12:17 +00:00
Neil Alexander 244ff0dccb
Don't create so many state snapshots when updating forward extremities (#1718)
* Light-weight checking of state changes when updating forward extremities

* Only do this for non-state events, since state events will always result in state change at extremities
2021-01-18 13:21:33 +00:00
Neil Alexander 20a01bceb2
Pass pointers to events — reloaded (#1583)
* Pass events as pointers

* Fix lint errors

* Update gomatrixserverlib

* Update gomatrixserverlib

* Update to matrix-org/gomatrixserverlib#240
2020-11-16 15:44:53 +00:00
Neil Alexander 534f9a9eb6
Refactor forward extremities (#1556)
* Add resolve-state helper

* Tweaks

* Refactor forward extremities, again

* Tweaks

* Minor optimisation

* Make path a bit clearer

* Only process state/membership if forward extremities have changed

* Usage comments in resolve-state
2020-10-21 15:37:07 +01:00
Neil Alexander 2e71d2708f
Resolve state after event against current room state when determining latest state changes (#1479)
* Resolve state after event against current room state when determining latest state changes

* Update sytest-whitelist

* Update sytest-whitelist, blacklist
2020-10-05 17:47:08 +01:00
S7evinK 3e01db0049
Fix golangci-lint issues (#1464)
* Fix S1039: unnecessary use of fmt.Sprintf

* Fix S1036: unnecessary guard around map access

Signed-off-by: Till Faelligen <tfaelligen@gmail.com>
2020-10-01 20:00:56 +01:00
Kegsay 18231f25b4
Implement rejected events (#1426)
* WIP Event rejection

* Still send back errors for rejected events

Instead, discard them at the federationapi /send layer rather than
re-implementing checks at the clientapi/PerformJoin layer.

* Implement rejected events

Critically, rejected events CAN cause state resolution to happen
as it can merge forks in the DAG. This is fine, _provided_ we
do not add the rejected event when performing state resolution,
which is what this PR does. It also fixes the error handling
when NotAllowed happens, as we were checking too early and needlessly
handling NotAllowed in more than one place.

* Update test to match reality

* Modify InputRoomEvents to no longer return an error

Errors do not serialise across HTTP boundaries in polylith mode,
so instead set fields on the InputRoomEventsResponse. Add `Err()`
function to make the API shape basically the same.

* Remove redundant returns; linting

* Update blacklist
2020-09-16 13:00:52 +01:00
Kegsay 02a73f29f8
Expand RoomInfo to cover more DB storage functions (#1377)
* Factor more things to RoomInfo

* Factor out remaining bits for RoomInfo

* Linting for now
2020-09-02 10:02:48 +01:00
Neil Alexander b24747b305
Transaction writer changes, move roomserver writers (#1285)
* Updated TransactionWriters, moved locks in roomserver, various other tweaks

* Fix redaction deadlocks

* Fix lint issue

* Rename SQLiteTransactionWriter to ExclusiveTransactionWriter

* Fix us not sending transactions through in latest events updater
2020-08-19 15:38:27 +01:00
Kegsay 24d8df664c
Fix #897 and shuffle directory around (#1054)
* Fix #897 and shuffle directory around

* Update find-lint

* goimports

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-05-21 14:40:13 +01:00
Kegsay 4ad52c67ca
Honour history_visibility when backfilling (#990)
* Make backfill work for shared history visibility

* fetch missing state on backfill to remember snapshots correctly

* Fix gmsl to not mux in auth events into room state

* Whoops

* Linting
2020-04-29 18:41:45 +01:00
Kegsay a202d88fe5
Use a single storage.Database interface (#978) 2020-04-24 10:38:58 +01:00
Neil Alexander dacee648f7
Federation for v3/v4 rooms (#954)
* Update gomatrixserverlib

* Default to room version 4

* Update gomatrixserverlib

* Limit prev_events and auth_events

* Fix auth_events, prev_events

* Fix linter issues

* Update gomatrixserverlib

* Fix getState

* Update sytest-whitelist

* Squashed commit of the following:

commit 067b875063
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Fri Apr 3 14:29:06 2020 +0100

    Invites v2 endpoint (#952)

    * Start converting v1 invite endpoint to v2

    * Update gomatrixserverlib

    * Early federationsender code for sending invites

    * Sending invites sorta happens now

    * Populate invite request with stripped state

    * Remodel a bit, don't reflect received invites

    * Handle invite_room_state

    * Handle room versions a bit better

    * Update gomatrixserverlib

    * Tweak order in destinationQueue.next

    * Revert check in processMessage

    * Tweak federation sender destination queue code a bit

    * Add comments

commit 955244c092
Author: Ben B <benne@klimlive.de>
Date:   Fri Apr 3 12:40:50 2020 +0200

    use custom http client instead of the http DefaultClient (#823)

    This commit replaces the default client from the http lib with a custom one.
    The previously used default client doesn't come with a timeout. This could cause
    unwanted locks.
    That solution chosen here creates a http client in the base component dendrite
    with a constant timeout of 30 seconds. If it should be necessary to overwrite
    this, we could include the timeout in the dendrite configuration.
    Here it would be a good idea to extend the type "Address" by a timeout and
    create an http client for each service.

    Closes #820

    Signed-off-by: Benedikt Bongartz <benne@klimlive.de>

    Co-authored-by: Kegsay <kegan@matrix.org>

* Update sytest-whitelist, sytest-blacklist

* Update go.mod/go.sum

* Add some error wrapping for debug

* Add a NOTSPEC to common/events.go

* Perform state resolution at send_join

* Set default room version to v2 again

* Tweak GetCapabilities

* Add comments to ResolveConflictsAdhoc

* Update sytest-blacklist

* go mod tidy

* Update sytest-whitelist, sytest-blacklist

* Update versions

* Updates from review comments

* Update sytest-blacklist, sytest-whitelist

* Check room versions compatible at make_join, add some comments, update gomatrixserverlib, other tweaks

* Set default room version back to v2

* Update gomatrixserverlib, sytest-whitelist
2020-04-09 15:46:06 +01:00
Neil Alexander 664f31ec98 Ensure state res results are unique 2020-03-30 09:51:45 +01:00
Neil Alexander f2030286de
Room server changes for room versions (#930)
* Rearrange state package a bit, add some code to look up the right state resolution algorithm

* Remove shared

* Add GetRoomVersionForRoomNID

* Try to use room version to get correct state resolution algorithm

* Fix room joins over federation

* nolint resolveConflictsV2 because all attempts to break it up so far just result in it being awfully less obvious how it works

* Rename Prepare to NewStateResolution

* Update comments

* Re-add missing tests
2020-03-19 18:33:04 +00:00
Neil Alexander c20109a573
Implement room version capabilities in CS API (#866)
* Add wiring for querying the roomserver for the default room version

* Try to implement /capabilities for room versions

* Update copyright notices

* Update sytests, add /capabilities endpoint into CS API

* Update sytest-whitelist

* Add GetDefaultRoomVersion

* Fix cases where state package was shadowed

* Fix version formatting

* Update Dockerfile to Go 1.13.6

* oh yes types I remember

* And fix the default too
2020-02-05 18:06:39 +00:00
Neil Alexander 880d8ae024
Room version abstractions (#865)
* Rough first pass at adding room version abstractions

* Define newer room versions

* Update room version metadata

* Fix roomserver/versions

* Try to fix whitespace in roomsSchema
2020-02-05 16:25:58 +00:00
ruben 74827428bd use go module for dependencies (#594) 2019-05-21 21:56:55 +01:00