<!--
SPDX-FileCopyrightText: 2023 nervuri <https://nervuri.net/contact>
SPDX-License-Identifier: BSD-3-Clause
-->
# NJA3
NJA3 is an algorithm for deriving a fingerprint string from a TLS Client Hello message. It aims to be a more robust and accurate version of [JA3](https://engineering.salesforce.com/tls-fingerprinting-with-ja3-and-ja3s-247362855967/). It makes the following changes to JA3:
1. extension codes are sorted in ascending order
2. known conditional extensions are not included: `server_name, padding, pre_shared_key, session_ticket, application_layer_protocol_negotiation, next_protocol_negotiation, token_binding, channel_id, channel_id_old`
3. the following code groups are added:
* record header TLS version
* supported TLS versions
* signature algorithms
* pre-shared key exchange modes
* certificate compression algorithms
4. 16-bit GREASE values are replaced with `0x0A0A` (2570) and 8-bit ones (PskKeyExchangeModes) with `0x0B` (11); their positions are preserved in all code groups except for the extensions group, in which codes are sorted
5. the fingerprint hash is SHA-256 truncated to its leftmost 128 bits
Points 1 and 2 aim to make the fingerprint stable in the face of predictable variations in a client's TLS Client Hello message. Extension codes are sorted as an adaptation to [Chromium having randomized the ordering of extensions](https://www.fastly.com/blog/a-first-look-at-chromes-tls-clienthello-permutation-in-the-wild), and several extensions are excluded - namely extensions that clients are known to only send some of the time. Most extensions in the exclusion list are taken from Troy Kent's ["(JA) 3 Reasons to Rethink Your Encrypted Traffic Analysis Strategies"](https://www.youtube-nocookie.com/embed/C93ivdcVL3A).
Points 3-5 make the fingerprint more accurate. NJA3 adds values from within `supported_versions`, `signature_algorithms`, `psk_key_exchange_modes` and `compress_certificate` - extensions that, apart from `signature_algorithms` (which dates back to TLS 1.2), were standardized after JA3 was conceived. The TLS version from the record header is now also included. Each GREASE value is changed to `0x0A0A` (if 16-bit) or `0x0B` (if it's a PskKeyExchangeMode) and its position within each code group is preserved - with the exception of the extensions group, in which codes are sorted (this approach to GREASE is inspired by [mercury's](https://github.com/cisco/mercury/blob/main/doc/npf.md#tls)). MD5 is replaced with a more collision-resistant hash, SHA-256, truncated so as to preserve MD5's convenient 16-byte length (again, something which [mercury does as well](https://github.com/cisco/mercury/blob/main/doc/npf.md#hash-representation)).
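The GREASE handling described above can be sketched in Go as follows. This is a minimal illustration rather than the reference implementation: the names `isGREASE16`, `isGREASE8` and `normalizeGREASE16` are hypothetical, and the GREASE value patterns are the ones reserved by RFC 8701.

```go
package main

import "fmt"

// isGREASE16 reports whether v is one of the sixteen 16-bit GREASE values
// reserved by RFC 8701 (0x0A0A, 0x1A1A, ..., 0xFAFA): both bytes are equal
// and each has 0xA as its low nibble.
func isGREASE16(v uint16) bool {
	return v&0x0F0F == 0x0A0A && v>>8 == v&0xFF
}

// isGREASE8 reports whether v is one of the GREASE PskKeyExchangeModes
// reserved by RFC 8701 (0x0B, 0x2A, 0x49, ..., 0xE4; spaced 0x1F apart).
func isGREASE8(v uint8) bool {
	return v >= 0x0B && (v-0x0B)%0x1F == 0
}

// normalizeGREASE16 replaces every 16-bit GREASE value with 0x0A0A and
// leaves all other codes, and all positions, untouched.
func normalizeGREASE16(codes []uint16) []uint16 {
	out := make([]uint16, len(codes))
	for i, c := range codes {
		if isGREASE16(c) {
			c = 0x0A0A
		}
		out[i] = c
	}
	return out
}

func main() {
	// A supported_groups list that starts with a random GREASE value.
	groups := []uint16{0x3A3A, 29, 23, 24}
	fmt.Println(normalizeGREASE16(groups)) // prints [2570 29 23 24]
}
```

The same replacement would apply to every 16-bit code group; only the extensions group is additionally sorted afterwards.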
To sum it up, NJA3v1 is composed of the following code groups:
* record header TLS version
* handshake TLS version
* cipher suites
* extensions (sorted, conditional extensions ignored)
* supported groups (from the `supported_groups` extension)
* supported point formats (from the `ec_point_formats` extension)
* supported TLS versions (from the `supported_versions` extension)
* signature algorithms (from the `signature_algorithms` extension)
* pre-shared key exchange modes (from the `psk_key_exchange_modes` extension)
* certificate compression algorithms (from the `compress_certificate` extension)
Ignored extensions:
* `server_name (0)`
* `padding (21)`
* `pre_shared_key (41)`
* `session_ticket (35)`
* `application_layer_protocol_negotiation (16)`
* `next_protocol_negotiation (13172)`
* `token_binding (24)`
* `channel_id (30032)`
* `channel_id_old (30031)`
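A sketch of how the extensions group could be derived from the extension codes of a Client Hello: drop the ignored extensions, normalize GREASE, then sort. The names `ignoredExtensions`, `isGREASE` and `extensionGroup` are hypothetical and the input lists are made up for illustration; this is not the reference implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// ignoredExtensions holds the conditional extensions listed above.
var ignoredExtensions = map[uint16]bool{
	0: true, 21: true, 41: true, 35: true, 16: true,
	13172: true, 24: true, 30032: true, 30031: true,
}

// isGREASE reports whether v is a 16-bit GREASE value (RFC 8701).
func isGREASE(v uint16) bool {
	return v&0x0F0F == 0x0A0A && v>>8 == v&0xFF
}

// extensionGroup drops ignored extensions, maps GREASE codes to 0x0A0A
// and returns the remaining codes in ascending order.
func extensionGroup(exts []uint16) []uint16 {
	out := make([]uint16, 0, len(exts))
	for _, e := range exts {
		if ignoredExtensions[e] {
			continue
		}
		if isGREASE(e) {
			e = 0x0A0A
		}
		out = append(out, e)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	// Two made-up Client Hellos from the same client: the extension order
	// differs, and the second carries pre_shared_key instead of padding.
	a := []uint16{0x2A2A, 0, 23, 65281, 10, 11, 35, 16, 5, 13, 18, 51, 45, 43, 27, 17513, 0xDADA, 21}
	b := []uint16{0x2A2A, 43, 5, 17513, 0, 51, 27, 13, 10, 65281, 18, 35, 23, 45, 11, 16, 0xDADA, 41}
	// Both lines print the same extensions group.
	fmt.Println(extensionGroup(a))
	fmt.Println(extensionGroup(b))
}
```

Because the codes are sorted, the two GREASE placeholders end up next to each other between 51 and 17513, so their original positions are not reflected in this group.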
Future versions of NJA3 may be defined to adapt to changes in TLS and to address shortcomings found in previous versions.
Why this name? The N used to stand for "normalized", which is what the folks at [tlsfingerprint.io](https://tlsfingerprint.io/) call their new fingerprints with sorted extension codes (see [tlsfingerprint.io/norm\_fp](https://tlsfingerprint.io/norm_fp)). However, since NJA3 has come to do more than sort extension codes, let's just say it means "nervuri's take on JA3".
## Example
This is the NJA3v1 fingerprint for Chromium version 116.0.5845.180 running on Debian 12.1:
* NJA3v1: `769,771,2570-4867-4865-4866-52393-52392-49195-49199-49196-49200-49171-49172-156-157-47-53,5-10-11-13-18-23-27-43-45-51-2570-2570-17513-65281,2570-29-23-24,0,2570-772-771,1027-2052-1025-1283-2053-1281-2054-1537,1,2`
* NJA3v1 SHA256/128: `8e0ed9d95486aa6a004a682cebd14afe`
It's the same fingerprint in normal browsing mode and in incognito mode, whether session resumption is used or not. JA3, on the other hand, produces a different fingerprint on practically every connection, because it records extensions in the order sent, which Chromium randomizes.
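The string layout in the example (code groups separated by commas, decimal codes within a group separated by dashes) and the truncated hash can be sketched as below. The names `joinGroup`, `fingerprint`, `hash128` and `chromiumExample` are hypothetical. Two things are assumptions rather than facts taken from this document: that an empty group is rendered as an empty field, and that the hash is computed over exactly the ASCII bytes of the fingerprint string.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// joinGroup renders one code group as dash-separated decimal values.
func joinGroup(codes []uint16) string {
	parts := make([]string, len(codes))
	for i, c := range codes {
		parts[i] = strconv.Itoa(int(c))
	}
	return strings.Join(parts, "-")
}

// fingerprint joins already-normalized code groups with commas.
// Assumption: an empty group becomes an empty field.
func fingerprint(groups ...[]uint16) string {
	fields := make([]string, len(groups))
	for i, g := range groups {
		fields[i] = joinGroup(g)
	}
	return strings.Join(fields, ",")
}

// hash128 returns the leftmost 128 bits of SHA-256(s) as 32 hex digits.
func hash128(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:16])
}

// chromiumExample rebuilds the fingerprint string from the example above.
func chromiumExample() string {
	return fingerprint(
		[]uint16{769}, // record header TLS version
		[]uint16{771}, // handshake TLS version
		[]uint16{2570, 4867, 4865, 4866, 52393, 52392, 49195, 49199,
			49196, 49200, 49171, 49172, 156, 157, 47, 53}, // cipher suites
		[]uint16{5, 10, 11, 13, 18, 23, 27, 43, 45, 51, 2570, 2570, 17513, 65281}, // extensions
		[]uint16{2570, 29, 23, 24}, // supported groups
		[]uint16{0},                // supported point formats
		[]uint16{2570, 772, 771},   // supported TLS versions
		[]uint16{1027, 2052, 1025, 1283, 2053, 1281, 2054, 1537}, // signature algorithms
		[]uint16{1}, // pre-shared key exchange modes
		[]uint16{2}, // certificate compression algorithms
	)
}

func main() {
	s := chromiumExample()
	fmt.Println(s)          // the NJA3v1 string
	fmt.Println(hash128(s)) // 32 hex digits (16 bytes)
}
```

Truncating to 16 bytes keeps the output the same length as a JA3 MD5 hash, so existing storage and tooling sized for JA3 hashes could hold it unchanged.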
## Alternate approaches
[Mercury's TLS fingerprint algorithm](https://github.com/cisco/mercury/blob/main/doc/npf.md#tls) ignores any extension codes not found in the following set:
```
TLS_EXT_FIXED = {
0x0001, 0x0005, 0x0007, 0x0008, 0x0009, 0x000a, 0x000b, 0x000d,
0x000f, 0x0010, 0x0011, 0x0018, 0x001b, 0x001c, 0x002b, 0x002d,
0x0032, 0x5500
}
```
Ignoring extensions outside of a fixed set has the advantage that future conditional extensions will not affect the fingerprint's stability. Perhaps future versions of NJA3 will use this approach. The drawback is that it makes the fingerprint less precise.
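The difference between the two filtering strategies can be illustrated with a short sketch. `tlsExtFixed` copies the set quoted above and `nja3Ignored` copies the NJA3v1 exclusion list; the `filter` helper and the extension code `0x1234`, which stands in for some future conditional extension, are hypothetical. GREASE handling and sorting are left out to keep the comparison focused.

```go
package main

import "fmt"

// tlsExtFixed mirrors the TLS_EXT_FIXED set quoted above (allowlist).
var tlsExtFixed = map[uint16]bool{
	0x0001: true, 0x0005: true, 0x0007: true, 0x0008: true, 0x0009: true,
	0x000a: true, 0x000b: true, 0x000d: true, 0x000f: true, 0x0010: true,
	0x0011: true, 0x0018: true, 0x001b: true, 0x001c: true, 0x002b: true,
	0x002d: true, 0x0032: true, 0x5500: true,
}

// nja3Ignored is the NJA3v1 exclusion list (denylist).
var nja3Ignored = map[uint16]bool{
	0: true, 21: true, 41: true, 35: true, 16: true,
	13172: true, 24: true, 30032: true, 30031: true,
}

// filter keeps the codes for which keep returns true, preserving order.
func filter(exts []uint16, keep func(uint16) bool) []uint16 {
	var out []uint16
	for _, e := range exts {
		if keep(e) {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	// 0x1234 is a made-up code for a future conditional extension.
	exts := []uint16{0x000a, 0x1234, 0x002b, 0x0000}
	allow := filter(exts, func(e uint16) bool { return tlsExtFixed[e] })
	deny := filter(exts, func(e uint16) bool { return !nja3Ignored[e] })
	fmt.Println(allow) // prints [10 43]
	fmt.Println(deny)  // prints [10 4660 43]
}
```

With the allowlist, the unknown code never reaches the fingerprint; with the exclusion list, it does until the list is updated.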
GREASE can be approached in several ways:
* ignore GREASE values completely, as JA3 does;
* normalize GREASE values and maintain their positions, as mercury and NJA3 do;
* mark code groups which contain GREASE values, but ignore the positions of GREASE values within those groups - an intermediary approach.
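To make the three options concrete, here is a small sketch that applies each of them to the same list with GREASE in two different positions. The helper names `isGREASE`, `stripGREASE` and `replaceGREASE` are hypothetical, and representing the "mark" approach as a boolean next to the stripped list is just one possible encoding.

```go
package main

import "fmt"

// isGREASE reports whether v is a 16-bit GREASE value (RFC 8701).
func isGREASE(v uint16) bool {
	return v&0x0F0F == 0x0A0A && v>>8 == v&0xFF
}

// stripGREASE removes GREASE values and reports whether any were present.
func stripGREASE(codes []uint16) (rest []uint16, had bool) {
	for _, c := range codes {
		if isGREASE(c) {
			had = true
			continue
		}
		rest = append(rest, c)
	}
	return rest, had
}

// replaceGREASE substitutes 0x0A0A for each GREASE value, keeping positions.
func replaceGREASE(codes []uint16) []uint16 {
	out := make([]uint16, len(codes))
	for i, c := range codes {
		if isGREASE(c) {
			c = 0x0A0A
		}
		out[i] = c
	}
	return out
}

func main() {
	first := []uint16{0x4A4A, 772, 771} // GREASE in first position
	last := []uint16{772, 771, 0x4A4A}  // GREASE in last position
	for _, codes := range [][]uint16{first, last} {
		rest, had := stripGREASE(codes)
		fmt.Println("ignore:   ", rest)                 // approach 1
		fmt.Println("normalize:", replaceGREASE(codes)) // approach 2
		fmt.Println("mark:     ", had, rest)            // approach 3
	}
}
```

Only the "normalize" output differs between the two inputs; the other two approaches are insensitive to where the GREASE value sits.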
RFC 8701 [states that](https://www.rfc-editor.org/rfc/rfc8701.html#name-sending-grease-values):
> Implementations SHOULD balance diversity in GREASE advertisements with determinism. For example, a client that randomly varies GREASE value positions for each connection may only fail against a broken server with some probability. This risks the failure being masked by automatic retries. A client that positions GREASE values deterministically over a period of time (such as a single software release) stresses fewer cases but is more likely to detect bugs from those cases.
Following this guideline, Chromium places GREASE values at fixed positions within each list, including the extensions list, even as most real extensions are shuffled. This is what informed the choice of including GREASE positions in NJA3v1 (an exception is made for extension codes, which are all sorted to simplify implementation). Future versions of NJA3 will ignore GREASE positions if other TLS implementations are found to randomize them.
On a final note, string-based fingerprinting is fundamentally limited compared to a function-based approach. More advanced fingerprinting solutions store the entire Client Hello message and provide it as input to one or more client detection functions, the output of which can include a confidence level. In addition to TLS parameters and their order, such functions can make use of values within conditional extensions, as well as any perceivable patterns in the TLS implementation's behavior. Other messages in the TLS connection could also be used for fingerprinting - see the Future work section in ["The use of TLS in Censorship Circumvention"](https://tlsfingerprint.io/static/frolov2019.pdf#page=14):
> Client Hello messages provide a rich amount of features useful in fingerprinting TLS implementations, but there are other messages in the TLS connection that could be used to detect or block tools. For instance, once the connection is established and sends encrypted records, the lengths of these encrypted records may reveal differences between implementations
## Implementation
The first implementation is written in Go and can be found in [`clienthello/fingerprint.go`](https://tildegit.org/nervuri/client-hello-mirror/src/branch/master/clienthello/fingerprint.go#L69). This code is part of TLS Client Hello Mirror, a live instance of which runs at [tlsprivacy.nervuri.net](https://tlsprivacy.nervuri.net/) and will (among other things) generate the NJA3 fingerprint of any HTTPS or [Gemini](https://geminiprotocol.net/) client that connects to it.