Mirror of https://github.com/TecharoHQ/anubis.git, synced 2026-04-05 16:28:17 +00:00

Compare commits: Xe/analyti ... Xe/thr1 (3 commits)
| Author | SHA1 | Date |
|---|---|---|
| | 6b09ac9543 | |
| | de602116d0 | |
| | 3a4b1086af | |
4
.github/actions/spelling/expect.txt
vendored
@@ -66,6 +66,7 @@ duckduckbot
 eerror
 ellenjoe
 enbyware
+enca
 everyones
 evilbot
 evilsite
@@ -76,6 +77,7 @@ extldflags
 facebookgo
 Factset
 fastcgi
+fcf
 fediverse
 finfos
 Firecrawl
@@ -139,6 +141,7 @@ lightpanda
 LIMSA
 Linting
 linuxbrew
+lkey
 LLU
 loadbalancer
 lol
@@ -233,6 +236,7 @@ techarohq
 templ
 templruntime
 testarea
+thr
 Tik
 Timpibot
 torproject
202
docs/docs/developer/thr1.mdx
Normal file
@@ -0,0 +1,202 @@
# Techaro HTTP Request Fingerprinting Version 1

The naïve way to identify HTTP clients is to use the HTTP User-Agent string as a signal. In an ideal world, this would give you a perfect view of what clients are connecting to your server. We do not live in that ideal world. As such, we need an alternative method that can scale to the world we have.

## Prior Art

The biggest source of prior art is [FoxIO's JA4H fingerprinting method](https://github.com/FoxIO-LLC/ja4/blob/main/technical_details/JA4H.md). JA4H works, but it has a practical problem here: Go's `net/http` stores headers in a map, so you can't observe the order in which they arrived. As Anubis is written in Go and I don't feel like boiling the HTTP server ocean today, there needs to be an alternative.

## THR1

The fingerprint consists of five components joined with underscores:

```text
<thr1_head>_<thr1_lang>_<thr1_sec>_<thr1_ua>_<thr1_enc>
```

Example:

```text
get201004_enca-d6b272e5b_sec-a9649072c_2a347fcf7_zs
```

Each component is described below:

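If you need to take a fingerprint apart again (for logging or for matching on a single component), splitting on `_` should be enough. This is a minimal sketch, not part of Anubis: the `Components` type and `splitTHR1` function are hypothetical names, and it assumes that no component contains an underscore of its own.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Components holds the five parts of a THR1 fingerprint.
// The type and its field names are hypothetical; only the order comes from the format above.
type Components struct {
	Head, Lang, Sec, UA, Enc string
}

// splitTHR1 splits a fingerprint on "_" and checks that exactly five parts came out.
// Assumption: no component contains an underscore itself.
func splitTHR1(fp string) (Components, error) {
	parts := strings.Split(fp, "_")
	if len(parts) != 5 {
		return Components{}, errors.New("thr1: expected 5 underscore-separated components")
	}
	return Components{parts[0], parts[1], parts[2], parts[3], parts[4]}, nil
}

func main() {
	c, err := splitTHR1("get201004_enca-d6b272e5b_sec-a9649072c_2a347fcf7_zs")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", c)
}
```
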
### `thr1_head`

Overall request summary of method, protocol, and header counts:

- First three letters of the HTTP method, lowercased (e.g. `get`, `pos`).
- HTTP protocol version formatted in two digits (`10` for HTTP/1.0, `11` for HTTP/1.1, `20` for HTTP/2, `30` for HTTP/3, etc.).
  - If present, prefer the HTTP protocol version in `X-Http-Version`.
- Number of HTTP headers sent by the client, zero-padded to two digits (e.g. `10`).
- Number of `Sec-*` headers sent by the client, zero-padded to two digits (e.g. `04`).

Example:

```text
get201004
```

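The list above can be sketched in Go roughly as follows. This is an illustration of the written spec under a few assumptions of mine, not the shipped implementation: `headComponent` and `cap99` are hypothetical names, the header map has the same shape as `net/http`'s `Header` type, distinct header names are what gets counted, `X-Http-Version` is left out of the count because a proxy adds it rather than the client, and both counts are capped at 99.

```go
package main

import (
	"fmt"
	"strings"
)

// cap99 keeps a count within two digits.
func cap99(n int) int {
	if n > 99 {
		return 99
	}
	return n
}

// headComponent builds thr1_head from the method, protocol version, and headers.
// Assumptions: distinct header names are counted, and X-Http-Version is not counted.
func headComponent(method string, protoMajor, protoMinor int, headers map[string][]string) string {
	m := strings.ToLower(method)
	if len(m) > 3 {
		m = m[:3]
	}

	total, sec := 0, 0
	proto := ""
	for name, values := range headers {
		lower := strings.ToLower(name)
		if lower == "x-http-version" {
			if len(values) > 0 {
				proto = strings.ToUpper(strings.TrimSpace(values[0]))
			}
			continue
		}
		total++
		if strings.HasPrefix(lower, "sec-") {
			sec++
		}
	}
	if proto == "" {
		// No override header: fall back to the version the server saw.
		proto = fmt.Sprintf("HTTP/%d.%d", protoMajor, protoMinor)
	}

	version := "00" // unknown protocol version
	switch proto {
	case "HTTP/1.0":
		version = "10"
	case "HTTP/1.1":
		version = "11"
	case "HTTP/2", "HTTP/2.0":
		version = "20"
	case "HTTP/3", "HTTP/3.0":
		version = "30"
	}

	return fmt.Sprintf("%s%s%02d%02d", m, version, cap99(total), cap99(sec))
}

func main() {
	headers := map[string][]string{
		"Accept":           {"*/*"},
		"Accept-Encoding":  {"gzip"},
		"Accept-Language":  {"en-CA"},
		"Cookie":           {"a=b"},
		"Referer":          {"https://example.com/"},
		"User-Agent":       {"example"},
		"Sec-Ch-Ua-Mobile": {"?0"},
		"Sec-Fetch-Dest":   {"document"},
		"Sec-Fetch-Mode":   {"navigate"},
		"Sec-Fetch-Site":   {"none"},
	}
	fmt.Println(headComponent("GET", 2, 0, headers)) // prints "get201004"
}
```

Ten header names, four of which start with `Sec-`, over HTTP/2 give the same `get201004` as the example above.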
### `thr1_lang`

`Accept-Language` header details.

- If no `Accept-Language` header is set, then:

  ```
  -000000000
  ```

- Otherwise, join these two parts with `-`:
  - The first 4 alphanumeric characters of the header value (lowercased, right-padded with `0` to length 4), e.g. `enca`.
  - The first 9 hex characters of the SHA-256 hash of the full `Accept-Language` header value.

Example:

```
enca-d6b272e5b
```

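To make the prefix and padding rules concrete, here is a small Go sketch of this component. `langComponent` is a hypothetical name, and the header values in `main` are made-up inputs chosen to show the three interesting cases: a normal value, a value shorter than four characters, and a value with no alphanumerics at all.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// langComponent builds thr1_lang from a raw Accept-Language header value.
func langComponent(acceptLanguage string) string {
	if acceptLanguage == "" {
		return "-000000000"
	}

	// Collect the first four ASCII letters or digits of the lowercased value.
	prefix := make([]byte, 0, 4)
	for _, ch := range strings.ToLower(acceptLanguage) {
		if len(prefix) == 4 {
			break
		}
		if (ch >= 'a' && ch <= 'z') || (ch >= '0' && ch <= '9') {
			prefix = append(prefix, byte(ch))
		}
	}
	// Right-pad with '0' when fewer than four characters were found.
	for len(prefix) < 4 {
		prefix = append(prefix, '0')
	}

	// The hash covers the full header value exactly as it was received.
	sum := sha256.Sum256([]byte(acceptLanguage))
	return string(prefix) + "-" + hex.EncodeToString(sum[:])[:9]
}

func main() {
	for _, v := range []string{"en-CA,en;q=0.9", "de", "*", ""} {
		fmt.Printf("%q -> %s\n", v, langComponent(v))
	}
}
```

The prefixes come out as `enca`, `de00`, and `0000` for the first three inputs, each followed by `-` and nine hex characters.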
### `thr1_sec`

Details about the `Sec-*` headers sent by the client.

```
thr1_sec = "sec-" + HASH9
```

Where:

- Collect **all headers whose names start with `sec-` (case-insensitive)**, excluding `Sec-Fetch-User`.
- For each header:

  1. Normalize the header name by lowercasing.
  2. If the header is one of the `Sec-CH-UA` family:

     - `Sec-CH-UA`
     - `Sec-CH-UA-Mobile`
     - `Sec-CH-UA-Platform`
     - `Sec-CH-UA-Platform-Version`
     - `Sec-CH-UA-Model`
     - `Sec-CH-UA-Full-Version`

     Apply **special normalization rules** (see below).

  3. For all other `sec-` headers:
     - Unquote values if quoted.
     - Trim leading/trailing whitespace.
     - Keep the value as-is (do not parse further).

- Sort all included headers by their normalized header name (ASCII order).
- Serialize each header as:

  ```text
  <header_name>:<normalized_value>
  ```

- Join all serialized lines with `\n`.
- Compute SHA-256 hash of the resulting canonical string.
- Take the first 9 hex characters of the hash and prefix with `sec-`.

Example:

```text
sec-a9649072c
```

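The collect, sort, join, and hash steps can be sketched in Go like this. It is a simplified illustration: `canonicalSec` and `secComponent` are hypothetical names, the `Sec-CH-UA` family is skipped entirely to keep the sketch short (the spec gives those headers their own normalization), and the header map is assumed to have one entry per header name, as `net/http` guarantees after canonicalization.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// canonicalSec builds the canonical string for plain Sec-* headers.
// Assumption: Sec-CH-UA* headers are skipped here; they need their own rules.
func canonicalSec(headers map[string][]string) string {
	type entry struct{ name, value string }
	var entries []entry
	for name, values := range headers {
		lower := strings.ToLower(name)
		if !strings.HasPrefix(lower, "sec-") || lower == "sec-fetch-user" || strings.HasPrefix(lower, "sec-ch-ua") {
			continue
		}
		for _, v := range values {
			// Trim whitespace, strip surrounding quotes, and keep the rest as-is.
			v = strings.TrimSpace(strings.Trim(strings.TrimSpace(v), `"`))
			entries = append(entries, entry{lower, v})
		}
	}

	// Sort by normalized header name; repeated values keep their original order.
	sort.SliceStable(entries, func(i, j int) bool { return entries[i].name < entries[j].name })

	lines := make([]string, len(entries))
	for i, e := range entries {
		lines[i] = e.name + ":" + e.value
	}
	return strings.Join(lines, "\n")
}

// secComponent hashes the canonical string and keeps the first 9 hex characters.
func secComponent(headers map[string][]string) string {
	sum := sha256.Sum256([]byte(canonicalSec(headers)))
	return "sec-" + hex.EncodeToString(sum[:])[:9]
}

func main() {
	headers := map[string][]string{
		"Accept":         {"*/*"},
		"Sec-Fetch-Mode": {"navigate"},
		"Sec-Fetch-Dest": {" document "},
		"Sec-Fetch-User": {"?1"},
	}
	fmt.Println(canonicalSec(headers))
	fmt.Println(secComponent(headers))
}
```
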
#### Special Normalization Rules for `Sec-CH-UA*` headers

| Header                       | Normalization                                                                                                                                        |
| ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Sec-CH-UA`                  | Parse into `{brand, version}` pairs. Omit any with brand `"Not=A?Brand"`. Sort by brand ASC. Serialize as: `ua:Brand1/Version1,Brand2/Version2,...` |
| `Sec-CH-UA-Mobile`           | Convert `"?1"` → `true`, `"?0"` → `false`. Serialize as: `mobile:true` or `mobile:false`                                                             |
| `Sec-CH-UA-Platform`         | Lowercase, unquoted, trimmed. Serialize as: `platform:<value>`                                                                                       |
| `Sec-CH-UA-Platform-Version` | Unquoted, trimmed. Serialize as: `platform_version:<value>`                                                                                          |
| `Sec-CH-UA-Model`            | Unquoted, trimmed. Serialize as: `model:<value>`                                                                                                     |
| `Sec-CH-UA-Full-Version`     | Unquoted, trimmed. Serialize as: `full_version:<value>`                                                                                              |

Given these headers:

```text
Sec-CH-UA: "Google Chrome";v="123", "Not=A?Brand";v="8", "Chromium";v="123"
Sec-CH-UA-Mobile: ?1
Sec-CH-UA-Platform: "Windows"
Sec-CH-UA-Platform-Version: "10.0.0"
Sec-CH-UA-Model: "Pixel 7"
Sec-CH-UA-Full-Version: "123.0.6312.122"
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
```

Serialized lines after normalization, before sorting:

```text
sec-fetch-dest:document
sec-fetch-mode:navigate
mobile:true
platform:windows
platform_version:10.0.0
full_version:123.0.6312.122
model:Pixel 7
ua:Chromium/123,Google Chrome/123
```

Then sort by serialized name to get the canonical string that is hashed:

```text
full_version:123.0.6312.122
mobile:true
model:Pixel 7
platform:windows
platform_version:10.0.0
sec-fetch-dest:document
sec-fetch-mode:navigate
ua:Chromium/123,Google Chrome/123
```

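The table and worked example above can be reproduced with a short Go sketch. The names here (`unquote`, `normalizeUA`, `serialize`, `canonicalString`) are hypothetical, the sketch takes one value per header for simplicity, and it assumes that any `Sec-CH-UA-Mobile` value other than `?1` counts as `false`, which the table does not spell out.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// brandRe matches one "Brand";v="Version" item in a Sec-CH-UA value.
var brandRe = regexp.MustCompile(`"([^"]+)";v="([^"]+)"`)

// unquote trims whitespace and surrounding double quotes.
func unquote(v string) string {
	return strings.TrimSpace(strings.Trim(strings.TrimSpace(v), `"`))
}

// normalizeUA drops "Not=A?Brand", sorts by brand, and joins Brand/Version pairs.
func normalizeUA(value string) string {
	type brand struct{ name, version string }
	var brands []brand
	for _, m := range brandRe.FindAllStringSubmatch(value, -1) {
		if m[1] != "Not=A?Brand" {
			brands = append(brands, brand{m[1], m[2]})
		}
	}
	sort.SliceStable(brands, func(i, j int) bool { return brands[i].name < brands[j].name })
	items := make([]string, len(brands))
	for i, b := range brands {
		items[i] = b.name + "/" + b.version
	}
	return strings.Join(items, ",")
}

// serialize returns the serialized name and value for one Sec-* header.
func serialize(name, value string) (key, val string) {
	lower := strings.ToLower(name)
	switch lower {
	case "sec-ch-ua":
		return "ua", normalizeUA(value)
	case "sec-ch-ua-mobile":
		// Assumption: anything other than ?1 is treated as false.
		return "mobile", strconv.FormatBool(unquote(value) == "?1")
	case "sec-ch-ua-platform":
		return "platform", strings.ToLower(unquote(value))
	case "sec-ch-ua-platform-version":
		return "platform_version", unquote(value)
	case "sec-ch-ua-model":
		return "model", unquote(value)
	case "sec-ch-ua-full-version":
		return "full_version", unquote(value)
	}
	return lower, unquote(value)
}

// canonicalString builds the sorted, newline-joined string that gets hashed.
func canonicalString(headers map[string]string) string {
	type entry struct{ key, val string }
	var entries []entry
	for name, value := range headers {
		lower := strings.ToLower(name)
		if !strings.HasPrefix(lower, "sec-") || lower == "sec-fetch-user" {
			continue
		}
		k, v := serialize(name, value)
		entries = append(entries, entry{k, v})
	}
	sort.Slice(entries, func(i, j int) bool { return entries[i].key < entries[j].key })
	lines := make([]string, len(entries))
	for i, e := range entries {
		lines[i] = e.key + ":" + e.val
	}
	return strings.Join(lines, "\n")
}

func main() {
	fmt.Println(canonicalString(map[string]string{
		"Sec-CH-UA":                  `"Google Chrome";v="123", "Not=A?Brand";v="8", "Chromium";v="123"`,
		"Sec-CH-UA-Mobile":           "?1",
		"Sec-CH-UA-Platform":         `"Windows"`,
		"Sec-CH-UA-Platform-Version": `"10.0.0"`,
		"Sec-CH-UA-Model":            `"Pixel 7"`,
		"Sec-CH-UA-Full-Version":     `"123.0.6312.122"`,
		"Sec-Fetch-Dest":             "document",
		"Sec-Fetch-Mode":             "navigate",
	}))
}
```

Running it prints the same eight sorted lines as the canonical string in the example.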
### `thr1_ua`

The first 9 hex characters of the SHA-256 hash of the `User-Agent` header value.

Example output:

```text
2a347fcf7
```

### `thr1_enc`

`Accept-Encoding` header details:

- **Most preferred compression encoding** (`*`, `gzip`, `deflate`, `br`, `zstd`)
- **Number of encodings declared**, zero-padded to **two digits** (`00` to `99`, capped at `99`)

**Format:**

```
<preferred_encoding>-<count>
```

- `preferred_encoding` is the first matching value in this priority order:

  1. `*`
  2. `gzip`
  3. `deflate`
  4. `br`
  5. `zstd`

  - If none match, use `none`

- `count` is the number of encoding options, zero-padded to 2 digits (max 99)

**Examples:**

- `gzip, deflate` → `gzip-02`
- `gzip;q=0.9, br;q=0.8` → `gzip-02`
- `zstd` → `zstd-01`
- `bogus` → `none-01`
- _empty_ → `none-00`
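A Go sketch that follows the priority order and examples exactly as written above might look like this. `encComponent` and `priority` are hypothetical names, and counting only non-empty entries is my assumption; the text only says "number of encoding options".

```go
package main

import (
	"fmt"
	"strings"
)

// priority lists the known encodings in the order given by the spec text.
var priority = []string{"*", "gzip", "deflate", "br", "zstd"}

// encComponent builds thr1_enc from a raw Accept-Encoding header value.
// Assumption: empty entries (such as a trailing comma) are not counted.
func encComponent(acceptEncoding string) string {
	if strings.TrimSpace(acceptEncoding) == "" {
		return "none-00"
	}

	offered := map[string]bool{}
	count := 0
	for _, item := range strings.Split(acceptEncoding, ",") {
		name, _, _ := strings.Cut(item, ";") // drop parameters such as ";q=0.9"
		name = strings.ToLower(strings.TrimSpace(name))
		if name == "" {
			continue
		}
		offered[name] = true
		count++
	}
	if count > 99 {
		count = 99
	}

	// Pick the first encoding in priority order that the client offered.
	best := "none"
	for _, enc := range priority {
		if offered[enc] {
			best = enc
			break
		}
	}
	return fmt.Sprintf("%s-%02d", best, count)
}

func main() {
	for _, v := range []string{"gzip, deflate", "gzip;q=0.9, br;q=0.8", "zstd", "bogus", ""} {
		fmt.Printf("%q -> %s\n", v, encComponent(v))
	}
}
```

The five inputs in `main` are the examples from the list above and produce the same outputs.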
@@ -27,6 +27,7 @@ import (
 	"github.com/TecharoHQ/anubis/lib/challenge"
 	"github.com/TecharoHQ/anubis/lib/policy"
 	"github.com/TecharoHQ/anubis/lib/policy/config"
+	"github.com/TecharoHQ/anubis/lib/thr1"
 
 	// challenge implementations
 	_ "github.com/TecharoHQ/anubis/lib/challenge/proofofwork"
@@ -74,18 +75,13 @@ type Server struct {
 func (s *Server) challengeFor(r *http.Request, difficulty int) string {
 	fp := sha256.Sum256(s.pub[:])
 
-	acceptLanguage := r.Header.Get("Accept-Language")
-	if len(acceptLanguage) > 5 {
-		acceptLanguage = acceptLanguage[:5]
-	}
-
 	challengeData := fmt.Sprintf(
-		"Accept-Language=%s,X-Real-IP=%s,User-Agent=%s,WeekTime=%s,Fingerprint=%x,Difficulty=%d",
-		acceptLanguage,
-		r.Header.Get("X-Real-Ip"),
+		"THR1=%s,JA4=%s,Fingerprint=%x,User-Agent=%s,WeekTime=%s,Difficulty=%d",
+		thr1.Fingerprint(r),
+		r.Header.Get("X-Tls-Fingerprint-Ja4"),
+		fp,
 		r.UserAgent(),
 		time.Now().UTC().Round(24*7*time.Hour).Format(time.RFC3339),
-		fp,
 		difficulty,
 	)
 	return internal.SHA256sum(challengeData)
246
lib/thr1/thr1.go
Normal file
@@ -0,0 +1,246 @@
package thr1

import (
	"crypto/sha256"
	"encoding/hex"
	"log/slog"
	"net/http"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// Fingerprint returns the THR1 fingerprint for r: the head, language, Sec-*,
// User-Agent, and encoding components joined with underscores.
func Fingerprint(r *http.Request) string {
	result := strings.Join([]string{
		thr1Head(r),
		thr1Lang(r),
		thr1Sec(r),
		thr1UA(r),
		thr1Encoding(r),
	}, "_")

	slog.Info("THR1 got", "method", r.Method, "path", r.URL.Path, "thr1", result)

	return result
}

// thr1Head returns the first three letters of the lowercased method, the
// two-digit protocol version (preferring X-Http-Version when it is set), and
// "tr" or "fa" depending on whether any Sec-* header is present.
// Note: thr1.mdx describes two zero-padded header counts in that last position.
func thr1Head(r *http.Request) string {
	method := strings.ToLower(r.Method)
	if len(method) > 3 {
		method = method[:3]
	}

	version := "00"
	if override := r.Header.Get("X-Http-Version"); override != "" {
		switch strings.TrimSpace(strings.ToUpper(override)) {
		case "HTTP/1.0":
			version = "10"
		case "HTTP/1.1":
			version = "11"
		case "HTTP/2.0":
			version = "20"
		case "HTTP/3.0":
			version = "30"
		}
	} else {
		switch {
		case r.ProtoMajor == 1 && r.ProtoMinor == 0:
			version = "10"
		case r.ProtoMajor == 1 && r.ProtoMinor == 1:
			version = "11"
		case r.ProtoMajor == 2:
			version = "20"
		case r.ProtoMajor == 3:
			version = "30"
		}
	}

	hasSec := false
	for k := range r.Header {
		if strings.HasPrefix(strings.ToLower(k), "sec-") {
			hasSec = true
			break
		}
	}

	return method + version + strconv.FormatBool(hasSec)[:2]
}

// thr1Encoding summarizes the Accept-Encoding header as <encoding>-<count>.
// It picks the best-ranked known encoding (zstd, br, deflate, gzip, then "*",
// which is reported as "wild") and counts the comma-separated entries, capped
// at 99. An empty header yields "none-00".
func thr1Encoding(r *http.Request) string {
	raw := r.Header.Get("Accept-Encoding")
	if raw == "" {
		return "none-00"
	}

	encodings := strings.Split(raw, ",")
	count := len(encodings)
	if count > 99 {
		count = 99
	}

	// Deduplicate the offered encodings, ignoring parameters such as ";q=0.9".
	seen := make(map[string]struct{})
	var available []string
	for _, e := range encodings {
		enc := strings.ToLower(strings.TrimSpace(strings.Split(e, ";")[0]))
		if enc != "" {
			if _, exists := seen[enc]; !exists {
				available = append(available, enc)
				seen[enc] = struct{}{}
			}
		}
	}

	// Lower rank wins.
	priorities := map[string]int{
		"zstd":    1,
		"br":      2,
		"deflate": 3,
		"gzip":    4,
		"*":       5,
	}

	best := "none"
	bestRank := 999 // arbitrarily high
	for _, enc := range available {
		if rank, ok := priorities[enc]; ok {
			if rank < bestRank {
				best = enc
				bestRank = rank
			}
		}
	}

	if best == "*" {
		best = "wild"
	}

	return best + "-" + pad2(count)
}

// pad2 formats n as two digits, zero-padding below 10 and capping at 99.
func pad2(n int) string {
	if n < 10 {
		return "0" + strconv.Itoa(n)
	}
	if n > 99 {
		return "99"
	}
	return strconv.Itoa(n)
}

// thr1Lang returns a four-character prefix of the Accept-Language header, a
// hyphen, and the first 9 hex characters of the SHA-256 hash of the raw value.
// A missing header yields "-000000000".
func thr1Lang(r *http.Request) string {
	raw := r.Header.Get("Accept-Language")
	if raw == "" {
		return "-000000000"
	}
	trimmed := first4AlphaNum(strings.ToLower(raw)) + "-"
	sum := sha256.Sum256([]byte(raw))
	return trimmed + hex.EncodeToString(sum[:])[:9]
}

// first4AlphaNum returns the first four characters of s that are lowercase
// ASCII letters or digits, right-padded with '0' to length 4.
func first4AlphaNum(s string) string {
	out := make([]rune, 0, 4)
	for _, ch := range s {
		if len(out) == 4 {
			break
		}
		if ('a' <= ch && ch <= 'z') || ('0' <= ch && ch <= '9') {
			out = append(out, ch)
		}
	}
	for len(out) < 4 {
		out = append(out, '0')
	}
	return string(out)
}

// thr1Sec serializes every Sec-* header except Sec-Fetch-User as one line,
// sorts the lines, joins them with newlines, and returns "sec-" followed by
// the first 9 hex characters of the SHA-256 hash of the result.
func thr1Sec(r *http.Request) string {
	var lines []string
	for k, vs := range r.Header {
		lkey := strings.ToLower(k)
		if !strings.HasPrefix(lkey, "sec-") || lkey == "sec-fetch-user" {
			continue
		}
		switch lkey {
		case "sec-ch-ua":
			lines = append(lines, parseSecChUA(vs))
		case "sec-ch-ua-mobile":
			lines = append(lines, parseSecCHSimple("mobile", vs))
		case "sec-ch-ua-platform":
			lines = append(lines, parseSecCHSimple("platform", vs))
		case "sec-ch-ua-platform-version":
			lines = append(lines, parseSecCHSimple("platform_version", vs))
		case "sec-ch-ua-model":
			lines = append(lines, parseSecCHSimple("model", vs))
		case "sec-ch-ua-full-version":
			lines = append(lines, parseSecCHSimple("full_version", vs))
		default:
			// Generic Sec-* header: strip surrounding quotes and spaces only.
			for _, v := range vs {
				v = strings.Trim(v, `" `)
				lines = append(lines, lkey+":"+v)
			}
		}
	}
	// Sorting the serialized lines makes the result independent of map order.
	sort.Strings(lines)
	canonical := strings.Join(lines, "\n")
	sum := sha256.Sum256([]byte(canonical))
	return "sec-" + hex.EncodeToString(sum[:])[:9]
}

// brandVersionRe matches one `"Brand";v="Version"` item in a Sec-CH-UA value.
var brandVersionRe = regexp.MustCompile(`\s*"([^"]+)";v="([^"]+)"`)

// parseSecChUA serializes Sec-CH-UA values as "ua:Brand1/Version1,Brand2/Version2",
// dropping the "Not=A?Brand" placeholder and sorting by brand.
func parseSecChUA(vs []string) string {
	type pair struct{ Brand, Version string }
	var pairs []pair

	for _, v := range vs {
		for _, match := range brandVersionRe.FindAllStringSubmatch(v, -1) {
			if len(match) != 3 {
				continue
			}
			brand := match[1]
			version := match[2]
			if brand == "Not=A?Brand" {
				continue
			}
			pairs = append(pairs, pair{brand, version})
		}
	}

	sort.Slice(pairs, func(i, j int) bool {
		return pairs[i].Brand < pairs[j].Brand
	})

	var sb strings.Builder
	sb.WriteString("ua:")
	for i, p := range pairs {
		if i > 0 {
			sb.WriteString(",")
		}
		sb.WriteString(p.Brand + "/" + p.Version)
	}
	return sb.String()
}

// parseSecCHSimple serializes the first usable value as "<key>:<value>" after
// stripping surrounding quotes and spaces. For the "mobile" key, "?1" and "?0"
// become "true" and "false" and any other value is skipped. If no value is
// usable, it returns "<key>:".
func parseSecCHSimple(key string, vs []string) string {
	for _, v := range vs {
		v = strings.Trim(v, `" `)
		if key == "mobile" {
			switch v {
			case "?1":
				return "mobile:true"
			case "?0":
				return "mobile:false"
			default:
				continue
			}
		}
		return key + ":" + v
	}
	return key + ":"
}

// thr1UA returns the first 9 hex characters of the SHA-256 hash of the
// User-Agent header value.
func thr1UA(r *http.Request) string {
	ua := r.Header.Get("User-Agent")
	sum := sha256.Sum256([]byte(ua))
	return hex.EncodeToString(sum[:])[:9]
}