Compare commits


58 Commits

Author SHA1 Message Date
Xe Iaso
ecc716940e chore: release v1.20.0-pre1
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-19 19:32:49 -04:00
Xe Iaso
4948036f39 feat: add default OpenGraph tags to configuration file (#694)
* feat(config): opengraph passthrough configuration

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(ogtags): use config.OpenGraph for configuration

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: wire up ogtags config in most of the app

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(ogtags): return default tags if they are supplied

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: make OpenGraph legal so we have some sanity in reviewing

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib): use OpenGraph.Enabled

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(lib): load default config file if one is not specified in spawnAnubis

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(config): fix ST1005

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: document open graph defaults and its new home in the policy file

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs(installation): point to weight threshold new home

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: rename default to override

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(default-config): add off-by-default opengraph settings to bot policy file

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(anubis): make build

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(lib): fix build

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-19 18:00:44 -04:00
Xe Iaso
7aa732c700 fix(config): actually load threshold config (#696)
* fix(config): actually load threshold config

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(lib): fix test failures

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-19 17:13:01 -04:00
Xe Iaso
226cf36bf7 feat(config): custom weight thresholds via CEL (#688)
* feat(config): add Thresholds to the top level config file

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(config): make String() on ExpressionOrList join the component expressions

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(config): ensure unparseable json fails

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(config): if no thresholds are set, use the default thresholds

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(policy): half implement thresholds

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(policy): continue wiring things up

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(lib): wire up thresholds

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(lib): handle behavior from legacy configurations

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: document thresholds

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: update CHANGELOG, refer to threshold configuration

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib): fix build

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(lib): fix U1000

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
Co-authored-by: Jason Cameron <git@jasoncameron.dev>
2025-06-18 16:58:31 -04:00
Dryusdan
1d5fa49eb0 Bump ai.robots.txt to v1.37 (#689)
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
2025-06-18 13:30:53 -04:00
Lothar Serra Mari
97c1d4f353 docs(known-instances): add extensions.typo3.org (#691)
Signed-off-by: Lothar Serra Mari <mail@serra.me>
2025-06-18 08:06:23 -04:00
hydrargyrum
244f1c505a fix(geo): correct typo "counties" to "countries" (#678)
2025-06-17 23:50:42 -04:00
Jason Cameron
ae4d3b0ce5 chore: remove duplicate CHANGELOG entry (#684)
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
2025-06-17 22:49:30 +00:00
prettysunflower
e60c43cdd2 docs(known-instances): add wiki.koha-community.org (#683)
Signed-off-by: prettysunflower <me@prettysunflower.moe>
2025-06-17 12:14:15 -04:00
Jason Cameron
b2b2679bae perf: replace cidranger with bart for significant performance improvements (#675)
* feat: replace cidranger with bart improving performance by 3-20x

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* perf: replace cidranger with bart for IP range checking

- Replace cidranger.Ranger with bart.Lite in RemoteAddrChecker
- Use netip.ParsePrefix instead of net.ParseCIDR for modern IP handling
- Improve performance: 3-20x faster lookups with zero heap allocations
- Update imports to use github.com/gaissmai/bart and net/netip
- Remove cidranger dependency from go.mod

Benchmark results:
- IPv4 lookups: 4x faster (15.58ns vs 63.25ns, 0 vs 2 allocs)
- IPv6 lookups: 3x faster (26.51ns vs 76.96ns, 0 vs 2 allocs)
- Insertions: 20x faster (976ns vs 19,191ns)
- Large tables: 14x faster (5.2ns vs 74.85ns)
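
The `netip.ParsePrefix` approach listed above can be sketched with the standard library alone. This is a minimal illustration, not the Anubis code: it assumes a linear scan in place of the bart trie, and `rangeChecker`, `newRangeChecker`, and `contains` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/netip"
)

// rangeChecker is a hypothetical stand-in for an IP range matcher.
// It scans linearly; the commit above uses a bart trie instead.
type rangeChecker struct {
	prefixes []netip.Prefix
}

// newRangeChecker parses CIDR strings with netip.ParsePrefix.
func newRangeChecker(cidrs []string) (*rangeChecker, error) {
	rc := &rangeChecker{}
	for _, c := range cidrs {
		pfx, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", c, err)
		}
		// Masked zeroes the host bits, so 10.1.2.3/8 is stored as 10.0.0.0/8.
		rc.prefixes = append(rc.prefixes, pfx.Masked())
	}
	return rc, nil
}

// contains reports whether addr falls inside any configured prefix.
// Unparseable addresses are treated as not contained.
func (rc *rangeChecker) contains(addr string) bool {
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return false
	}
	for _, pfx := range rc.prefixes {
		if pfx.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	rc, err := newRangeChecker([]string{"10.0.0.0/8", "2001:db8::/32"})
	if err != nil {
		panic(err)
	}
	fmt.Println(rc.contains("10.1.2.3"), rc.contains("192.0.2.1"))
}
```

`netip.Prefix` and `netip.Addr` are plain value types, which is part of why lookups built on them can avoid heap allocations.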

* docs: clarify CHANGELOG to not give false impressions

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* perf: optimize string concatenation in RemoteAddrChecker hash generation

Replace fmt.Fprintln with strings.Join for 7x faster performance:
- Before: 935.1 ns/op, 784 B/op, 22 allocs/op
- After: 133.2 ns/op, 192 B/op, 1 alloc/op

The hash is used for JWT cookie validation and error code generation.
Comma separation provides the same deterministic uniqueness as newlines
but with significantly better performance during policy initialization.
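
The `strings.Join` change described above might look roughly like the following. This is a sketch under assumptions: `hashRanges` is a hypothetical name, and SHA-256 is used only for illustration because the commit message does not say which hash the checker uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// hashRanges joins the CIDR strings with commas and hashes the result once.
// A comma cannot appear inside a CIDR string, so the joined form is unambiguous.
// SHA-256 is an assumption for illustration only.
func hashRanges(cidrs []string) string {
	joined := strings.Join(cidrs, ",") // one allocation for the joined string
	sum := sha256.Sum256([]byte(joined))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(hashRanges([]string{"10.0.0.0/8", "192.168.0.0/16"}))
}
```

The same input always produces the same digest, and reordering the ranges produces a different one.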

* chore: remove accidentally committed string benchmark

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* style: apply Copilot suggestions

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* fix: reference the right var name

i cannot write a merge commit

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

---------

Signed-off-by: Jason Cameron <git@jasoncameron.dev>
2025-06-17 11:57:55 -04:00
Jason Cameron
e2b46fc5e7 perf: Replace internal SHA256 hashing with xxhash for 4-6x performance improvement (#676)
* perf(internal): Use FastHash for internal hashing
docs: Add xxhash performance improvement to changelog entry
feat(hash): Add fast non-cryptographic hash function

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* test(hash): add xxhash benchmarks and collision tests

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* Update metadata

check-spelling run (pull_request) for json/hash

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>

---------

Signed-off-by: Jason Cameron <git@jasoncameron.dev>
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
2025-06-16 22:53:53 -04:00
hyperdefined
3437e575d4 chore(sponsors): update canine.tools logo (#672)
2025-06-16 14:09:35 -04:00
Xe Iaso
ae064be710 chore(docs/manifest): it helps if you terminate strings properly
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-16 12:11:04 -04:00
Xe Iaso
e3826df3ab feat: implement a client for Thoth, the IP reputation database for Anubis (#637)
* feat(internal): add Thoth client and simple ASN checker

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(thoth): cached ip to asn checker

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: go mod tidy

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(thoth): minor testing fixups, ensure ASNChecker is Checker

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(thoth): make ASNChecker instances

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(thoth): add GeoIP checker

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(thoth): store a thoth client in a context

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: refactor Checker type to its own package

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(thoth): add thoth mocking package, ignore context deadline exceeded errors

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(thoth): pre-cache private ranges

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(lib/policy/config): enable thoth ASNs and GeoIP checker parsing

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(thoth): refactor to move checker creation to the checker files

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(policy): enable thoth checks

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(thothmock): test helper function for loading a mock thoth instance

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat: wire up Thoth, make thoth checks part of the default config

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(thoth): mend staticcheck errors

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs(admin): add Thoth docs

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(policy): update Thoth links in error messages

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: update CHANGELOG

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(docs/manifest): enable Thoth

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: add THOTH_INSECURE for contacting Thoth over plain TCP in extreme circumstances

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(thoth): use mock thoth when credentials aren't detected in the environment

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(cmd/anubis): better warnings for half-configured Thoth setups

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs(botpolicies): link to Thoth geoip docs

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-16 11:57:32 -04:00
Xe Iaso
823d1be5d1 chore(docs/manifest): explicitly allow blog RSS feed
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-16 11:50:53 -04:00
Xe Iaso
0c6a820372 chore(docs/manifest): enable OG_PASSTHROUGH
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-16 09:31:56 -04:00
Xe Iaso
81f6380dd4 Add the blog section back (#670)
* Revert "docs/blog: remove (#273)"

This reverts commit df3509ec99.

* chore: intro to the blog post

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-16 09:28:21 -04:00
dependabot[bot]
e5455c02d8 build(deps): bump the github-actions group with 3 updates (#666)
Bumps the github-actions group with 3 updates: [docker/login-action](https://github.com/docker/login-action), [actions/attest-build-provenance](https://github.com/actions/attest-build-provenance) and [github/codeql-action](https://github.com/github/codeql-action).


Updates `docker/login-action` from 3.0.0 to 3.4.0
- [Release notes](https://github.com/docker/login-action/releases)
- [Commits](https://github.com/docker/login-action/compare/v3...74a5d142397b4f367a81961eba4e8cd7edddf772)

Updates `actions/attest-build-provenance` from 2.3.0 to 2.4.0
- [Release notes](https://github.com/actions/attest-build-provenance/releases)
- [Changelog](https://github.com/actions/attest-build-provenance/blob/main/RELEASE.md)
- [Commits](db473fddc0...e8998f9491)

Updates `github/codeql-action` from 3.28.19 to 3.29.0
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](fca7ace96b...ce28f5bb42)

---
updated-dependencies:
- dependency-name: docker/login-action
  dependency-version: 3.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: github-actions
- dependency-name: actions/attest-build-provenance
  dependency-version: 2.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: github-actions
- dependency-name: github/codeql-action
  dependency-version: 3.29.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jason Cameron <git@jasoncameron.dev>
2025-06-15 21:13:56 -04:00
Colin Finck
1d8033d69e Add ReactOS to known-instances.md (#664)
Signed-off-by: Colin Finck <colin@reactos.org>
2025-06-15 15:30:54 +00:00
Jason Cameron
e0781e4560 feat: add robots2policy CLI to convert robots.txt to Anubis CEL (#657)
* feat: add robots2policy CLI utility to convert robots.txt to Anubis challenge policies

* feat: add documentation for robots2policy CLI tool

* feat: implement crawl delay handling as weight adjustment in Anubis rules

* feat: add various robots.txt and YAML configurations for user agent handling and crawl delays

* test: add comprehensive tests for robots2policy conversion and parsing

* fix: update example URL in usage instructions for robots2policy CLI

* Update metadata

check-spelling run (pull_request) for json/robots2policycli

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>

* docs: add crawl delay weight adjustment and deny user agents option to robots2policy CLI

* Update cmd/robots2policy/main.go

Co-authored-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>

* Update cmd/robots2policy/main.go

Co-authored-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>

* fix(robots2policy): use sigs.k8s.io/yaml

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(config): properly marshal bot policy rules

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(yeetfile): expose robots2policy in libexec

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(yeetfile): put robots2policy in $PATH

Signed-off-by: Xe Iaso <me@xeiaso.net>

* Update metadata

check-spelling run (pull_request) for json/robots2policycli

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>

* style: reorder imports

* refactor: use preexisting structs in config

* fix: correct flag check in main function

* fix: reorder fields in AnubisRule struct for better alignment

* style: improve alignment of struct fields in AnubisRule and OGTagCache

* Update metadata

check-spelling run (pull_request) for json/robots2policycli

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>

* fix: add validation for generated Anubis rules from robots.txt

* feat: add batch processing for robots.txt files to generate Anubis CEL policies

* fix: improve usage message and error handling for input file requirement

* refactor: update AnubisRule structure to use ExpressionOrList for improved expression handling

* refactor: reorganize policy definitions in YAML files for consistency and clarity

* fix: correct indentation in blacklist and complex YAML files for consistency

* test: enhance output comparison in robots2policy tests for YAML and JSON formats

* Revert "fix: improve usage message and error handling for input file requirement"

This reverts commit ddcde1f2a3.

* fix: improve usage message and error handling in robots2policy

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

---------

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>
Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-06-14 23:41:00 -04:00
Lothar Serra Mari
7a195f1595 docs(known-instances): add bugs.scummvm.org and gitlab.postmarketos.org (#661)
* docs(known-instances): add bugs.scummvm.org and gitlab.postmarketos.org

Signed-off-by: Lothar Serra Mari <mail@serra.me>

* chore: clean uri

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

---------

Signed-off-by: Lothar Serra Mari <mail@serra.me>
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
Co-authored-by: Jason Cameron <git@jasoncameron.dev>
2025-06-14 13:55:55 +00:00
Jason Cameron
2904ff974b refactor(ogtags): optimize URL construction and memory allocations (#647)
* refactor(ogtags): optimize URL construction and memory allocations

* test(ogtags): add benchmarks and memory usage tests for OGTagCache

* refactor(ogtags): optimize OGTags subsystem to reduce allocations and improve request runtime by up to 66%

* Update docs/docs/CHANGELOG.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>

* refactor(ogtags): optimize URL string construction to reduce allocations

* Update internal/ogtags/ogtags.go

Co-authored-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>

* test(ogtags): add fuzz tests for getTarget and extractOGTags functions

* fix(ogtags): update memory calculation logic

Previously it would report that we had allocated 18 PB

=== RUN   TestMemoryUsage
    mem_test.go:107: Memory allocated for 10k getTarget calls: 18014398509481904.00 KB
    mem_test.go:135: Memory allocated for 1k extractOGTags calls: 18014398509481978.00

    Now it's fixed with

    === RUN   TestMemoryUsage
    mem_test.go:109: Memory allocated for 10k getTarget calls:
    mem_test.go:110:   Total: 630.56 KB (0.62 MB)
    mem_test.go:111:   Per operation: 64.57 bytes
    mem_test.go:140: Memory allocated for 1k extractOGTags calls:
    mem_test.go:141:   Total: 328.17 KB (0.32 MB)
    mem_test.go:142:   Per operation: 336.05 bytes
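
The old figure is close to 2^64 bytes expressed in KB, which is what an unsigned subtraction produces when it wraps around. The following is a minimal sketch of that failure mode, assuming the test subtracted two `uint64` counters; the commit does not state the cause, and `naiveDeltaKB` and `safeDeltaKB` are hypothetical names.

```go
package main

import "fmt"

// naiveDeltaKB subtracts two unsigned byte counters directly.
// If after < before (for example because memory was freed in between),
// the subtraction wraps around to a value near 2^64.
func naiveDeltaKB(before, after uint64) float64 {
	return float64(after-before) / 1024
}

// safeDeltaKB converts to a signed type first, so a shrinking
// counter yields a small negative delta instead of wrapping.
func safeDeltaKB(before, after uint64) float64 {
	return float64(int64(after)-int64(before)) / 1024
}

func main() {
	before, after := uint64(100000), uint64(20000)
	fmt.Printf("naive: %.2f KB\n", naiveDeltaKB(before, after))
	fmt.Printf("safe:  %.2f KB\n", safeDeltaKB(before, after))
}
```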

* refactor(ogtags): optimize meta tag extraction for improved performance

* Update metadata

check-spelling run (pull_request) for json/ogmem

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>

* chore: update CHANGELOG for recent optimizations and version bump

* refactor: improve URL construction and meta tag extraction logic

* style: clean up fuzz tests

---------

Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-06-13 09:53:10 -04:00
Jason Cameron
3b3080d497 feat: add a strip-base-prefix option (#655)
* style: fix formatting in .air.toml and installation.mdx

* feat: add --strip-base-prefix flag to modify request paths when forwarding

Closes: #638

* refactor: apply structpacking (betteralign)

* fix: add validation for strip-base-prefix and base-prefix configuration

* fix: improve request path handling by cloning request and modifying URL path

* chore: remove integration tests as they are too annoying to debug on my system
2025-06-12 17:46:08 -04:00
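
The path handling described in the commit above (clone the request, then modify the URL path) can be sketched as follows. This is an illustration under assumptions, not the Anubis implementation: `trimBasePrefix` and `stripBasePrefix` are hypothetical names, and the rule that only whole path segments are stripped is a design choice of this sketch.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// trimBasePrefix removes basePrefix from path when it matches a whole
// leading segment. "/app" is stripped from "/app/x" but not from "/application".
func trimBasePrefix(path, basePrefix string) string {
	prefix := strings.TrimSuffix(basePrefix, "/")
	if prefix == "" {
		return path
	}
	if path == prefix {
		return "/"
	}
	if strings.HasPrefix(path, prefix+"/") {
		return strings.TrimPrefix(path, prefix)
	}
	return path
}

// stripBasePrefix returns a clone of r with the prefix removed from its path.
// The original request is left untouched because Clone deep-copies the URL.
func stripBasePrefix(r *http.Request, basePrefix string) *http.Request {
	out := r.Clone(r.Context())
	out.URL.Path = trimBasePrefix(r.URL.Path, basePrefix)
	out.URL.RawPath = "" // let net/url derive the escaped form from Path
	return out
}

func main() {
	req := httptest.NewRequest(http.MethodGet, "/app/index.html", nil)
	fwd := stripBasePrefix(req, "/app")
	fmt.Println(req.URL.Path, "->", fwd.URL.Path)
}
```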
Jason Cameron
60ba8e9557 fix(ci): conditionally run SSH jobs for TecharoHQ/anubis repository (#654)
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
2025-06-11 21:18:43 +00:00
Jason Cameron
14c80483a9 fix(gitattributes): update pattern for generated files (#652)
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
2025-06-11 21:00:37 +00:00
Xe Iaso
d1452b6d39 test(ssh-ci): re-enable GOARCH=ppc64le (#651)
This reverts commit 5e95da6b6c.
2025-06-11 14:01:48 -04:00
Xe Iaso
5e95da6b6c test(ssh-ci): disable GOARCH=ppc64le for now
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-11 12:58:32 -04:00
dependabot[bot]
988fc0941b build(deps): bump github.com/cloudflare/circl from 1.6.0 to 1.6.1 (#650)
Bumps [github.com/cloudflare/circl](https://github.com/cloudflare/circl) from 1.6.0 to 1.6.1.
- [Release notes](https://github.com/cloudflare/circl/releases)
- [Commits](https://github.com/cloudflare/circl/compare/v1.6.0...v1.6.1)

---
updated-dependencies:
- dependency-name: github.com/cloudflare/circl
  dependency-version: 1.6.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-11 16:52:10 +00:00
Xe Iaso
f5140ae57b test: introduce SSH based CI for non-native test hosts (#644)
* feat: ssh based CI

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test: implement SSH ci with caches and github actions

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(ssh-ci): fix known hosts secret

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(ssh-ci): clone the repo, that's important

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(ssh-ci): speed up ci by prebaking the SSH CI image

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(ssh-ci): set -euo

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(ssh-ci): enable pull_request_target so things work

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(ssh-ci): oh goody it's broken

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(ssh-ci): add cronjob to rebuild ci runner image

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(ssh-ci): also run yeet

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(ssh-ci): force git version for yeet

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(ssh-ci): run set -x in the container

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(ssh-ci): fix yeet?

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(ssh-ci): remove yeet for now

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(ssh-ci): disable for PRs for now

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-11 12:50:01 -04:00
Jason Cameron
bbdee34f37 fix(anubis): improve challenge handling and error reporting (#645)
2025-06-11 12:47:06 -04:00
Stephanie Gawroriski
6e2eeb9e65 Update known-instances.md to include SquirrelJME (#643)
Include SquirrelJME, which is an emulator for Java ME 8.

Signed-off-by: Stephanie Gawroriski <xer@multiphasicapps.net>
2025-06-10 23:20:50 +00:00
Xe Iaso
c638653172 feat(lib): implement request weight (#621)
* feat(lib): implement request weight

Replaces #608

This is a big one and will be what makes Anubis a generic web
application firewall. This introduces the WEIGH option, allowing
administrators to have facets of request metadata add or remove
"weight", or the level of suspicion. This really makes Anubis weigh
the soul of requests.

Signed-off-by: Xe Iaso <me@xeiaso.net>
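
The idea of request weight can be illustrated with a small sketch. Everything here is hypothetical: the rule names, the predicates, and the threshold are invented for illustration, and plain Go functions stand in for whatever expression language the real policy file uses. Only the adjustment size of 5 echoes the default mentioned later in this commit.

```go
package main

import (
	"fmt"
	"strings"
)

// request holds the few facets of metadata this sketch looks at.
type request struct {
	userAgent string
	path      string
}

// weighRule adds (positive) or removes (negative) suspicion when it matches.
type weighRule struct {
	name    string
	matches func(request) bool
	adjust  int
}

// exampleRules are invented for illustration only.
var exampleRules = []weighRule{
	{"browser-like user agent", func(r request) bool { return strings.Contains(r.userAgent, "Mozilla") }, 5},
	{"well-known path", func(r request) bool { return strings.HasPrefix(r.path, "/.well-known/") }, -5},
}

// totalWeight sums the adjustments of every matching rule.
func totalWeight(r request, rules []weighRule) int {
	w := 0
	for _, rule := range rules {
		if rule.matches(r) {
			w += rule.adjust
		}
	}
	return w
}

// decide maps a weight to an action using a single hypothetical threshold.
func decide(weight, challengeAt int) string {
	if weight >= challengeAt {
		return "CHALLENGE"
	}
	return "ALLOW"
}

func main() {
	r := request{userAgent: "Mozilla/5.0", path: "/"}
	w := totalWeight(r, exampleRules)
	fmt.Println(w, decide(w, 5))
}
```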

* fix(lib): maintain legacy challenge behavior

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib): make weight have dedicated checkers for the hashes

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(data): convert some rules over to weight points

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: document request weight

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(CHANGELOG): spelling error

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: fix links to challenge information

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs(policies): fix formatting

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(config): make default weight adjustment 5

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-09 15:25:04 -04:00
Fierelier
0fe46b48cf Make progress bar styling more compatible (UXP, etc) (#636)
* Make progress bar styling more compatible (UXP, etc)

* Add 'Make progress bar styling more compatible (UXP, etc)'

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Fierelier <fier@airmail.cc>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-06-09 12:19:38 -04:00
David Chandek-Stark
d6e5561768 Adds ability to toggle off stripping of private addrs from XFF (#619)
* Adds ability to toggle off stripping of private addrs from XFF

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: refactor to flow better

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-06-09 13:33:19 +00:00
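
A toggle for stripping private addresses from `X-Forwarded-For`, as the commit above describes, could be sketched like this. The function name `filterXFF` is hypothetical, and which address classes count as "private" here (RFC 1918/4193, loopback, link-local) is an assumption of this sketch rather than something the commit states.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// filterXFF rewrites an X-Forwarded-For value. When stripPrivate is true,
// private, loopback, and link-local addresses are dropped; when false,
// every parseable entry is kept in order.
func filterXFF(header string, stripPrivate bool) string {
	var kept []string
	for _, part := range strings.Split(header, ",") {
		addr, err := netip.ParseAddr(strings.TrimSpace(part))
		if err != nil {
			continue // skip entries that are not valid IP addresses
		}
		if stripPrivate && (addr.IsPrivate() || addr.IsLoopback() || addr.IsLinkLocalUnicast()) {
			continue
		}
		kept = append(kept, addr.String())
	}
	return strings.Join(kept, ", ")
}

func main() {
	h := "203.0.113.7, 10.0.0.5, 127.0.0.1"
	fmt.Println(filterXFF(h, true))
	fmt.Println(filterXFF(h, false))
}
```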
dependabot[bot]
6594ae0eef build(deps): bump github/codeql-action in the github-actions group (#635)
Bumps the github-actions group with 1 update: [github/codeql-action](https://github.com/github/codeql-action).


Updates `github/codeql-action` from 3.28.18 to 3.28.19
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](ff0a06e83c...fca7ace96b)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.28.19
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-09 08:56:41 -04:00
Lothar Serra Mari
ad09f82c3c docs(admin/environments): Prefer IPv6 over IPv4 for apache2 listener directive (#628)
Signed-off-by: Lothar Serra Mari <mail@serra.me>
2025-06-09 08:56:30 -04:00
Xe Iaso
372b797f64 chore: go generate
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-08 20:52:22 -04:00
dependabot[bot]
6eaf0e13a2 build(deps): bump the gomod group with 2 updates (#634)
Bumps the gomod group with 2 updates: [github.com/a-h/templ](https://github.com/a-h/templ) and [golang.org/x/net](https://github.com/golang/net).


Updates `github.com/a-h/templ` from 0.3.887 to 0.3.898
- [Release notes](https://github.com/a-h/templ/releases)
- [Changelog](https://github.com/a-h/templ/blob/main/.goreleaser.yaml)
- [Commits](https://github.com/a-h/templ/compare/v0.3.887...v0.3.898)

Updates `golang.org/x/net` from 0.40.0 to 0.41.0
- [Commits](https://github.com/golang/net/compare/v0.40.0...v0.41.0)

---
updated-dependencies:
- dependency-name: github.com/a-h/templ
  dependency-version: 0.3.898
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: gomod
- dependency-name: golang.org/x/net
  dependency-version: 0.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: gomod
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-08 20:44:25 -04:00
Dryusdan
281b6c5c00 Bump ai.robots.txt to v1.34 (#632)
2025-06-08 14:54:47 -04:00
Jason Cameron
9539668049 style: Some minor fixes (#548)
* chore(deps): update dependencies in go.mod and go.sum

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* refactor: rename variables for clarity in anubis.go and main.go

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* fix(checker): handle error when inserting IP range in ranger

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* fix(tests): simplify boolean checks in header and URL value tests

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* refactor(api): remove unused /test-error endpoint and restrict /make-challenge to development

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* build(deps): update golang-set to v2.8.0 in go.sum

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* Update metadata

check-spelling run (pull_request) for json/stuff

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>

---------

Signed-off-by: Jason Cameron <git@jasoncameron.dev>
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
2025-06-07 18:21:22 +00:00
Xe Iaso
8eff57fcb6 chore(docs/manifest): try no-js challenge to see how it impacts false positive rate
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-06 21:40:28 -04:00
Xe Iaso
4ac59c3a79 feat(lib/challenge): HTTP meta refresh challenge method (#623)
* feat(lib/challenge): HTTP meta refresh challenge method

Closes #95

This challenge method enables users that don't (or won't) support
JavaScript to pass Anubis challenges. It works by using HTML meta
refresh directives to ensure that the client is a browser.

This is OFF by default. In order to enable it, an administrator MUST
choose to make the default challenge method `metarefresh`.

TODO(Xe):

- [ ] Documentation on this challenge method
- [ ] Amend wording around Anubis being a proof of work proxy in the docs
- [ ] Add configuration file syntax for the default challenge method and settings
- [ ] Test with early customers

Signed-off-by: Xe Iaso <me@xeiaso.net>
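
For readers unfamiliar with meta refresh, the directive this challenge relies on can be sketched as below. This is only an illustration: the function name, the `/pass` path, and the `challenge` query parameter are hypothetical and are not taken from the Anubis source.

```go
package main

import (
	"fmt"
	"html"
	"net/url"
)

// metaRefreshTag builds a <meta http-equiv="refresh"> element that tells a
// browser to load target after delaySeconds, carrying token as a query
// parameter. The parameter name "challenge" is a placeholder.
func metaRefreshTag(delaySeconds int, target, token string) string {
	dest := target + "?" + url.Values{"challenge": {token}}.Encode()
	return fmt.Sprintf(`<meta http-equiv="refresh" content="%d; url=%s">`,
		delaySeconds, html.EscapeString(dest))
}

func main() {
	fmt.Println(metaRefreshTag(1, "/pass", "abc"))
}
```

A client that parses HTML and honors the directive follows the redirect without running any JavaScript; a client that only fetches the raw response does not.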

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib/challenge/metarefresh): use this value of err

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: add metarefresh challenge info, Web AI Firewall Utility

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-06 21:18:55 -04:00
Lothar Serra Mari
bee1c22b96 docs(known-instances): add wiki.dolphin-emu.org to known instances (#626)
Signed-off-by: Lothar Serra Mari <mail@serra.me>
2025-06-06 13:35:24 -04:00
Xe Iaso
5a7499ea3b fix(lib/challenge): allow challenges to register HTTP routes (#620)
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-06 00:26:23 +00:00
Jan Pieter Waagmeester
5f3861ab37 docs: Adjust the name of the cookie to the current "techaro.lol-anubis-auth" (#615)
* docs: Adjust the name of the cookie to the current "techaro.lol-anubis-auth"

Name definition:

76fa3e01a5/anubis.go (L12-L14)

The name changed in 6c0ff3f4d5


Signed-off-by: Jan Pieter Waagmeester <jieter@jieter.nl>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Jan Pieter Waagmeester <jieter@jieter.nl>
Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-06-05 20:59:16 +00:00
foosinn
9f1d791991 docs(subrequest-auth): document required policy changes (#613)
* docs(subrequest-auth): document required policy changes

Signed-off-by: foosinn <foosinn@f2o.io>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: foosinn <foosinn@f2o.io>
Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-06-05 16:53:18 -04:00
Markus Sommer
76fa3e01a5 docs(known-instances): add Alliance of Hessian Libraries (#611)
Signed-off-by: Markus Sommer <markus@splork.de>
2025-06-04 02:03:57 +00:00
Xe Iaso
f2db43ad4b feat: implement challenge registry (#607)
* feat: implement challenge method registry

This paves the way for implementing a no-js check method (#95) by making
the challenge providers more generic.

Signed-off-by: Xe Iaso <me@xeiaso.net>
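
A method registry of the kind described above is commonly built as a name-to-implementation map. The sketch below shows that general pattern only. The `Impl` interface here has a simplified, hypothetical `Issue` signature, and `Register`, `Get`, and `Methods` are illustrative names rather than the package's actual API.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Impl is a simplified, hypothetical challenge provider interface.
type Impl interface {
	Issue() string
}

var (
	mu       sync.RWMutex
	registry = map[string]Impl{}
)

// Register makes a challenge method available under name.
func Register(name string, impl Impl) {
	mu.Lock()
	defer mu.Unlock()
	registry[name] = impl
}

// Get looks up a challenge method by name.
func Get(name string) (Impl, bool) {
	mu.RLock()
	defer mu.RUnlock()
	impl, ok := registry[name]
	return impl, ok
}

// Methods lists the registered method names in sorted order.
func Methods() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// noopImpl is a placeholder provider used only to exercise the registry.
type noopImpl struct{}

func (noopImpl) Issue() string { return "noop challenge issued" }

func main() {
	Register("noop", noopImpl{})
	if impl, ok := Get("noop"); ok {
		fmt.Println(Methods(), impl.Issue())
	}
}
```

With this shape, adding a new challenge method means registering one more implementation instead of touching the code that issues challenges.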

* fix(lib/challenge): rename proof-of-work package to proofofwork

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib): make validated challenges a CounterVec

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib): annotate jwts with challenge method

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(lib/challenge/proofofwork): implement tests

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(lib): add smoke tests for known good and known bad config files

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: update CHANGELOG

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib): use challenge.Impl#Issue when issuing challenges

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-04 02:01:58 +00:00
Xe Iaso
ba4412c907 chore(sponsors): add Raptor Computing Systems
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-03 17:49:28 -04:00
Xe Iaso
f184cd81e7 docs(faq): anubis does not mine bitcoin (#609)
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-03 07:14:41 -04:00
Xe Iaso
59bfced8bf docs(admin/environments): update suggested HTTP headers
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-03 06:57:37 -04:00
Xe Iaso
780a935cb8 chore(sponsors): add wildbase
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-03 06:18:40 -04:00
Xe Iaso
f4bc1df797 chore(sponsors): add Uberspace
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-02 09:42:13 -04:00
dependabot[bot]
b496c90e86 build(deps): bump github.com/a-h/templ in the gomod group (#601)
Bumps the gomod group with 1 update: [github.com/a-h/templ](https://github.com/a-h/templ).


Updates `github.com/a-h/templ` from 0.3.865 to 0.3.887
- [Release notes](https://github.com/a-h/templ/releases)
- [Changelog](https://github.com/a-h/templ/blob/main/.goreleaser.yaml)
- [Commits](https://github.com/a-h/templ/compare/v0.3.865...v0.3.887)

---
updated-dependencies:
- dependency-name: github.com/a-h/templ
  dependency-version: 0.3.887
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: gomod
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-01 23:39:42 -04:00
dependabot[bot]
ec73bcbaf1 build(deps): bump docker/build-push-action in the github-actions group (#602)
Bumps the github-actions group with 1 update: [docker/build-push-action](https://github.com/docker/build-push-action).


Updates `docker/build-push-action` from 6.17.0 to 6.18.0
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](1dc7386353...263435318d)

---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-version: 6.18.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-01 23:39:05 -04:00
dependabot[bot]
8d19eed200 build(deps-dev): bump esbuild from 0.25.4 to 0.25.5 in the npm group (#600)
Bumps the npm group with 1 update: [esbuild](https://github.com/evanw/esbuild).


Updates `esbuild` from 0.25.4 to 0.25.5
- [Release notes](https://github.com/evanw/esbuild/releases)
- [Changelog](https://github.com/evanw/esbuild/blob/main/CHANGELOG.md)
- [Commits](https://github.com/evanw/esbuild/compare/v0.25.4...v0.25.5)

---
updated-dependencies:
- dependency-name: esbuild
  dependency-version: 0.25.5
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: npm
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-01 23:38:45 -04:00
Xe Iaso
ec733e93a5 v1.19.1
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-01 17:17:24 -04:00
Xe Iaso
51c384eefd fix(data/bots): bring back ai-robots-txt.yaml
Closes #599

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-01 17:15:00 -04:00
170 changed files with 6428 additions and 1019 deletions

View File

@@ -9,4 +9,4 @@ exclude_dir = ["var", "vendor", "docs", "node_modules"]
[logger]
time = true
# to change flags at runtime, prepend with -- e.g. $ air -- --target http://localhost:3000 --difficulty 20 --use-remote-address
# to change flags at runtime, prepend with -- e.g. $ air -- --target http://localhost:3000 --difficulty 20 --use-remote-address

2
.gitattributes vendored
View File

@@ -1 +1 @@
web/index_templ.go linguist-generated
**/*_templ.go linguist-generated=true

View File

@@ -2,4 +2,4 @@ github
https
ssh
ubuntu
workarounds
workarounds

View File

@@ -83,6 +83,9 @@
^\Q.github/FUNDING.yml\E$
^\Q.github/workflows/spelling.yml\E$
^data/crawlers/
^docs/blog/tags\.yml$
^docs/manifest/.*$
^docs/static/\.nojekyll$
^lib/policy/config/testdata/bad/unparseable\.json$
ignore$
robots.txt

View File

@@ -6,12 +6,18 @@ amazonbot
anthro
anubis
anubistest
apk
Applebot
archlinux
asnc
asnchecker
asns
aspirational
badregexes
bdba
berr
bingbot
Bitcoin
bitcoin
blogging
Bluesky
blueskybot
@@ -22,12 +28,15 @@ Brightbot
broked
Bytespider
cachebuster
cachediptoasn
Caddyfile
caninetools
Cardyb
celchecker
CELPHASE
cerr
certresolver
cespare
CGNAT
cgr
chainguard
@@ -35,7 +44,6 @@ chall
challengemozilla
checkpath
checkresult
chen
chibi
cidranger
ckie
@@ -46,12 +54,12 @@ coreutils
Cotoyogi
CRDs
crt
Cscript
daemonizing
DDOS
Debian
debrpm
decaymap
decompiling
Diffbot
discordapp
discordbot
@@ -84,27 +92,37 @@ Fordola
forgejo
fsys
fullchain
gaissmai
Galvus
geoip
geoipchecker
gha
gipc
gitea
godotenv
goland
gomod
goodbot
googlebot
govulncheck
goyaml
GPG
GPT
gptbot
grpcprom
grw
Hashcash
hashrate
headermap
healthcheck
hebis
hec
hmc
hostable
htmlc
htmx
httpdebug
Huawei
hypertext
iaskspider
iat
@@ -112,11 +130,14 @@ ifm
Imagesift
imgproxy
inp
IPTo
iptoasn
iss
isset
ivh
Jenomis
JGit
joho
journalctl
jshelter
JWTs
@@ -146,6 +167,7 @@ maintainership
malware
mcr
memes
metarefresh
metrix
mimi
minica
@@ -154,18 +176,21 @@ Mojeek
mojeekbot
mozilla
nbf
netsurf
nginx
nobots
NONINFRINGEMENT
nosleep
OCOB
ogtags
ogtitle
omgili
omgilibot
onionservice
openai
opengraph
openrc
pag
palemoon
Pangu
parseable
passthrough
@@ -182,12 +207,14 @@ prebaked
privkey
promauto
promhttp
proofofwork
pwcmd
pwuser
qualys
qwant
qwantbot
rac
rawler
rcvar
redir
redirectscheme
@@ -207,6 +234,7 @@ sebest
secretplans
selfsigned
Semrush
Seo
setsebool
shellcheck
Sidetrade
@@ -227,18 +255,24 @@ subrequest
SVCNAME
tagline
tarballs
tarrif
techaro
techarohq
templ
templruntime
testarea
thoth
thothmock
Tik
Timpibot
torproject
traefik
uberspace
unixhttpd
unmarshal
unparseable
uuidgen
uvx
UXP
Varis
Velen
vendored
@@ -251,6 +285,8 @@ webpage
websecure
websites
Webzio
wildbase
withthothmock
wordpress
Workaround
workdir
@@ -264,6 +300,7 @@ xess
xff
XForwarded
XNG
XOB
XReal
yae
YAMLTo

View File

@@ -273,14 +273,6 @@
# Most people only have two hands. Reword.
\b(?i)on the third hand\b
# Should be `Open Graph`
# unless talking about a specific Open Graph implementation:
# - Java
# - Node
# - Py
# - Ruby
\bOpenGraph\b
# Should be `OpenShift`
\bOpenshift\b

View File

@@ -131,4 +131,4 @@ go install(?:\s+[a-z]+\.[-@\w/.]+)+
# hit-count: 1 file-count: 1
# microsoft
\b(?:https?://|)(?:(?:(?:blogs|download\.visualstudio|docs|msdn2?|research)\.|)microsoft|blogs\.msdn)\.co(?:m|\.\w\w)/[-_a-zA-Z0-9()=./%]*
\b(?:https?://|)(?:(?:(?:blogs|download\.visualstudio|docs|msdn2?|research)\.|)microsoft|blogs\.msdn)\.co(?:m|\.\w\w)/[-_a-zA-Z0-9()=./%]*

View File

@@ -3,8 +3,8 @@ name: Docker image builds
on:
workflow_dispatch:
push:
branches: [ "main" ]
tags: [ "v*" ]
branches: ["main"]
tags: ["v*"]
env:
DOCKER_METADATA_SET_OUTPUT_ENV: "true"
@@ -55,7 +55,7 @@ jobs:
run: |
brew bundle
- name: Log into registry
- name: Log into registry
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
with:
registry: ghcr.io
@@ -77,9 +77,8 @@ jobs:
DOCKER_REPO: ${{ env.IMAGE }}
SLOG_LEVEL: debug
- name: Generate artifact attestation
uses: actions/attest-build-provenance@db473fddc028af60658334401dc6fa3ffd8669fd # v2.3.0
uses: actions/attest-build-provenance@e8998f949152b193b063cb0ec769d69d929409be # v2.4.0
with:
subject-name: ${{ env.IMAGE }}
subject-digest: ${{ steps.build.outputs.digest }}

View File

@@ -39,7 +39,7 @@ jobs:
- name: Build and push
id: build
uses: docker/build-push-action@1dc73863535b631f98b2378be8619f83b136f4a0 # v6.17.0
uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
with:
context: ./docs
cache-to: type=gha

View File

@@ -28,7 +28,7 @@ jobs:
- name: Build and push
id: build
uses: docker/build-push-action@1dc73863535b631f98b2378be8619f83b136f4a0 # v6.17.0
uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
with:
context: ./docs
cache-to: type=gha

View File

@@ -0,0 +1,37 @@
name: Regenerate ssh ci runner image
on:
# pull_request:
# branches: ["main"]
schedule:
- cron: "0 0 1,8,15,22 * *"
workflow_dispatch:
permissions:
pull-requests: write
contents: write
packages: write
jobs:
ssh-ci-rebuild:
if: github.repository == 'TecharoHQ/anubis'
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-tags: true
fetch-depth: 0
persist-credentials: false
- name: Log into registry
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
with:
registry: ghcr.io
username: ${{ github.repository_owner }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2 # v3.10.0
- name: Build and push
run: |
cd ./test/ssh-ci
docker buildx bake --push

37
.github/workflows/ssh-ci.yml vendored Normal file
View File

@@ -0,0 +1,37 @@
name: SSH CI
on:
push:
branches: ["main"]
# pull_request:
# branches: ["main"]
permissions:
contents: read
jobs:
ssh:
if: github.repository == 'TecharoHQ/anubis'
runs-on: ubuntu-24.04
strategy:
matrix:
host:
- ubuntu@riscv64.techaro.lol
- ci@ppc64le.techaro.lol
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-tags: true
fetch-depth: 0
persist-credentials: false
- name: Install CI target SSH key
uses: shimataro/ssh-key-action@d4fffb50872869abe2d9a9098a6d9c5aa7d16be4 # v2.7.0
with:
key: ${{ secrets.CI_SSH_KEY }}
name: id_rsa
known_hosts: ${{ secrets.CI_SSH_KNOWN_HOSTS }}
- name: Run CI
run: bash test/ssh-ci/rigging.sh ${{ matrix.host }}
env:
GITHUB_RUN_ID: ${{ github.run_id }}

View File

@@ -29,7 +29,7 @@ jobs:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Upload SARIF file
uses: github/codeql-action/upload-sarif@ff0a06e83cb2de871e5a09832bc6a81e7276941f # v3.28.18
uses: github/codeql-action/upload-sarif@ce28f5bb42b7a9f2c824e633a3f6ee835bab6858 # v3.29.0
with:
sarif_file: results.sarif
category: zizmor

19
.vscode/settings.json vendored
View File

@@ -11,5 +11,24 @@
"zig": false,
"javascript": false,
"properties": false
},
"[markdown]": {
"editor.wordWrap": "wordWrapColumn",
"editor.wordWrapColumn": 80,
"editor.wordBasedSuggestions": "off"
},
"[mdx]": {
"editor.wordWrap": "wordWrapColumn",
"editor.wordWrapColumn": 80,
"editor.wordBasedSuggestions": "off"
},
"[nunjucks]": {
"editor.wordWrap": "wordWrapColumn",
"editor.wordWrapColumn": 80,
"editor.wordBasedSuggestions": "off"
},
"cSpell.enabledFileTypes": {
"mdx": true,
"md": true
}
}

View File

@@ -14,14 +14,36 @@
Anubis is brought to you by sponsors and donors like:
[![Distrust](./docs/static/img/sponsors/distrust-logo.webp)](https://distrust.co?utm_campaign=github&utm_medium=referral&utm_content=anubis)
[![Terminal Trove](./docs/static/img/sponsors/terminal-trove.webp)](https://terminaltrove.com/?utm_campaign=github&utm_medium=referral&utm_content=anubis&utm_source=abgh)
[![canine.tools](./docs/static/img/sponsors/caninetools-logo.webp)](https://canine.tools?utm_campaign=github&utm_medium=referral&utm_content=anubis)
[![Weblate](./docs/static/img/sponsors/weblate-logo.webp)](https://weblate.org/?utm_campaign=github&utm_medium=referral&utm_content=anubis)
### Diamond Tier
<a href="https://www.raptorcs.com/content/base/products.html">
<img src="./docs/static/img/sponsors/raptor-computing-logo.webp" alt="Raptor Computing Systems" height=64 />
</a>
### Gold Tier
<a href="https://distrust.co?utm_campaign=github&utm_medium=referral&utm_content=anubis">
<img src="./docs/static/img/sponsors/distrust-logo.webp" alt="Distrust" height="64">
</a>
<a href="https://terminaltrove.com/?utm_campaign=github&utm_medium=referral&utm_content=anubis&utm_source=abgh">
<img src="./docs/static/img/sponsors/terminal-trove.webp" alt="Terminal Trove" height="64">
</a>
<a href="https://canine.tools?utm_campaign=github&utm_medium=referral&utm_content=anubis">
<img src="./docs/static/img/sponsors/caninetools-logo.webp" alt="canine.tools" height="64">
</a>
<a href="https://weblate.org/">
<img src="./docs/static/img/sponsors/weblate-logo.webp" alt="Weblate" height="64">
</a>
<a href="https://uberspace.de/">
<img src="./docs/static/img/sponsors/uberspace-logo.webp" alt="Uberspace" height="64">
</a>
<a href="https://wildbase.xyz/">
<img src="./docs/static/img/sponsors/wildbase-logo.webp" alt="Wildbase" height="64">
</a>
## Overview
Anubis [weighs the soul of your connection](https://en.wikipedia.org/wiki/Weighing_of_souls) using a proof-of-work challenge in order to protect upstream resources from scraper bots.
Anubis is a Web AI Firewall Utility that [weighs the soul of your connection](https://en.wikipedia.org/wiki/Weighing_of_souls) using one or more challenges in order to protect upstream resources from scraper bots.
This program is designed to help protect the small internet from the endless storm of requests that flood in from AI companies. Anubis is as lightweight as possible to ensure that everyone can afford to protect the communities closest to them.

View File

@@ -1 +1 @@
1.19.0
1.20.0-pre1

View File

@@ -30,11 +30,13 @@ import (
"github.com/TecharoHQ/anubis"
"github.com/TecharoHQ/anubis/data"
"github.com/TecharoHQ/anubis/internal"
"github.com/TecharoHQ/anubis/internal/thoth"
libanubis "github.com/TecharoHQ/anubis/lib"
botPolicy "github.com/TecharoHQ/anubis/lib/policy"
"github.com/TecharoHQ/anubis/lib/policy/config"
"github.com/TecharoHQ/anubis/web"
"github.com/facebookgo/flagenv"
_ "github.com/joho/godotenv/autoload"
"github.com/prometheus/client_golang/prometheus/promhttp"
)
@@ -55,6 +57,7 @@ var (
policyFname = flag.String("policy-fname", "", "full path to anubis policy document (defaults to a sensible built-in policy)")
redirectDomains = flag.String("redirect-domains", "", "list of domains separated by commas which anubis is allowed to redirect to. Leaving this unset allows any domain.")
slogLevel = flag.String("slog-level", "INFO", "logging level (see https://pkg.go.dev/log/slog#hdr-Levels)")
stripBasePrefix = flag.Bool("strip-base-prefix", false, "if true, strips the base prefix from requests forwarded to the target server")
target = flag.String("target", "http://localhost:3923", "target to reverse proxy to, set to an empty string to disable proxying when only using auth request")
targetSNI = flag.String("target-sni", "", "if set, the value of the TLS handshake hostname when forwarding requests to the target")
targetHost = flag.String("target-host", "", "if set, the value of the Host header when forwarding requests to the target")
@@ -68,6 +71,11 @@ var (
extractResources = flag.String("extract-resources", "", "if set, extract the static resources to the specified folder")
webmasterEmail = flag.String("webmaster-email", "", "if set, displays webmaster's email on the reject page for appeals")
versionFlag = flag.Bool("version", false, "print Anubis version")
xffStripPrivate = flag.Bool("xff-strip-private", true, "if set, strip private addresses from X-Forwarded-For")
thothInsecure = flag.Bool("thoth-insecure", false, "if set, connect to Thoth over plain HTTP/2, don't enable this unless support told you to")
thothURL = flag.String("thoth-url", "", "if set, URL for Thoth, the IP reputation database for Anubis")
thothToken = flag.String("thoth-token", "", "if set, API token for Thoth, the IP reputation database for Anubis")
)
func keyFromHex(value string) (ed25519.PrivateKey, error) {
@@ -231,7 +239,25 @@ func main() {
}
}
policy, err := libanubis.LoadPoliciesOrDefault(*policyFname, *challengeDifficulty)
ctx := context.Background()
// Thoth configuration
switch {
case *thothURL != "" && *thothToken == "":
slog.Warn("THOTH_URL is set but no THOTH_TOKEN is set")
case *thothURL == "" && *thothToken != "":
slog.Warn("THOTH_TOKEN is set but no THOTH_URL is set")
case *thothURL != "" && *thothToken != "":
slog.Debug("connecting to Thoth")
thothClient, err := thoth.New(ctx, *thothURL, *thothToken, *thothInsecure)
if err != nil {
log.Fatalf("can't dial thoth at %s: %v", *thothURL, err)
}
ctx = thoth.With(ctx, thothClient)
}
policy, err := libanubis.LoadPoliciesOrDefault(ctx, *policyFname, *challengeDifficulty)
if err != nil {
log.Fatalf("can't parse policy file: %v", err)
}
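Review note: the Thoth `switch` in the hunk above is effectively a four-row decision table over two optional flags. The sketch below restates it as a pure function so the branch order is easy to check. The `thothAction` type, its constants, `classifyThothFlags`, and the example address are hypothetical names introduced only for illustration; they are not part of this change.

```go
package main

import "fmt"

// thothAction is a hypothetical label for the branch the switch would take.
type thothAction string

const (
	thothSkip        thothAction = "skip"          // neither flag set: Thoth stays disabled
	thothWarnNoToken thothAction = "warn-no-token" // URL set, token missing
	thothWarnNoURL   thothAction = "warn-no-url"   // token set, URL missing
	thothConnect     thothAction = "connect"       // both set: dial Thoth
)

// classifyThothFlags mirrors the case order of the switch in the diff.
// With no matching case the switch falls through and nothing happens,
// which is modelled here as thothSkip.
func classifyThothFlags(url, token string) thothAction {
	switch {
	case url != "" && token == "":
		return thothWarnNoToken
	case url == "" && token != "":
		return thothWarnNoURL
	case url != "" && token != "":
		return thothConnect
	}
	return thothSkip
}

func main() {
	// The address and token below are placeholders, not real endpoints.
	fmt.Println(classifyThothFlags("", ""))
	fmt.Println(classifyThothFlags("thoth.example.internal:443", ""))
	fmt.Println(classifyThothFlags("", "placeholder-token"))
	fmt.Println(classifyThothFlags("thoth.example.internal:443", "placeholder-token"))
}
```

Only the last combination reaches `thoth.New`; the two half-configured combinations log a warning and start up without Thoth.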
@@ -259,6 +285,10 @@ func main() {
} else if strings.HasSuffix(*basePrefix, "/") {
log.Fatalf("[misconfiguration] base-prefix must not end with a slash")
}
if *stripBasePrefix && *basePrefix == "" {
log.Fatalf("[misconfiguration] strip-base-prefix is set to true, but base-prefix is not set, " +
"this may result in unexpected behavior")
}
var priv ed25519.PrivateKey
if *ed25519PrivateKeyHex != "" && *ed25519PrivateKeyHexFile != "" {
@@ -301,21 +331,28 @@ func main() {
slog.Warn("REDIRECT_DOMAINS is not set, Anubis will only redirect to the same domain a request is coming from, see https://anubis.techaro.lol/docs/admin/configuration/redirect-domains")
}
// If OpenGraph configuration values are not set in the config file, use the
// values from flags / envvars.
if !policy.OpenGraph.Enabled {
policy.OpenGraph.Enabled = *ogPassthrough
policy.OpenGraph.ConsiderHost = *ogCacheConsiderHost
policy.OpenGraph.TimeToLive = *ogTimeToLive
policy.OpenGraph.Override = map[string]string{}
}
s, err := libanubis.New(libanubis.Options{
BasePrefix: *basePrefix,
Next: rp,
Policy: policy,
ServeRobotsTXT: *robotsTxt,
PrivateKey: priv,
CookieDomain: *cookieDomain,
CookieExpiration: *cookieExpiration,
CookiePartitioned: *cookiePartitioned,
OGPassthrough: *ogPassthrough,
OGTimeToLive: *ogTimeToLive,
RedirectDomains: redirectDomainsList,
Target: *target,
WebmasterEmail: *webmasterEmail,
OGCacheConsidersHost: *ogCacheConsiderHost,
BasePrefix: *basePrefix,
StripBasePrefix: *stripBasePrefix,
Next: rp,
Policy: policy,
ServeRobotsTXT: *robotsTxt,
PrivateKey: priv,
CookieDomain: *cookieDomain,
CookieExpiration: *cookieExpiration,
CookiePartitioned: *cookiePartitioned,
RedirectDomains: redirectDomainsList,
Target: *target,
WebmasterEmail: *webmasterEmail,
})
if err != nil {
log.Fatalf("can't construct libanubis.Server: %v", err)
@@ -336,7 +373,7 @@ func main() {
h = s
h = internal.RemoteXRealIP(*useRemoteAddress, *bindNetwork, h)
h = internal.XForwardedForToXRealIP(h)
h = internal.XForwardedForUpdate(h)
h = internal.XForwardedForUpdate(*xffStripPrivate, h)
srv := http.Server{Handler: h, ErrorLog: internal.GetFilteredHTTPLogger()}
listener, listenerUrl := setupListener(*bindNetwork, *bind)
@@ -420,11 +457,11 @@ func extractEmbedFS(fsys embed.FS, root string, destDir string) error {
return os.MkdirAll(destPath, 0o700)
}
data, err := fs.ReadFile(fsys, path)
embeddedData, err := fs.ReadFile(fsys, path)
if err != nil {
return err
}
return os.WriteFile(destPath, data, 0o644)
return os.WriteFile(destPath, embeddedData, 0o644)
})
}

View File

@@ -0,0 +1,78 @@
/*
Batch process robots.txt files from archives like https://github.com/nrjones8/robots-dot-txt-archive-bot/tree/master/data/cleaned
into Anubis CEL policies. Usage: go run batch_process.go <directory with robots.txt files>
*/
package main
import (
"fmt"
"io/fs"
"log"
"os"
"os/exec"
"path/filepath"
"strings"
)
func main() {
if len(os.Args) < 2 {
fmt.Println("Usage: go run batch_process.go <cleaned_directory>")
fmt.Println("Example: go run batch_process.go ./cleaned")
os.Exit(1)
}
cleanedDir := os.Args[1]
outputDir := "generated_policies"
// Create output directory
if err := os.MkdirAll(outputDir, 0755); err != nil {
log.Fatalf("Failed to create output directory: %v", err)
}
count := 0
err := filepath.WalkDir(cleanedDir, func(path string, d fs.DirEntry, err error) error {
if err != nil {
return err
}
// Skip directories
if d.IsDir() {
return nil
}
// Generate policy name from file path
relPath, _ := filepath.Rel(cleanedDir, path)
policyName := strings.ReplaceAll(relPath, "/", "-")
policyName = strings.TrimSuffix(policyName, "-robots.txt")
policyName = strings.ReplaceAll(policyName, ".", "-")
outputFile := filepath.Join(outputDir, policyName+".yaml")
cmd := exec.Command("go", "run", "main.go",
"-input", path,
"-output", outputFile,
"-name", policyName,
"-format", "yaml")
if err := cmd.Run(); err != nil {
fmt.Printf("Warning: Failed to process %s: %v\n", path, err)
return nil // Continue processing other files
}
count++
if count%100 == 0 {
fmt.Printf("Processed %d files...\n", count)
} else if count%10 == 0 {
fmt.Print(".")
}
return nil
})
if err != nil {
log.Fatalf("Error walking directory: %v", err)
}
fmt.Printf("Successfully processed %d robots.txt files\n", count)
fmt.Printf("Generated policies saved to: %s/\n", outputDir)
}
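Review note: the policy name in the batch processor above is derived from the file's path relative to the input directory. The sketch below isolates that derivation so its output can be checked by hand. `policyNameFor` is a hypothetical helper, the example paths are invented, and the `filepath.ToSlash` call is an addition not present in the diff (it only matters on platforms whose path separator is not `/`).

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// policyNameFor follows the same three string steps as the diff:
// path separators become dashes, a trailing "-robots.txt" is dropped,
// and remaining dots become dashes.
func policyNameFor(root, path string) (string, error) {
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return "", err
	}
	name := strings.ReplaceAll(filepath.ToSlash(rel), "/", "-")
	name = strings.TrimSuffix(name, "-robots.txt")
	name = strings.ReplaceAll(name, ".", "-")
	return name, nil
}

func main() {
	// Hypothetical archive layouts, used only to exercise the function.
	for _, p := range []string{
		"cleaned/example.com/robots.txt",
		"cleaned/2020/example.org-robots.txt",
	} {
		name, err := policyNameFor("cleaned", p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s.yaml\n", p, name)
	}
}
```

Unlike the diff, this sketch returns the `filepath.Rel` error instead of discarding it, which may be worth considering for the real tool.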

313
cmd/robots2policy/main.go Normal file
View File

@@ -0,0 +1,313 @@
package main
import (
"bufio"
"encoding/json"
"flag"
"fmt"
"io"
"log"
"net/http"
"os"
"regexp"
"strings"
"github.com/TecharoHQ/anubis/lib/policy/config"
"sigs.k8s.io/yaml"
)
var (
inputFile = flag.String("input", "", "path to robots.txt file (use - for stdin)")
outputFile = flag.String("output", "", "output file path (use - for stdout, defaults to stdout)")
outputFormat = flag.String("format", "yaml", "output format: yaml or json")
baseAction = flag.String("action", "CHALLENGE", "default action for disallowed paths: ALLOW, DENY, CHALLENGE, WEIGH")
crawlDelay = flag.Int("crawl-delay-weight", 0, "if > 0, add weight adjustment for crawl-delay (difficulty adjustment)")
policyName = flag.String("name", "robots-txt-policy", "name for the generated policy")
userAgentDeny = flag.String("deny-user-agents", "DENY", "action for specifically blocked user agents: DENY, CHALLENGE")
helpFlag = flag.Bool("help", false, "show help")
)
type RobotsRule struct {
UserAgent string
Disallows []string
Allows []string
CrawlDelay int
IsBlacklist bool // true if this is a specifically denied user agent
}
type AnubisRule struct {
Expression *config.ExpressionOrList `yaml:"expression,omitempty" json:"expression,omitempty"`
Challenge *config.ChallengeRules `yaml:"challenge,omitempty" json:"challenge,omitempty"`
Weight *config.Weight `yaml:"weight,omitempty" json:"weight,omitempty"`
Name string `yaml:"name" json:"name"`
Action string `yaml:"action" json:"action"`
}
func init() {
flag.Usage = func() {
fmt.Fprintf(os.Stderr, "Usage of %s:\n", os.Args[0])
fmt.Fprintf(os.Stderr, "%s [options] -input <robots.txt>\n\n", os.Args[0])
flag.PrintDefaults()
fmt.Fprintln(os.Stderr, "\nExamples:")
fmt.Fprintln(os.Stderr, " # Convert local robots.txt file")
fmt.Fprintln(os.Stderr, " robots2policy -input robots.txt -output policy.yaml")
fmt.Fprintln(os.Stderr, "")
fmt.Fprintln(os.Stderr, " # Convert from URL")
fmt.Fprintln(os.Stderr, " robots2policy -input https://example.com/robots.txt -format json")
fmt.Fprintln(os.Stderr, "")
fmt.Fprintln(os.Stderr, " # Read from stdin, write to stdout")
fmt.Fprintln(os.Stderr, " curl https://example.com/robots.txt | robots2policy -input -")
os.Exit(2)
}
}
func main() {
flag.Parse()
if len(flag.Args()) > 0 || *helpFlag || *inputFile == "" {
flag.Usage()
}
// Read robots.txt
var input io.Reader
if *inputFile == "-" {
input = os.Stdin
} else if strings.HasPrefix(*inputFile, "http://") || strings.HasPrefix(*inputFile, "https://") {
resp, err := http.Get(*inputFile)
if err != nil {
log.Fatalf("failed to fetch robots.txt from URL: %v", err)
}
defer resp.Body.Close()
input = resp.Body
} else {
file, err := os.Open(*inputFile)
if err != nil {
log.Fatalf("failed to open input file: %v", err)
}
defer file.Close()
input = file
}
// Parse robots.txt
rules, err := parseRobotsTxt(input)
if err != nil {
log.Fatalf("failed to parse robots.txt: %v", err)
}
// Convert to Anubis rules
anubisRules := convertToAnubisRules(rules)
// Check if any rules were generated
if len(anubisRules) == 0 {
log.Fatal("no valid rules generated from robots.txt - file may be empty or contain no disallow directives")
}
// Generate output
var output []byte
switch strings.ToLower(*outputFormat) {
case "yaml":
output, err = yaml.Marshal(anubisRules)
case "json":
output, err = json.MarshalIndent(anubisRules, "", " ")
default:
log.Fatalf("unsupported output format: %s (use yaml or json)", *outputFormat)
}
if err != nil {
log.Fatalf("failed to marshal output: %v", err)
}
// Write output
if *outputFile == "" || *outputFile == "-" {
fmt.Print(string(output))
} else {
err = os.WriteFile(*outputFile, output, 0644)
if err != nil {
log.Fatalf("failed to write output file: %v", err)
}
fmt.Printf("Generated Anubis policy written to %s\n", *outputFile)
}
}
func parseRobotsTxt(input io.Reader) ([]RobotsRule, error) {
scanner := bufio.NewScanner(input)
var rules []RobotsRule
var currentRule *RobotsRule
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
// Skip empty lines and comments
if line == "" || strings.HasPrefix(line, "#") {
continue
}
// Split on first colon
parts := strings.SplitN(line, ":", 2)
if len(parts) != 2 {
continue
}
directive := strings.TrimSpace(strings.ToLower(parts[0]))
value := strings.TrimSpace(parts[1])
switch directive {
case "user-agent":
// Start a new rule section
if currentRule != nil {
rules = append(rules, *currentRule)
}
currentRule = &RobotsRule{
UserAgent: value,
Disallows: make([]string, 0),
Allows: make([]string, 0),
}
case "disallow":
if currentRule != nil && value != "" {
currentRule.Disallows = append(currentRule.Disallows, value)
}
case "allow":
if currentRule != nil && value != "" {
currentRule.Allows = append(currentRule.Allows, value)
}
case "crawl-delay":
if currentRule != nil {
if delay, err := parseIntSafe(value); err == nil {
currentRule.CrawlDelay = delay
}
}
}
}
// Don't forget the last rule
if currentRule != nil {
rules = append(rules, *currentRule)
}
// Mark blacklisted user agents (those with "Disallow: /")
for i := range rules {
for _, disallow := range rules[i].Disallows {
if disallow == "/" {
rules[i].IsBlacklist = true
break
}
}
}
return rules, scanner.Err()
}
func parseIntSafe(s string) (int, error) {
var result int
_, err := fmt.Sscanf(s, "%d", &result)
return result, err
}
func convertToAnubisRules(robotsRules []RobotsRule) []AnubisRule {
var anubisRules []AnubisRule
ruleCounter := 0
for _, robotsRule := range robotsRules {
userAgent := robotsRule.UserAgent
// Handle crawl delay as weight adjustment (do this first before any continues)
if robotsRule.CrawlDelay > 0 && *crawlDelay > 0 {
ruleCounter++
rule := AnubisRule{
Name: fmt.Sprintf("%s-crawl-delay-%d", *policyName, ruleCounter),
Action: "WEIGH",
Weight: &config.Weight{Adjust: *crawlDelay},
}
if userAgent == "*" {
rule.Expression = &config.ExpressionOrList{
All: []string{"true"}, // Always applies
}
} else {
rule.Expression = &config.ExpressionOrList{
All: []string{fmt.Sprintf("userAgent.contains(%q)", userAgent)},
}
}
anubisRules = append(anubisRules, rule)
}
// Handle blacklisted user agents (complete deny/challenge)
if robotsRule.IsBlacklist {
ruleCounter++
rule := AnubisRule{
Name: fmt.Sprintf("%s-blacklist-%d", *policyName, ruleCounter),
Action: *userAgentDeny,
}
if userAgent == "*" {
// This would block everything - convert to a weight adjustment instead
rule.Name = fmt.Sprintf("%s-global-restriction-%d", *policyName, ruleCounter)
rule.Action = "WEIGH"
rule.Weight = &config.Weight{Adjust: 20} // Increase difficulty significantly
rule.Expression = &config.ExpressionOrList{
All: []string{"true"}, // Always applies
}
} else {
rule.Expression = &config.ExpressionOrList{
All: []string{fmt.Sprintf("userAgent.contains(%q)", userAgent)},
}
}
anubisRules = append(anubisRules, rule)
continue
}
// Handle specific disallow rules
for _, disallow := range robotsRule.Disallows {
if disallow == "/" {
continue // Already handled as blacklist above
}
ruleCounter++
rule := AnubisRule{
Name: fmt.Sprintf("%s-disallow-%d", *policyName, ruleCounter),
Action: *baseAction,
}
// Build CEL expression
var conditions []string
// Add user agent condition if not wildcard
if userAgent != "*" {
conditions = append(conditions, fmt.Sprintf("userAgent.contains(%q)", userAgent))
}
// Add path condition
pathCondition := buildPathCondition(disallow)
conditions = append(conditions, pathCondition)
rule.Expression = &config.ExpressionOrList{
All: conditions,
}
anubisRules = append(anubisRules, rule)
}
}
return anubisRules
}
func buildPathCondition(robotsPath string) string {
// Handle wildcards in robots.txt paths
if strings.Contains(robotsPath, "*") || strings.Contains(robotsPath, "?") {
// Convert robots.txt wildcards to regex
regex := regexp.QuoteMeta(robotsPath)
regex = strings.ReplaceAll(regex, `\*`, `.*`) // * becomes .*
regex = strings.ReplaceAll(regex, `\?`, `.`) // ? becomes .
regex = "^" + regex
return fmt.Sprintf("path.matches(%q)", regex)
}
// Simple prefix match for most cases
return fmt.Sprintf("path.startsWith(%q)", robotsPath)
}
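Review note: to make the wildcard handling in `buildPathCondition` concrete, the standalone sketch below copies the function's logic and prints the CEL fragment produced for a few sample paths. The sample paths are invented for illustration; the function body follows the diff above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// buildPathCondition turns a robots.txt path into a CEL path check.
// Paths containing "*" or "?" become an anchored regular expression;
// everything else becomes a plain prefix match.
func buildPathCondition(robotsPath string) string {
	if strings.Contains(robotsPath, "*") || strings.Contains(robotsPath, "?") {
		regex := regexp.QuoteMeta(robotsPath)         // escape all regex metacharacters
		regex = strings.ReplaceAll(regex, `\*`, `.*`) // then re-open "*" as "any run"
		regex = strings.ReplaceAll(regex, `\?`, `.`)  // and "?" as "any one character"
		return fmt.Sprintf("path.matches(%q)", "^"+regex)
	}
	return fmt.Sprintf("path.startsWith(%q)", robotsPath)
}

func main() {
	for _, p := range []string{"/admin", "/private/*.pdf", "/search?q="} {
		fmt.Printf("%-16s => %s\n", p, buildPathCondition(p))
	}
}
```

Because `%q` produces a quoted Go string, the escaped dot in `/private/*.pdf` appears with a doubled backslash in the generated expression. Since `regexp.QuoteMeta` also escapes `$`, a trailing `$` in a wildcard path is matched as a literal dollar sign rather than as an end-of-path anchor.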

View File

@@ -0,0 +1,418 @@
package main
import (
"encoding/json"
"fmt"
"os"
"path/filepath"
"reflect"
"strings"
"testing"
"gopkg.in/yaml.v3"
)
type TestCase struct {
name string
robotsFile string
expectedFile string
options TestOptions
}
type TestOptions struct {
format string
action string
crawlDelayWeight int
policyName string
deniedAction string
}
func TestDataFileConversion(t *testing.T) {
testCases := []TestCase{
{
name: "simple_default",
robotsFile: "simple.robots.txt",
expectedFile: "simple.yaml",
options: TestOptions{format: "yaml"},
},
{
name: "simple_json",
robotsFile: "simple.robots.txt",
expectedFile: "simple.json",
options: TestOptions{format: "json"},
},
{
name: "simple_deny_action",
robotsFile: "simple.robots.txt",
expectedFile: "deny-action.yaml",
options: TestOptions{format: "yaml", action: "DENY"},
},
{
name: "simple_custom_name",
robotsFile: "simple.robots.txt",
expectedFile: "custom-name.yaml",
options: TestOptions{format: "yaml", policyName: "my-custom-policy"},
},
{
name: "blacklist_with_crawl_delay",
robotsFile: "blacklist.robots.txt",
expectedFile: "blacklist.yaml",
options: TestOptions{format: "yaml", crawlDelayWeight: 3},
},
{
name: "wildcards",
robotsFile: "wildcards.robots.txt",
expectedFile: "wildcards.yaml",
options: TestOptions{format: "yaml"},
},
{
name: "empty_file",
robotsFile: "empty.robots.txt",
expectedFile: "empty.yaml",
options: TestOptions{format: "yaml"},
},
{
name: "complex_scenario",
robotsFile: "complex.robots.txt",
expectedFile: "complex.yaml",
options: TestOptions{format: "yaml", crawlDelayWeight: 5},
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
robotsPath := filepath.Join("testdata", tc.robotsFile)
expectedPath := filepath.Join("testdata", tc.expectedFile)
// Read robots.txt input
robotsFile, err := os.Open(robotsPath)
if err != nil {
t.Fatalf("Failed to open robots file %s: %v", robotsPath, err)
}
defer robotsFile.Close()
// Parse robots.txt
rules, err := parseRobotsTxt(robotsFile)
if err != nil {
t.Fatalf("Failed to parse robots.txt: %v", err)
}
// Set test options
oldFormat := *outputFormat
oldAction := *baseAction
oldCrawlDelay := *crawlDelay
oldPolicyName := *policyName
oldDeniedAction := *userAgentDeny
if tc.options.format != "" {
*outputFormat = tc.options.format
}
if tc.options.action != "" {
*baseAction = tc.options.action
}
if tc.options.crawlDelayWeight > 0 {
*crawlDelay = tc.options.crawlDelayWeight
}
if tc.options.policyName != "" {
*policyName = tc.options.policyName
}
if tc.options.deniedAction != "" {
*userAgentDeny = tc.options.deniedAction
}
// Restore options after test
defer func() {
*outputFormat = oldFormat
*baseAction = oldAction
*crawlDelay = oldCrawlDelay
*policyName = oldPolicyName
*userAgentDeny = oldDeniedAction
}()
// Convert to Anubis rules
anubisRules := convertToAnubisRules(rules)
// Generate output
var actualOutput []byte
switch strings.ToLower(*outputFormat) {
case "yaml":
actualOutput, err = yaml.Marshal(anubisRules)
case "json":
actualOutput, err = json.MarshalIndent(anubisRules, "", " ")
}
if err != nil {
t.Fatalf("Failed to marshal output: %v", err)
}
// Read expected output
expectedOutput, err := os.ReadFile(expectedPath)
if err != nil {
t.Fatalf("Failed to read expected file %s: %v", expectedPath, err)
}
if strings.ToLower(*outputFormat) == "yaml" {
var actualData []interface{}
var expectedData []interface{}
err = yaml.Unmarshal(actualOutput, &actualData)
if err != nil {
t.Fatalf("Failed to unmarshal actual output: %v", err)
}
err = yaml.Unmarshal(expectedOutput, &expectedData)
if err != nil {
t.Fatalf("Failed to unmarshal expected output: %v", err)
}
// Compare data structures
if !compareData(actualData, expectedData) {
actualStr := strings.TrimSpace(string(actualOutput))
expectedStr := strings.TrimSpace(string(expectedOutput))
t.Errorf("Output mismatch for %s\nExpected:\n%s\n\nActual:\n%s", tc.name, expectedStr, actualStr)
}
} else {
var actualData []interface{}
var expectedData []interface{}
err = json.Unmarshal(actualOutput, &actualData)
if err != nil {
t.Fatalf("Failed to unmarshal actual JSON output: %v", err)
}
err = json.Unmarshal(expectedOutput, &expectedData)
if err != nil {
t.Fatalf("Failed to unmarshal expected JSON output: %v", err)
}
// Compare data structures
if !compareData(actualData, expectedData) {
actualStr := strings.TrimSpace(string(actualOutput))
expectedStr := strings.TrimSpace(string(expectedOutput))
t.Errorf("Output mismatch for %s\nExpected:\n%s\n\nActual:\n%s", tc.name, expectedStr, actualStr)
}
}
})
}
}
func TestCaseInsensitiveParsing(t *testing.T) {
robotsTxt := `User-Agent: *
Disallow: /admin
Crawl-Delay: 10
User-agent: TestBot
disallow: /test
crawl-delay: 5
USER-AGENT: UpperBot
DISALLOW: /upper
CRAWL-DELAY: 20`
reader := strings.NewReader(robotsTxt)
rules, err := parseRobotsTxt(reader)
if err != nil {
t.Fatalf("Failed to parse case-insensitive robots.txt: %v", err)
}
expectedDelays := []int{10, 5, 20}
// Stop here on a count mismatch so the loop below cannot index out of range
if len(rules) != len(expectedDelays) {
t.Fatalf("Expected %d rules, got %d", len(expectedDelays), len(rules))
}
// Check that all crawl delays were parsed
for i, rule := range rules {
if rule.CrawlDelay != expectedDelays[i] {
t.Errorf("Rule %d: expected crawl delay %d, got %d", i, expectedDelays[i], rule.CrawlDelay)
}
}
}
func TestVariousOutputFormats(t *testing.T) {
robotsTxt := `User-agent: *
Disallow: /admin`
reader := strings.NewReader(robotsTxt)
rules, err := parseRobotsTxt(reader)
if err != nil {
t.Fatalf("Failed to parse robots.txt: %v", err)
}
oldPolicyName := *policyName
*policyName = "test-policy"
defer func() { *policyName = oldPolicyName }()
anubisRules := convertToAnubisRules(rules)
// Test YAML output
yamlOutput, err := yaml.Marshal(anubisRules)
if err != nil {
t.Fatalf("Failed to marshal YAML: %v", err)
}
if !strings.Contains(string(yamlOutput), "name: test-policy-disallow-1") {
t.Errorf("YAML output doesn't contain expected rule name")
}
// Test JSON output
jsonOutput, err := json.MarshalIndent(anubisRules, "", " ")
if err != nil {
t.Fatalf("Failed to marshal JSON: %v", err)
}
if !strings.Contains(string(jsonOutput), `"name": "test-policy-disallow-1"`) {
t.Errorf("JSON output doesn't contain expected rule name")
}
}
func TestDifferentActions(t *testing.T) {
robotsTxt := `User-agent: *
Disallow: /admin`
testActions := []string{"ALLOW", "DENY", "CHALLENGE", "WEIGH"}
for _, action := range testActions {
t.Run("action_"+action, func(t *testing.T) {
reader := strings.NewReader(robotsTxt)
rules, err := parseRobotsTxt(reader)
if err != nil {
t.Fatalf("Failed to parse robots.txt: %v", err)
}
oldAction := *baseAction
*baseAction = action
defer func() { *baseAction = oldAction }()
anubisRules := convertToAnubisRules(rules)
if len(anubisRules) != 1 {
t.Fatalf("Expected 1 rule, got %d", len(anubisRules))
}
if anubisRules[0].Action != action {
t.Errorf("Expected action %s, got %s", action, anubisRules[0].Action)
}
})
}
}
func TestPolicyNaming(t *testing.T) {
robotsTxt := `User-agent: *
Disallow: /admin
Disallow: /private
User-agent: BadBot
Disallow: /`
testNames := []string{"custom-policy", "my-rules", "site-protection"}
for _, name := range testNames {
t.Run("name_"+name, func(t *testing.T) {
reader := strings.NewReader(robotsTxt)
rules, err := parseRobotsTxt(reader)
if err != nil {
t.Fatalf("Failed to parse robots.txt: %v", err)
}
oldName := *policyName
*policyName = name
defer func() { *policyName = oldName }()
anubisRules := convertToAnubisRules(rules)
// Check that all rule names use the custom prefix
for _, rule := range anubisRules {
if !strings.HasPrefix(rule.Name, name+"-") {
t.Errorf("Rule name %s doesn't start with expected prefix %s-", rule.Name, name)
}
}
})
}
}
func TestCrawlDelayWeights(t *testing.T) {
robotsTxt := `User-agent: *
Disallow: /admin
Crawl-delay: 10
User-agent: SlowBot
Disallow: /slow
Crawl-delay: 60`
testWeights := []int{1, 5, 10, 25}
for _, weight := range testWeights {
t.Run(fmt.Sprintf("weight_%d", weight), func(t *testing.T) {
reader := strings.NewReader(robotsTxt)
rules, err := parseRobotsTxt(reader)
if err != nil {
t.Fatalf("Failed to parse robots.txt: %v", err)
}
oldWeight := *crawlDelay
*crawlDelay = weight
defer func() { *crawlDelay = oldWeight }()
anubisRules := convertToAnubisRules(rules)
// Count weight rules and verify they have correct weight
weightRules := 0
for _, rule := range anubisRules {
if rule.Action == "WEIGH" && rule.Weight != nil {
weightRules++
if rule.Weight.Adjust != weight {
t.Errorf("Expected weight %d, got %d", weight, rule.Weight.Adjust)
}
}
}
expectedWeightRules := 2 // One for *, one for SlowBot
if weightRules != expectedWeightRules {
t.Errorf("Expected %d weight rules, got %d", expectedWeightRules, weightRules)
}
})
}
}
func TestBlacklistActions(t *testing.T) {
robotsTxt := `User-agent: BadBot
Disallow: /
User-agent: SpamBot
Disallow: /`
testActions := []string{"DENY", "CHALLENGE"}
for _, action := range testActions {
t.Run("blacklist_"+action, func(t *testing.T) {
reader := strings.NewReader(robotsTxt)
rules, err := parseRobotsTxt(reader)
if err != nil {
t.Fatalf("Failed to parse robots.txt: %v", err)
}
oldAction := *userAgentDeny
*userAgentDeny = action
defer func() { *userAgentDeny = oldAction }()
anubisRules := convertToAnubisRules(rules)
// All rules should be blacklist rules with the specified action
for _, rule := range anubisRules {
if !strings.Contains(rule.Name, "blacklist") {
t.Errorf("Expected blacklist rule, got %s", rule.Name)
}
if rule.Action != action {
t.Errorf("Expected action %s, got %s", action, rule.Action)
}
}
})
}
}
// compareData reports whether two decoded YAML/JSON values are deeply equal.
// Comparing decoded values instead of raw bytes ignores formatting-only
// differences such as whitespace and map key order.
func compareData(actual, expected interface{}) bool {
return reflect.DeepEqual(actual, expected)
}
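
As a rough illustration of why the test compares decoded values rather than raw bytes, the sketch below uses only the standard library. `sameJSON` is a hypothetical helper name, not part of the test file; it decodes two JSON documents and compares the results with `reflect.DeepEqual`, so key order and whitespace do not matter while array order still does.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// sameJSON is a hypothetical helper: it decodes both inputs into generic
// values and reports whether the decoded values are deeply equal.
func sameJSON(a, b []byte) (bool, error) {
	var av, bv interface{}
	if err := json.Unmarshal(a, &av); err != nil {
		return false, err
	}
	if err := json.Unmarshal(b, &bv); err != nil {
		return false, err
	}
	return reflect.DeepEqual(av, bv), nil
}

func main() {
	// Same rule, different key order and whitespace.
	a := []byte(`[{"action":"DENY","name":"rule-1"}]`)
	b := []byte(`[ { "name": "rule-1", "action": "DENY" } ]`)
	equal, err := sameJSON(a, b)
	fmt.Println(equal, err)
}
```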


@@ -0,0 +1,15 @@
# Test with blacklisted user agents
User-agent: *
Disallow: /admin
Crawl-delay: 10
User-agent: BadBot
Disallow: /
User-agent: SpamBot
Disallow: /
Crawl-delay: 60
User-agent: Googlebot
Disallow: /search
Crawl-delay: 5


@@ -0,0 +1,30 @@
- action: WEIGH
expression: "true"
name: robots-txt-policy-crawl-delay-1
weight:
adjust: 3
- action: CHALLENGE
expression: path.startsWith("/admin")
name: robots-txt-policy-disallow-2
- action: DENY
expression: userAgent.contains("BadBot")
name: robots-txt-policy-blacklist-3
- action: WEIGH
expression: userAgent.contains("SpamBot")
name: robots-txt-policy-crawl-delay-4
weight:
adjust: 3
- action: DENY
expression: userAgent.contains("SpamBot")
name: robots-txt-policy-blacklist-5
- action: WEIGH
expression: userAgent.contains("Googlebot")
name: robots-txt-policy-crawl-delay-6
weight:
adjust: 3
- action: CHALLENGE
expression:
all:
- userAgent.contains("Googlebot")
- path.startsWith("/search")
name: robots-txt-policy-disallow-7


@@ -0,0 +1,30 @@
# Complex real-world example
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /api/internal/
Allow: /api/public/
Crawl-delay: 5
User-agent: Googlebot
Disallow: /search/
Allow: /api/
Crawl-delay: 2
User-agent: Bingbot
Disallow: /search/
Disallow: /admin/
Crawl-delay: 10
User-agent: BadBot
Disallow: /
User-agent: SeoBot
Disallow: /
Crawl-delay: 300
# Test with various patterns
User-agent: TestBot
Disallow: /*/admin
Disallow: /temp*.html
Disallow: /file?.log

71
cmd/robots2policy/testdata/complex.yaml vendored Normal file

@@ -0,0 +1,71 @@
- action: WEIGH
expression: "true"
name: robots-txt-policy-crawl-delay-1
weight:
adjust: 5
- action: CHALLENGE
expression: path.startsWith("/admin/")
name: robots-txt-policy-disallow-2
- action: CHALLENGE
expression: path.startsWith("/private/")
name: robots-txt-policy-disallow-3
- action: CHALLENGE
expression: path.startsWith("/api/internal/")
name: robots-txt-policy-disallow-4
- action: WEIGH
expression: userAgent.contains("Googlebot")
name: robots-txt-policy-crawl-delay-5
weight:
adjust: 5
- action: CHALLENGE
expression:
all:
- userAgent.contains("Googlebot")
- path.startsWith("/search/")
name: robots-txt-policy-disallow-6
- action: WEIGH
expression: userAgent.contains("Bingbot")
name: robots-txt-policy-crawl-delay-7
weight:
adjust: 5
- action: CHALLENGE
expression:
all:
- userAgent.contains("Bingbot")
- path.startsWith("/search/")
name: robots-txt-policy-disallow-8
- action: CHALLENGE
expression:
all:
- userAgent.contains("Bingbot")
- path.startsWith("/admin/")
name: robots-txt-policy-disallow-9
- action: DENY
expression: userAgent.contains("BadBot")
name: robots-txt-policy-blacklist-10
- action: WEIGH
expression: userAgent.contains("SeoBot")
name: robots-txt-policy-crawl-delay-11
weight:
adjust: 5
- action: DENY
expression: userAgent.contains("SeoBot")
name: robots-txt-policy-blacklist-12
- action: CHALLENGE
expression:
all:
- userAgent.contains("TestBot")
- path.matches("^/.*/admin")
name: robots-txt-policy-disallow-13
- action: CHALLENGE
expression:
all:
- userAgent.contains("TestBot")
- path.matches("^/temp.*\\.html")
name: robots-txt-policy-disallow-14
- action: CHALLENGE
expression:
all:
- userAgent.contains("TestBot")
- path.matches("^/file.\\.log")
name: robots-txt-policy-disallow-15


@@ -0,0 +1,6 @@
- action: CHALLENGE
expression: path.startsWith("/admin/")
name: my-custom-policy-disallow-1
- action: CHALLENGE
expression: path.startsWith("/private")
name: my-custom-policy-disallow-2


@@ -0,0 +1,6 @@
- action: DENY
expression: path.startsWith("/admin/")
name: robots-txt-policy-disallow-1
- action: DENY
expression: path.startsWith("/private")
name: robots-txt-policy-disallow-2


@@ -0,0 +1,2 @@
# Empty robots.txt (comments only)
# No actual rules

1
cmd/robots2policy/testdata/empty.yaml vendored Normal file

@@ -0,0 +1 @@
[]

12
cmd/robots2policy/testdata/simple.json vendored Normal file

@@ -0,0 +1,12 @@
[
{
"action": "CHALLENGE",
"expression": "path.startsWith(\"/admin/\")",
"name": "robots-txt-policy-disallow-1"
},
{
"action": "CHALLENGE",
"expression": "path.startsWith(\"/private\")",
"name": "robots-txt-policy-disallow-2"
}
]


@@ -0,0 +1,5 @@
# Simple robots.txt test
User-agent: *
Disallow: /admin/
Disallow: /private
Allow: /public


@@ -0,0 +1,6 @@
- action: CHALLENGE
expression: path.startsWith("/admin/")
name: robots-txt-policy-disallow-1
- action: CHALLENGE
expression: path.startsWith("/private")
name: robots-txt-policy-disallow-2


@@ -0,0 +1,6 @@
# Test wildcard patterns
User-agent: *
Disallow: /search*
Disallow: /*/private
Disallow: /file?.txt
Disallow: /admin/*?action=delete


@@ -0,0 +1,12 @@
- action: CHALLENGE
expression: path.matches("^/search.*")
name: robots-txt-policy-disallow-1
- action: CHALLENGE
expression: path.matches("^/.*/private")
name: robots-txt-policy-disallow-2
- action: CHALLENGE
expression: path.matches("^/file.\\.txt")
name: robots-txt-policy-disallow-3
- action: CHALLENGE
expression: path.matches("^/admin/.*.action=delete")
name: robots-txt-policy-disallow-4
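
The fixture above suggests a simple mapping from robots.txt wildcards to regular expressions: `*` becomes `.*`, `?` becomes `.`, everything else is escaped, and the pattern is anchored with `^`. A minimal sketch of that mapping follows. `wildcardToRegex` is a hypothetical name and this is not necessarily how `robots2policy` implements it; the doubled backslash in the fixture is presumably string-literal escaping around the same regex.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wildcardToRegex is a hypothetical helper that converts a robots.txt path
// pattern into an anchored regular expression. '*' matches any run of
// characters, '?' matches a single character, and all other characters are
// escaped so they match literally.
func wildcardToRegex(pattern string) string {
	var b strings.Builder
	b.WriteString("^")
	for _, r := range pattern {
		switch r {
		case '*':
			b.WriteString(".*")
		case '?':
			b.WriteString(".")
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	return b.String()
}

func main() {
	patterns := []string{"/search*", "/*/private", "/file?.txt", "/admin/*?action=delete"}
	for _, p := range patterns {
		fmt.Println(wildcardToRegex(p))
	}
}
```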


@@ -51,14 +51,63 @@ bots:
# report_as: 4 # lie to the operator
# algorithm: slow # intentionally waste CPU cycles and time
# Requires a subscription to Thoth to use, see
# https://anubis.techaro.lol/docs/admin/thoth#geoip-based-filtering
- name: countries-with-aggressive-scrapers
action: WEIGH
geoip:
countries:
- BR
- CN
weight:
adjust: 10
# Requires a subscription to Thoth to use, see
# https://anubis.techaro.lol/docs/admin/thoth#asn-based-filtering
- name: aggressive-asns-without-functional-abuse-contact
action: WEIGH
asns:
match:
- 13335 # Cloudflare
- 136907 # Huawei Cloud
- 45102 # Alibaba Cloud
weight:
adjust: 10
# Generic catchall rule
- name: generic-browser
user_agent_regex: >-
Mozilla|Opera
action: CHALLENGE
action: WEIGH
weight:
adjust: 10
dnsbl: false
# Open Graph passthrough configuration, see here for more information:
# https://anubis.techaro.lol/docs/admin/configuration/open-graph/
openGraph:
# Enables Open Graph passthrough
enabled: false
# Enables the use of the HTTP host in the cache key, this enables
# caching metadata for multiple http hosts at once.
considerHost: false
# How long cached OpenGraph metadata should last in memory
ttl: 24h
# # If set, return these opengraph values instead of looking them up with
# # the target service.
# #
# # Correlates to properties in https://ogp.me/
# override:
# # og:title is required, it is the title of the website
# "og:title": "Techaro Anubis"
# "og:description": >-
# Anubis is a Web AI Firewall Utility that helps you fight the bots
# away so that you can maintain uptime at work!
# "description": >-
# Anubis is a Web AI Firewall Utility that helps you fight the bots
# away so that you can maintain uptime at work!
# By default, send HTTP 200 back to clients that either get issued a challenge
# or a denial. This seems weird, but this is load-bearing due to the fact that
# the most aggressive scraper bots seem to really, really, want an HTTP 200 and
@@ -66,3 +115,57 @@ dnsbl: false
status_codes:
CHALLENGE: 200
DENY: 200
# The weight thresholds for when to trigger individual challenges. Any
# CHALLENGE will take precedence over this.
#
# A threshold has four configuration options:
#
# - name: the name that is reported down the stack and used for metrics
# - expression: A CEL expression with the request weight in the variable
# weight
# - action: the Anubis action to apply, similar to in a bot policy
# - challenge: which challenge to send to the user, similar to in a bot policy
#
# See https://anubis.techaro.lol/docs/admin/configuration/thresholds for more
# information.
thresholds:
# By default Anubis ships with the following thresholds:
- name: minimal-suspicion # This client is likely fine, its soul is lighter than a feather
expression: weight < 0 # a feather weighs zero units
action: ALLOW # Allow the traffic through
# For clients that had some weight reduced through custom rules, give them a
# lightweight challenge.
- name: mild-suspicion
expression:
all:
- weight >= 0
- weight < 10
action: CHALLENGE
challenge:
# https://anubis.techaro.lol/docs/admin/configuration/challenges/metarefresh
algorithm: metarefresh
difficulty: 1
report_as: 1
# For clients that are browser-like but have either gained points from custom rules or
# report as a standard browser.
- name: moderate-suspicion
expression:
all:
- weight >= 10
- weight < 20
action: CHALLENGE
challenge:
# https://anubis.techaro.lol/docs/admin/configuration/challenges/proof-of-work
algorithm: fast
difficulty: 2 # two leading zeros, very fast for most clients
report_as: 2
# For clients that are browser-like and have gained many points from custom rules
- name: extreme-suspicion
expression: weight >= 20
action: CHALLENGE
challenge:
# https://anubis.techaro.lol/docs/admin/configuration/challenges/proof-of-work
algorithm: fast
difficulty: 4
report_as: 4


@@ -1,28 +1,26 @@
- name: deny-aggressive-brazilian-scrapers
action: DENY
action: WEIGH
weight:
adjust: 20
expression:
any:
# Internet Explorer should be out of support
- userAgent.contains("MSIE")
# Trident is the Internet Explorer browser engine
- userAgent.contains("Trident")
# Opera is a fork of chrome now
- userAgent.contains("Presto")
# Windows CE is discontinued
- userAgent.contains("Windows CE")
# Windows 95 is discontinued
- userAgent.contains("Windows 95")
# Windows 98 is discontinued
- userAgent.contains("Windows 98")
# Windows 9.x is discontinued
- userAgent.contains("Win 9x")
# Amazon does not have an Alexa Toolbar.
- userAgent.contains("Alexa Toolbar")
- name: challenge-aggressive-brazilian-scrapers
action: CHALLENGE
expression:
any:
# This is not released, even Windows 11 calls itself Windows 10
- userAgent.contains("Windows NT 11.0")
# iPods are not in common use
- userAgent.contains("iPod")
# Internet Explorer should be out of support
- userAgent.contains("MSIE")
# Trident is the Internet Explorer browser engine
- userAgent.contains("Trident")
# Opera is a fork of chrome now
- userAgent.contains("Presto")
# Windows CE is discontinued
- userAgent.contains("Windows CE")
# Windows 95 is discontinued
- userAgent.contains("Windows 95")
# Windows 98 is discontinued
- userAgent.contains("Windows 98")
# Windows 9.x is discontinued
- userAgent.contains("Win 9x")
# Amazon does not have an Alexa Toolbar.
- userAgent.contains("Alexa Toolbar")
# This is not released, even Windows 11 calls itself Windows 10
- userAgent.contains("Windows NT 11.0")
# iPods are not in common use
- userAgent.contains("iPod")


@@ -0,0 +1,6 @@
# Warning: Contains user agents that _must_ be blocked in robots.txt, or the opt-out will have no effect.
# Note: Blocks human-directed/non-training user agents
- name: "ai-robots-txt"
user_agent_regex: >-
AI2Bot|Ai2Bot-Dolma|aiHitBot|Amazonbot|Andibot|anthropic-ai|Applebot|Applebot-Extended|bedrockbot|Brightbot 1.0|Bytespider|CCBot|ChatGPT-User|Claude-SearchBot|Claude-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|EchoboxBot|FacebookBot|facebookexternalhit|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google-CloudVertexBot|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo Bot|meta-externalagent|Meta-ExternalAgent|meta-externalfetcher|Meta-ExternalFetcher|MistralAI-User/1.0|MyCentralAIScraperBot|NovaAct|OAI-SearchBot|omgili|omgilibot|Operator|PanguBot|Panscient|panscient.com|Perplexity-User|PerplexityBot|PetalBot|PhindBot|Poseidon Research Crawler|QualifiedBot|QuillBot|quillbot.com|SBIntuitionsBot|Scrapy|SemrushBot|SemrushBot-BA|SemrushBot-CT|SemrushBot-OCOB|SemrushBot-SI|SemrushBot-SWA|Sidetrade indexer bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio-Extended|wpbot|YandexAdditional|YandexAdditionalBot|YouBot
action: DENY


@@ -1,4 +1,6 @@
- name: cloudflare-workers
headers_regex:
CF-Worker: .*
action: DENY
action: WEIGH
weight:
adjust: 15


@@ -0,0 +1,2 @@
- import: (data)/clients/small-internet-browsers/netsurf.yaml
- import: (data)/clients/small-internet-browsers/palemoon.yaml


@@ -0,0 +1,5 @@
- name: "reduce-weight-netsurf"
user_agent_regex: "NetSurf"
action: WEIGH
weight:
adjust: -5


@@ -0,0 +1,5 @@
- name: "reduce-weight-palemoon"
user_agent_regex: "PaleMoon"
action: WEIGH
weight:
adjust: -5


@@ -1,4 +1,6 @@
# https://connect.mozilla.org/t5/firefox-labs/try-out-link-previews-in-firefox-labs-138-and-share-your/td-p/92012
- name: x-firefox-ai
action: CHALLENGE
action: WEIGH
expression: '"X-Firefox-Ai" in headers'
weight:
adjust: 5


@@ -1,15 +1,15 @@
- name: ipv4-rfc-1918
action: ALLOW
remote_addresses:
- 10.0.0.0/8
- 172.16.0.0/12
- 192.168.0.0/16
- 100.64.0.0/10
- 10.0.0.0/8
- 172.16.0.0/12
- 192.168.0.0/16
- 100.64.0.0/10
- name: ipv6-ula
action: ALLOW
remote_addresses:
- fc00::/7
- fc00::/7
- name: ipv6-link-local
action: ALLOW
remote_addresses:
- fe80::/10
- fe80::/10


@@ -0,0 +1,14 @@
---
slug: welcome
title: Welcome to the Anubis blog!
authors: [xe]
tags: [intro]
---
Hello, world!
At Techaro, we've been working on making Anubis even better, and in the process we want to share what we've done, how it works, and signal boost cool things the community has done. As things happen, we'll blog about them so that you can learn from our struggles.
More details to come soon!
{/* truncate */}

9
docs/blog/authors.yml Normal file

@@ -0,0 +1,9 @@
xe:
name: Xe Iaso
title: CEO @ Techaro
url: https://github.com/Xe
image_url: https://github.com/Xe.png
email: xe@techaro.lol
page: true
socials:
github: Xe

1
docs/blog/tags.yml Normal file

@@ -0,0 +1 @@


@@ -11,6 +11,141 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
## v1.20.0: Thancred Waters
The big ticket items are as follows:
- Implement a no-JS challenge method: [`metarefresh`](./admin/configuration/challenges/metarefresh.mdx) ([#95](https://github.com/TecharoHQ/anubis/issues/95))
- Implement request "weight", allowing administrators to customize the behaviour of Anubis based on specific criteria
- Implement GeoIP and ASN based checks via [Thoth](https://anubis.techaro.lol/docs/admin/thoth) ([#206](https://github.com/TecharoHQ/anubis/issues/206))
- Add [custom weight thresholds](./admin/configuration/thresholds.mdx) via CEL ([#688](https://github.com/TecharoHQ/anubis/pull/688))
- Move Open Graph configuration [to the policy file](./admin/configuration/open-graph.mdx)
- Enable support for Open Graph metadata to be returned by default instead of doing lookups against the target
- Add `robots2policy` CLI utility to convert robots.txt files to Anubis challenge policies using CEL expressions ([#409](https://github.com/TecharoHQ/anubis/issues/409))
- Refactor challenge presentation logic to use a challenge registry
- Allow challenge implementations to register HTTP routes
A lot of performance improvements have been made:
- Replace internal SHA256 hashing with xxhash for 4-6x performance improvement in policy evaluation and cache operations
- Optimized the OGTags subsystem with reduced allocations and runtime per request by up to 66%
- Replace cidranger with bart for IP range checking, improving IP matching performance by 3-20x with zero heap
allocations
And some cleanups/refactors were added:
- Remove the unused `/test-error` endpoint and update the testing endpoint `/make-challenge` to only be enabled in
development
- Add `--xff-strip-private` flag/envvar to toggle skipping X-Forwarded-For private addresses or not
- Bump AI-robots.txt to version 1.37
- Make progress bar styling more compatible (UXP, etc)
- Add `--strip-base-prefix` flag/envvar to strip the base prefix from request paths when forwarding to target servers
Request weight is one of the biggest ticket features in Anubis. This enables Anubis to be much closer to a Web Application Firewall and when combined with custom thresholds allows administrators to have Anubis take advanced reactions. For more information about request weight, see [the request weight section](./admin/policies.mdx#request-weight) of the policy file documentation.
TL;DR: when you have one or more `WEIGH` rules like this:
```yaml
bots:
- name: gitea-session-token
action: WEIGH
expression:
all:
- '"Cookie" in headers'
- headers["Cookie"].contains("i_love_gitea=")
# Remove 5 weight points
weight:
adjust: -5
```
You can configure custom thresholds like this:
```yaml
thresholds:
- name: minimal-suspicion # This client is likely fine, its soul is lighter than a feather
expression: weight < 0 # a feather weighs zero units
action: ALLOW # Allow the traffic through
# For clients that had some weight reduced through custom rules, give them a
# lightweight challenge.
- name: mild-suspicion
expression:
all:
- weight >= 0
- weight < 10
action: CHALLENGE
challenge:
# https://anubis.techaro.lol/docs/admin/configuration/challenges/metarefresh
algorithm: metarefresh
difficulty: 1
report_as: 1
# For clients that are browser-like but have either gained points from custom
# rules or report as a standard browser.
- name: moderate-suspicion
expression:
all:
- weight >= 10
- weight < 20
action: CHALLENGE
challenge:
# https://anubis.techaro.lol/docs/admin/configuration/challenges/proof-of-work
algorithm: fast
difficulty: 2 # two leading zeros, very fast for most clients
report_as: 2
# For clients that are browser-like and have gained many points from custom
# rules
- name: extreme-suspicion
expression: weight >= 20
action: CHALLENGE
challenge:
# https://anubis.techaro.lol/docs/admin/configuration/challenges/proof-of-work
algorithm: fast
difficulty: 4
report_as: 4
```
These thresholds apply when no other `ALLOW`, `DENY`, or `CHALLENGE` rule matches the request. `WEIGH` rules add and remove request weight as needed:
```yaml
bots:
- name: gitea-session-token
action: WEIGH
expression:
all:
- '"Cookie" in headers'
- headers["Cookie"].contains("i_love_gitea=")
# Remove 5 weight points
weight:
adjust: -5
- name: bot-like-user-agent
action: WEIGH
expression: '"Bot" in userAgent'
# Add 5 weight points
weight:
adjust: 5
```
Of note: the default "generic browser" rule assigns 10 weight points:
```yaml
# Generic catchall rule
- name: generic-browser
user_agent_regex: >-
Mozilla|Opera
action: WEIGH
weight:
adjust: 10
```
Adjust this as you see fit.
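
To make the arithmetic concrete, here is a small sketch of how the adjustments from matching `WEIGH` rules could add up for one request. The `weighRule` type, `exampleRules`, and `totalWeight` are hypothetical names for illustration only, plain substring checks stand in for the CEL and regex matching, and the starting weight of zero is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// weighRule is a hypothetical, simplified stand-in for a WEIGH rule: a
// match function plus the number of weight points to add or remove.
type weighRule struct {
	name    string
	adjust  int
	matches func(userAgent string, headers map[string]string) bool
}

// exampleRules mirrors the three rules shown above using substring checks.
func exampleRules() []weighRule {
	return []weighRule{
		{
			name:   "gitea-session-token",
			adjust: -5,
			matches: func(_ string, headers map[string]string) bool {
				return strings.Contains(headers["Cookie"], "i_love_gitea=")
			},
		},
		{
			name:   "bot-like-user-agent",
			adjust: 5,
			matches: func(userAgent string, _ map[string]string) bool {
				return strings.Contains(userAgent, "Bot")
			},
		},
		{
			name:   "generic-browser",
			adjust: 10,
			matches: func(userAgent string, _ map[string]string) bool {
				return strings.Contains(userAgent, "Mozilla") || strings.Contains(userAgent, "Opera")
			},
		},
	}
}

// totalWeight sums the adjustments of every rule that matches the request.
func totalWeight(rules []weighRule, userAgent string, headers map[string]string) int {
	weight := 0 // assumption: requests start with zero weight
	for _, r := range rules {
		if r.matches(userAgent, headers) {
			weight += r.adjust
		}
	}
	return weight
}

func main() {
	headers := map[string]string{"Cookie": "i_love_gitea=abc"}
	// A browser with a Gitea session cookie: 10 - 5 points.
	fmt.Println(totalWeight(exampleRules(), "Mozilla/5.0", headers))
}
```

Under these assumptions, a logged-in browser lands at 5 points and falls in the `mild-suspicion` range, while the same browser without the cookie stays at 10 and falls in `moderate-suspicion`.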
## v1.19.1: Jenomis cen Lexentale - Echo 1
- Return `data/bots/ai-robots-txt.yaml` to avoid breaking configs [#599](https://github.com/TecharoHQ/anubis/issues/599)
## v1.19.0: Jenomis cen Lexentale
Mostly a bunch of small features, no big ticket things this time.
@@ -24,27 +159,27 @@ Mostly a bunch of small features, no big ticket things this time.
- Add `--target-insecure-skip-verify` flag/envvar to allow Anubis to hit a self-signed HTTPS backend
- Minor adjustments to FreeBSD rc.d script to allow for more flexible configuration.
- Added Podman and Docker support for running Playwright tests
- Add a default rule to throw challenges when a request with the `X-Firefox-Ai` header is set.
- Add a default rule to throw challenges when a request with the `X-Firefox-Ai` header is set
- Updated the nonce value in the challenge JWT cookie to be a string instead of a number
- Rename cookies in response to user feedback
- Ensure cookie renaming is consistent across configuration options
- Add Bookstack app in data
- Truncate everything but the first five characters of Accept-Language headers when making challenges
- Ensure client JavaScript is served with Content-Type text/javascript.
- Add `--target-host` flag/envvar to allow changing the value of the Host header in requests forwarded to the target service.
- Add `--target-host` flag/envvar to allow changing the value of the Host header in requests forwarded to the target service
- Bump AI-robots.txt to version 1.31
- Add `RuntimeDirectory` to systemd unit settings so native packages can listen over unix sockets
- Added SearXNG instance tracker whitelist policy
- Added Qualys SSL Labs whitelist policy
- Fixed cookie deletion logic ([#520](https://github.com/TecharoHQ/anubis/issues/520), [#522](https://github.com/TecharoHQ/anubis/pull/522))
- Add `--target-sni` flag/envvar to allow changing the value of the TLS handshake hostname in requests forwarded to the target service.
- Add `--target-sni` flag/envvar to allow changing the value of the TLS handshake hostname in requests forwarded to the target service
- Fixed CEL expression matching validator to now properly error out when it receives empty expressions
- Added OpenRC init.d script.
- Added `--version` flag.
- Added `anubis_proxied_requests_total` metric to count proxied requests.
- Added OpenRC init.d script
- Added `--version` flag
- Added `anubis_proxied_requests_total` metric to count proxied requests
- Add `Applebot` as "good" web crawler
- Reorganize AI/LLM crawler blocking into three separate stances, maintaining existing status quo as default.
- Split out AI/LLM user agent blocking policies, adding documentation for each.
- Reorganize AI/LLM crawler blocking into three separate stances, maintaining existing status quo as default
- Split out AI/LLM user agent blocking policies, adding documentation for each
## v1.18.0: Varis zos Galvus
@@ -141,7 +276,6 @@ Other changes:
- Moved all CSS inline to the Xess package, changed colors to be CSS variables
- Set or append to `X-Forwarded-For` header unless the remote connects over a loopback address [#328](https://github.com/TecharoHQ/anubis/issues/328)
- Fixed mojeekbot user agent regex
- Added support for running anubis behind a base path (e.g. `/myapp`)
- Reduce Anubis' paranoia with user cookies ([#365](https://github.com/TecharoHQ/anubis/pull/365))
- Added support for Open Graph passthrough while using unix sockets
- The Open Graph subsystem now passes the HTTP `HOST` header through to the origin


@@ -0,0 +1,8 @@
{
"label": "Challenges",
"position": 10,
"link": {
"type": "generated-index",
"description": "The different challenge methods that Anubis supports."
}
}


@@ -0,0 +1,19 @@
# Meta Refresh (No JavaScript)
The `metarefresh` challenge sends a browser a much simpler challenge that makes it refresh the page after a set period of time. This enables clients to pass challenges without executing JavaScript.
To use it in your Anubis configuration:
```yaml
# Generic catchall rule
- name: generic-browser
user_agent_regex: >-
Mozilla|Opera
action: CHALLENGE
challenge:
difficulty: 1 # Number of seconds to wait before refreshing the page
report_as: 4 # Unused by this challenge method
algorithm: metarefresh # Specify a non-JS challenge method
```
This is not enabled by default while this method is tested and its false positive rate is ascertained. Many modern scrapers use headless Google Chrome, so this will have a much higher false positive rate.
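
For readers unfamiliar with the underlying browser mechanism, the sketch below builds the kind of HTML tag that makes a browser reload or redirect after a delay without running JavaScript. `metaRefreshTag` is a hypothetical helper, and how Anubis actually renders its challenge page is not shown here; this only illustrates the `<meta http-equiv="refresh">` element itself.

```go
package main

import (
	"fmt"
	"html"
)

// metaRefreshTag is a hypothetical helper that returns a meta refresh
// element telling the browser to load target after delaySeconds seconds.
// The target is HTML-escaped because it is placed inside an attribute.
func metaRefreshTag(delaySeconds int, target string) string {
	return fmt.Sprintf(`<meta http-equiv="refresh" content="%d; url=%s">`,
		delaySeconds, html.EscapeString(target))
}

func main() {
	// A one-second delay corresponds to `difficulty: 1` in the example above.
	fmt.Println(metaRefreshTag(1, "/example/pass?id=abc"))
}
```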


@@ -0,0 +1,5 @@
# Proof of Work (JavaScript)
When Anubis is configured to use the `fast` or `slow` challenge methods, clients will be sent a small [proof of work](https://en.wikipedia.org/wiki/Proof_of_work) challenge. In order to get a token used to access the upstream resource, clients must calculate a complicated math puzzle with JavaScript.
A `fast` challenge uses a heavily optimized multithreaded implementation and a `slow` challenge uses a simplistic single-threaded implementation. The `slow` method is kept around for legacy compatibility.


@@ -9,12 +9,45 @@ This page provides detailed information on how to configure [Open Graph tag](htt
## Configuration Options
Open Graph settings are configured in the `openGraph` section of the [Policy File](../policies.mdx).
```yaml
openGraph:
# Enables Open Graph passthrough
enabled: true
# Enables the use of the HTTP host in the cache key, this enables
# caching metadata for multiple http hosts at once.
considerHost: true
# How long cached OpenGraph metadata should last in memory
ttl: 24h
# If set, return these opengraph values instead of looking them up with
# the target service.
#
# Correlates to properties in https://ogp.me/
override:
# og:title is required, it is the title of the website
"og:title": "Techaro Anubis"
"og:description": >-
Anubis is a Web AI Firewall Utility that helps you fight the bots
away so that you can maintain uptime at work!
"description": >-
Anubis is a Web AI Firewall Utility that helps you fight the bots
away so that you can maintain uptime at work!
```
<details>
<summary>Configuration flags / envvars (old)</summary>
Open Graph passthrough used to be configured with configuration flags / environment variables. References to these settings are maintained for backwards compatibility's sake.
| Name | Description | Type | Default | Example |
| ------------------------ | --------------------------------------------------------- | -------- | ------- | ----------------------------- |
| `OG_PASSTHROUGH` | Enables or disables the Open Graph tag passthrough system | Boolean | `true` | `OG_PASSTHROUGH=true` |
| `OG_EXPIRY_TIME` | Configurable cache expiration time for Open Graph tags | Duration | `24h` | `OG_EXPIRY_TIME=1h` |
| `OG_CACHE_CONSIDER_HOST` | Enables or disables the use of the host in the cache key | Boolean | `false` | `OG_CACHE_CONSIDER_HOST=true` |
</details>
## Usage
To configure Open Graph tags, set the following options as environment variables, in an environment file, or as flags in your Anubis configuration:


@@ -10,6 +10,20 @@ Anubis can act in one of two modes:
1. Reverse proxy (the default): Anubis sits in the middle of all traffic and then will reverse proxy it to its destination. This is the moral equivalent of a middleware in your favorite web framework.
2. Subrequest authentication mode: Anubis listens for requests and if they don't pass muster then they are forwarded to Anubis for challenge processing. This is the equivalent of Anubis being a sidecar service.
:::note
Subrequest authentication requires changing the default policy because nginx interprets the default `DENY` status code `200` as successful authentication and allows the request.
```yaml
status_codes:
CHALLENGE: 200
DENY: 403
```
[See policy definitions](../policies.mdx).
:::
## Nginx
Anubis can perform [subrequest authentication](https://docs.nginx.com/nginx/admin-guide/security-controls/configuring-subrequest-authentication/) with the `auth_request` module in Nginx. In order to set this up, keep the following things in mind:

@@ -0,0 +1,140 @@
# Weight Threshold Configuration
Anubis offers the ability to assign "weight" to requests. This is a custom level of suspicion that rules can add to or remove from. For example, here's how you assign 10 weight points to anything that might be a browser:
```yaml
# botPolicies.yaml
bots:
- name: generic-browser
user_agent_regex: >-
Mozilla|Opera
action: WEIGH
weight:
adjust: 10
```
Thresholds let you take this per-request weight value and take actions in response to it. Thresholds are defined alongside your bot configuration in `botPolicies.yaml`.
:::note
Thresholds DO NOT apply when a request matches a bot rule with the CHALLENGE action. Thresholds only apply when requests don't match any terminal bot rules.
:::
```yaml
# botPolicies.yaml
bots: ...
thresholds:
- name: minimal-suspicion
expression: weight < 0
action: ALLOW
- name: mild-suspicion
expression:
all:
- weight >= 0
- weight < 10
action: CHALLENGE
challenge:
algorithm: metarefresh
difficulty: 1
report_as: 1
- name: moderate-suspicion
expression:
all:
- weight >= 10
- weight < 20
action: CHALLENGE
challenge:
algorithm: fast
difficulty: 2
report_as: 2
- name: extreme-suspicion
expression: weight >= 20
action: CHALLENGE
challenge:
algorithm: fast
difficulty: 4
report_as: 4
```
This defines a suite of 4 thresholds:
1. If the request weight is less than zero, allow it through.
2. If the request weight is greater than or equal to zero, but less than ten: give it [a very lightweight challenge](./challenges/metarefresh.mdx).
3. If the request weight is greater than or equal to ten, but less than twenty: give it [a slightly heavier challenge](./challenges/proof-of-work.mdx).
4. Otherwise, give it [the heaviest challenge](./challenges/proof-of-work.mdx).
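As a rough illustration of how these four thresholds partition the weight range, here is a minimal Python sketch. It is not Anubis's implementation (Anubis evaluates CEL expressions); the `Threshold` class, the `pick_threshold` helper, and the first-match evaluation order are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Threshold:
    """Hypothetical stand-in for one entry under `thresholds:`."""
    name: str
    matches: Callable[[int], bool]  # plays the role of the CEL expression
    action: str
    algorithm: Optional[str] = None
    difficulty: Optional[int] = None


# Mirrors the four thresholds from the YAML example above.
THRESHOLDS = [
    Threshold("minimal-suspicion", lambda w: w < 0, "ALLOW"),
    Threshold("mild-suspicion", lambda w: 0 <= w < 10, "CHALLENGE", "metarefresh", 1),
    Threshold("moderate-suspicion", lambda w: 10 <= w < 20, "CHALLENGE", "fast", 2),
    Threshold("extreme-suspicion", lambda w: w >= 20, "CHALLENGE", "fast", 4),
]


def pick_threshold(weight: int) -> Optional[Threshold]:
    """Return the first threshold whose expression matches (assumed order)."""
    for threshold in THRESHOLDS:
        if threshold.matches(weight):
            return threshold
    return None


for weight in (-5, 0, 10, 25):
    chosen = pick_threshold(weight)
    print(weight, chosen.name, chosen.action, chosen.algorithm)
```

Because the four ranges do not overlap and together cover every integer, the evaluation order does not change the result for this particular configuration.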
Thresholds can be configured with the following options:
<table>
<thead>
<tr>
<th>Name</th>
<th>Description</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr>
<td>`name`</td>
<td>The human-readable name for this threshold.</td>
<td>
```yaml
name: extreme-suspicion
```
</td>
</tr>
<tr>
<td>`expression`</td>
<td>A [CEL](https://cel.dev/) expression taking the request weight and returning true or false</td>
<td>
To check if the request weight is less than zero:
```yaml
expression: weight < 0
```
To check if it's at least 0 and less than 10:
```yaml
expression:
all:
- weight >= 0
- weight < 10
```
</td>
</tr>
<tr>
<td>`action`</td>
<td>The Anubis action to apply: `ALLOW`, `CHALLENGE`, or `DENY`</td>
<td>
```yaml
action: ALLOW
```
If you set the CHALLENGE action, you must set challenge details:
```yaml
action: CHALLENGE
challenge:
algorithm: metarefresh
difficulty: 1
report_as: 1
```
</td>
</tr>
</tbody>
</table>

@@ -92,6 +92,7 @@ Assuming you are protecting `anubistest.techaro.lol`, you need the following ser
# throw an "admin misconfiguration" error.
RequestHeader set "X-Real-Ip" expr=%{REMOTE_ADDR}
RequestHeader set X-Forwarded-Proto "https"
RequestHeader set "X-Http-Version" "%{SERVER_PROTOCOL}s"
ProxyPreserveHost On
@@ -119,6 +120,14 @@ Make sure to add a separate configuration file for the listener on port 3001:
```text
# /etc/httpd/conf.d/listener-3001.conf
Listen [::1]:3001
```
In case you are running an IPv4-only system, use the following configuration instead:
```text
# /etc/httpd/conf.d/listener-3001.conf
Listen 127.0.0.1:3001
```

@@ -61,6 +61,7 @@ server {
location / {
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Http-Version $server_protocol;
proxy_pass http://anubis;
}

@@ -3,11 +3,10 @@ id: traefik
title: Traefik
---
:::note
This only talks about integration through Compose,
but it also applies to docker cli options.
This only talks about integration through Compose,
but it also applies to docker cli options.
:::

@@ -4,9 +4,6 @@ title: Setting up Anubis
import RandomKey from "@site/src/components/RandomKey";
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
Anubis is meant to sit between your reverse proxy (such as Nginx or Caddy) and your target service. One instance of Anubis must be used per service you are protecting.
<center>
@@ -45,33 +42,47 @@ Anubis has very minimal system requirements. I suspect that 128Mi of ram may be
For more detailed information on installing Anubis with native packages, please read [the native install directions](./native-install.mdx).
## Environment variables
## Configuration
Anubis is configurable via environment variables and [the policy file](./policies.mdx). Most settings are currently exposed with environment variables but they are being slowly moved over to the policy file.
### Configuration via the policy file
Currently the following settings are configurable via the policy file:
- [Bot policies](./policies.mdx)
- [Open Graph passthrough](./configuration/open-graph.mdx)
- [Weight thresholds](./configuration/thresholds.mdx)
### Environment variables
Anubis uses these environment variables for configuration:
| Environment Variable | Default value | Explanation |
| :----------------------------- | :---------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `BASE_PREFIX` | unset | If set, adds a global prefix to all Anubis endpoints. For example, setting this to `/myapp` would make Anubis accessible at `/myapp/` instead of `/`. This is useful when running Anubis behind a reverse proxy that routes based on path prefixes. |
| `BIND` | `:8923` | The network address that Anubis listens on. For `unix`, set this to a path: `/run/anubis/instance.sock` |
| `BIND_NETWORK` | `tcp` | The address family that Anubis listens on. Accepts `tcp`, `unix` and anything Go's [`net.Listen`](https://pkg.go.dev/net#Listen) supports. |
| `COOKIE_DOMAIN` | unset | The domain the Anubis challenge pass cookie should be set to. This should be set to the domain you bought from your registrar (EG: `techaro.lol` if your webapp is running on `anubis.techaro.lol`). See this [stackoverflow explanation of cookies](https://stackoverflow.com/a/1063760) for more information.<br/><br/>Note that unlike `REDIRECT_DOMAINS`, you should never include a port number in this variable. |
| `COOKIE_EXPIRATION_TIME` | `168h` | The amount of time the authorization cookie is valid for. |
| `COOKIE_PARTITIONED` | `false` | If set to `true`, enables the [partitioned (CHIPS) flag](https://developers.google.com/privacy-sandbox/cookies/chips), meaning that Anubis inside an iframe has a different set of cookies than the domain hosting the iframe. |
| `DIFFICULTY` | `4` | The difficulty of the challenge, or the number of leading zeroes that must be in successful responses. |
| `ED25519_PRIVATE_KEY_HEX` | unset | The hex-encoded ed25519 private key used to sign Anubis responses. If this is not set, Anubis will generate one for you. This should be exactly 64 characters long. See below for details. |
| `ED25519_PRIVATE_KEY_HEX_FILE` | unset | Path to a file containing the hex-encoded ed25519 private key. Only one of this or its sister option may be set. |
| `METRICS_BIND` | `:9090` | The network address that Anubis serves Prometheus metrics on. See `BIND` for more information. |
| `METRICS_BIND_NETWORK` | `tcp` | The address family that the Anubis metrics server listens on. See `BIND_NETWORK` for more information. |
| `OG_EXPIRY_TIME` | `24h` | The expiration time for the Open Graph tag cache. |
| `OG_PASSTHROUGH` | `false` | If set to `true`, Anubis will enable Open Graph tag passthrough. |
| `OG_CACHE_CONSIDER_HOST` | `false` | If set to `true`, Anubis will consider the host in the Open Graph tag cache key. |
| `POLICY_FNAME` | unset | The file containing [bot policy configuration](./policies.mdx). See the bot policy documentation for more details. If unset, the default bot policy configuration is used. |
| Environment Variable | Default value | Explanation |
| :----------------------------- | :---------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `BASE_PREFIX` | unset | If set, adds a global prefix to all Anubis endpoints. For example, setting this to `/myapp` would make Anubis accessible at `/myapp/` instead of `/`. This is useful when running Anubis behind a reverse proxy that routes based on path prefixes. |
| `BIND` | `:8923` | The network address that Anubis listens on. For `unix`, set this to a path: `/run/anubis/instance.sock` |
| `BIND_NETWORK` | `tcp` | The address family that Anubis listens on. Accepts `tcp`, `unix` and anything Go's [`net.Listen`](https://pkg.go.dev/net#Listen) supports. |
| `COOKIE_DOMAIN` | unset | The domain the Anubis challenge pass cookie should be set to. This should be set to the domain you bought from your registrar (EG: `techaro.lol` if your webapp is running on `anubis.techaro.lol`). See this [stackoverflow explanation of cookies](https://stackoverflow.com/a/1063760) for more information.<br/><br/>Note that unlike `REDIRECT_DOMAINS`, you should never include a port number in this variable. |
| `COOKIE_EXPIRATION_TIME` | `168h` | The amount of time the authorization cookie is valid for. |
| `COOKIE_PARTITIONED` | `false` | If set to `true`, enables the [partitioned (CHIPS) flag](https://developers.google.com/privacy-sandbox/cookies/chips), meaning that Anubis inside an iframe has a different set of cookies than the domain hosting the iframe. |
| `DIFFICULTY` | `4` | The difficulty of the challenge, or the number of leading zeroes that must be in successful responses. |
| `ED25519_PRIVATE_KEY_HEX` | unset | The hex-encoded ed25519 private key used to sign Anubis responses. If this is not set, Anubis will generate one for you. This should be exactly 64 characters long. See below for details. |
| `ED25519_PRIVATE_KEY_HEX_FILE` | unset | Path to a file containing the hex-encoded ed25519 private key. Only one of this or its sister option may be set. |
| `METRICS_BIND` | `:9090` | The network address that Anubis serves Prometheus metrics on. See `BIND` for more information. |
| `METRICS_BIND_NETWORK` | `tcp` | The address family that the Anubis metrics server listens on. See `BIND_NETWORK` for more information. |
| `OG_EXPIRY_TIME` | `24h` | The expiration time for the Open Graph tag cache. Prefer using [the policy file](./configuration/open-graph.mdx) to configure the Open Graph subsystem. |
| `OG_PASSTHROUGH` | `false` | If set to `true`, Anubis will enable Open Graph tag passthrough. Prefer using [the policy file](./configuration/open-graph.mdx) to configure the Open Graph subsystem. |
| `OG_CACHE_CONSIDER_HOST` | `false` | If set to `true`, Anubis will consider the host in the Open Graph tag cache key. Prefer using [the policy file](./configuration/open-graph.mdx) to configure the Open Graph subsystem. |
| `POLICY_FNAME` | unset | The file containing [bot policy configuration](./policies.mdx). See the bot policy documentation for more details. If unset, the default bot policy configuration is used. |
| `REDIRECT_DOMAINS` | unset | If set, restrict the domains that Anubis can redirect to when passing a challenge.<br/><br/>If this is unset, Anubis may redirect to any domain which could cause security issues in the unlikely case that an attacker passes a challenge for your browser and then tricks you into clicking a link to your domain.<br/><br/>Note that if you are hosting Anubis on a non-standard port (`https://example.com:8443`, `http://www.example.net:8080`, etc.), you must also include the port number here. |
| `SERVE_ROBOTS_TXT` | `false` | If set to `true`, Anubis will serve a default `robots.txt` file that disallows all known AI scrapers by name and then additionally disallows every scraper. This is useful if facts and circumstances make it difficult to change the underlying service to serve such a `robots.txt` file. |
| `SOCKET_MODE` | `0770` | _Only used when at least one of the `*_BIND_NETWORK` variables is set to `unix`._ The socket mode (permissions) for Unix domain sockets. |
| `TARGET` | `http://localhost:3923` | The URL of the service that Anubis should forward valid requests to. Supports Unix domain sockets, set this to a URI like so: `unix:///path/to/socket.sock`. |
| `USE_REMOTE_ADDRESS` | unset | If set to `true`, Anubis will take the client's IP from the network socket. For production deployments, it is expected that a reverse proxy is used in front of Anubis, which passes the IP using headers instead. |
| `WEBMASTER_EMAIL` | unset | If set, shows a contact email address when rendering error pages. This email address will be how users can get in contact with administrators. |
| `SERVE_ROBOTS_TXT` | `false` | If set to `true`, Anubis will serve a default `robots.txt` file that disallows all known AI scrapers by name and then additionally disallows every scraper. This is useful if facts and circumstances make it difficult to change the underlying service to serve such a `robots.txt` file. |
| `SOCKET_MODE` | `0770` | _Only used when at least one of the `*_BIND_NETWORK` variables is set to `unix`._ The socket mode (permissions) for Unix domain sockets. |
| `STRIP_BASE_PREFIX` | `false` | If set to `true`, strips the base prefix from request paths when forwarding to the target server. This is useful when your target service expects to receive requests without the base prefix. For example, with `BASE_PREFIX=/foo` and `STRIP_BASE_PREFIX=true`, a request to `/foo/bar` would be forwarded to the target as `/bar`. |
| `TARGET` | `http://localhost:3923` | The URL of the service that Anubis should forward valid requests to. Supports Unix domain sockets, set this to a URI like so: `unix:///path/to/socket.sock`. |
| `USE_REMOTE_ADDRESS` | unset | If set to `true`, Anubis will take the client's IP from the network socket. For production deployments, it is expected that a reverse proxy is used in front of Anubis, which passes the IP using headers instead. |
| `WEBMASTER_EMAIL` | unset | If set, shows a contact email address when rendering error pages. This email address will be how users can get in contact with administrators. |
| `XFF_STRIP_PRIVATE` | `true` | If set, strip private addresses from `X-Forwarded-For` headers. To unset this, you must set `XFF_STRIP_PRIVATE=false` or `--xff-strip-private=false`. |
<details>
<summary>Advanced configuration settings</summary>
@@ -128,6 +139,22 @@ With corresponding Anubis configuration:
BASE_PREFIX=/myapp
```
#### Stripping Base Prefix
If your target service doesn't expect to receive the base prefix in request paths, you can use the `STRIP_BASE_PREFIX` option:
```
BASE_PREFIX=/myapp
STRIP_BASE_PREFIX=true
```
With this configuration:
- A request to `/myapp/api/users` would be forwarded to your target service as `/api/users`
- A request to `/myapp/` would be forwarded as `/`
This is particularly useful when working with applications that weren't designed to handle path prefixes. However, note that if your target application generates absolute redirects or links (like `/login` instead of `./login`), these may break the subpath routing since they won't include the base prefix.
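The path rewriting described above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not Anubis's code; the `strip_base_prefix` helper is hypothetical, and leaving non-matching paths untouched is an assumption.

```python
def strip_base_prefix(path: str, base_prefix: str) -> str:
    """Remove base_prefix from the front of path, as STRIP_BASE_PREFIX is described to do."""
    prefix = base_prefix.rstrip("/")
    if not prefix:
        # No prefix configured: nothing to strip.
        return path
    if path == prefix:
        # "/myapp" on its own maps to the root of the target service.
        return "/"
    if path.startswith(prefix + "/"):
        # Keep the slash that follows the prefix so the result stays absolute.
        return path[len(prefix):]
    # Assumption: paths outside the prefix are forwarded unchanged.
    return path


print(strip_base_prefix("/myapp/api/users", "/myapp"))
print(strip_base_prefix("/myapp/", "/myapp"))
```

Matching on `prefix + "/"` rather than on the bare prefix avoids stripping part of an unrelated path such as `/myapplication/`.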
### Key generation
To generate an ed25519 private key, you can use this command:

@@ -244,3 +244,33 @@ In case your service needs it for risk calculation reasons, Anubis exposes infor
| `X-Anubis-Status` | The status and how strict Anubis was in its checks | `PASS` |
Policy rules are matched using [Go's standard library regular expressions package](https://pkg.go.dev/regexp). You can mess around with the syntax at [regex101.com](https://regex101.com); make sure to select the Golang option.
## Request Weight
Anubis rules can also add or remove "weight" from requests, allowing administrators to configure custom levels of suspicion. For example, if your application uses session tokens named `i_love_gitea`:
```yaml
- name: gitea-session-token
action: WEIGH
expression:
all:
- '"Cookie" in headers'
- headers["Cookie"].contains("i_love_gitea=")
# Remove 5 weight points
weight:
adjust: -5
```
This would remove five weight points from the request, which would make Anubis present the [Meta Refresh challenge](./configuration/challenges/metarefresh.mdx) in the default configuration.
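To make the arithmetic concrete, the following Python sketch adds up the adjustments for a browser-like client that also carries the session cookie. It assumes that a request starts at weight 0 and that adjustments from all matching `WEIGH` rules are simply summed; the `request_weight` helper is hypothetical and is not part of Anubis.

```python
from typing import Dict

# Weight adjustments from the WEIGH rules a request matched.
# Rule names and values are taken from the examples in this documentation.
matched_rules: Dict[str, int] = {
    "generic-browser": 10,      # default weight for browser-like clients
    "gitea-session-token": -5,  # session cookie is present
}


def request_weight(adjustments: Dict[str, int]) -> int:
    """Sum all adjustments, assuming a request starts at weight 0."""
    return sum(adjustments.values())


weight = request_weight(matched_rules)
print(weight)
```

The resulting weight is at least 0 and below 10, which is the range that the example thresholds in the Weight Threshold Configuration document map to the Meta Refresh challenge.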
### Weight Thresholds
For more information on configuring weight thresholds, see [Weight Threshold Configuration](./configuration/thresholds.mdx).
### Advice
Weight is still very new and needs work. This is an experimental feature and should be treated as such. Here's some advice to help you better tune requests:
- The default weight for browser-like clients is 10. This triggers an aggressive challenge.
- Remove and add weight in multiples of five.
- Be careful with how you configure weight.

@@ -0,0 +1,84 @@
---
title: robots2policy CLI Tool
sidebar_position: 50
---
The `robots2policy` tool converts robots.txt files into Anubis challenge policies. It reads robots.txt rules and generates equivalent CEL expressions for path matching and user-agent filtering.
## Installation
Install directly with Go:
```bash
go install github.com/TecharoHQ/anubis/cmd/robots2policy@latest
```
## Usage
Basic conversion from URL:
```bash
robots2policy -input https://www.example.com/robots.txt
```
Convert local file to YAML:
```bash
robots2policy -input robots.txt -output policy.yaml
```
Convert with custom settings:
```bash
robots2policy -input robots.txt -action DENY -format json
```
## Options
| Flag | Description | Default |
|-----------------------|--------------------------------------------------------------------|---------------------|
| `-input` | robots.txt file path or URL (use `-` for stdin) | *required* |
| `-output` | Output file (use `-` for stdout) | stdout |
| `-format` | Output format: `yaml` or `json` | `yaml` |
| `-action` | Action for disallowed paths: `ALLOW`, `DENY`, `CHALLENGE`, `WEIGH` | `CHALLENGE` |
| `-name` | Policy name prefix | `robots-txt-policy` |
| `-crawl-delay-weight` | Weight adjustment for crawl-delay rules | `3` |
| `-deny-user-agents` | Action for blacklisted user agents | `DENY` |
## Example
Input robots.txt:
```txt
User-agent: *
Disallow: /admin/
Disallow: /private
User-agent: BadBot
Disallow: /
```
Generated policy:
```yaml
- name: robots-txt-policy-disallow-1
action: CHALLENGE
expression:
single: path.startsWith("/admin/")
- name: robots-txt-policy-disallow-2
action: CHALLENGE
expression:
single: path.startsWith("/private")
- name: robots-txt-policy-blacklist-3
action: DENY
expression:
single: userAgent.contains("BadBot")
```
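The mapping in this example can be approximated with a short Python sketch. This is not how `robots2policy` is implemented; it only reproduces the rule names and expressions shown above for this particular input. The `sketch_policy` function is hypothetical, and it deliberately ignores wildcards, `Allow`, `Crawl-delay`, and agent-specific rules other than a full `Disallow: /`.

```python
from typing import Dict, List


def sketch_policy(robots_txt: str, name: str = "robots-txt-policy",
                  action: str = "CHALLENGE", deny_action: str = "DENY") -> List[Dict]:
    """Turn simple robots.txt rules into policy-like dictionaries."""
    rules: List[Dict] = []
    agent = None
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            agent = value
        elif key == "disallow" and value:
            index = len(rules) + 1
            if agent == "*":
                # A path disallowed for everyone becomes a path-prefix match.
                rules.append({
                    "name": f"{name}-disallow-{index}",
                    "action": action,
                    "expression": {"single": f'path.startsWith("{value}")'},
                })
            elif agent and value == "/":
                # A named agent that is disallowed everywhere is treated as blacklisted.
                rules.append({
                    "name": f"{name}-blacklist-{index}",
                    "action": deny_action,
                    "expression": {"single": f'userAgent.contains("{agent}")'},
                })
    return rules


example = """User-agent: *
Disallow: /admin/
Disallow: /private
User-agent: BadBot
Disallow: /
"""

for rule in sketch_policy(example):
    print(rule["name"], rule["action"], rule["expression"]["single"])
```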
## Using the Generated Policy
Save the output and import it in your main policy file:
```yaml
import:
- path: "./robots-policy.yaml"
```
The tool handles wildcard patterns, user-agent specific rules, and blacklisted bots automatically.

docs/docs/admin/thoth.mdx

@@ -0,0 +1,81 @@
# Thoth-based advanced checks
Status: Beta
Anubis instances are normally isolated. Each Anubis instance has its own configuration and exists in roughly its own world without any long term memory between requests. As threats, workarounds, and AI scraper toolchains evolve, administrators will need a way to get more up to date information faster than Anubis' release cycle.
Thus, Thoth is being created. Thoth is the reputation database for Anubis. Thoth feeds information to Anubis so that it can make better decisions about which traffic is innocuous and which traffic is suspicious.
:::note
Thoth is hosted by [Techaro](https://techaro.lol). Thoth is a paid service. Thoth is opt-in and requires manual intervention (including payment) to use. The code that powers Thoth is currently closed source.
To get access to Thoth, please subscribe [on GitHub Sponsors](https://github.com/sponsors/Xe) and [email Xe](mailto:xe@techaro.lol). This will be self-service soon.
:::
## Implementation
Thoth is a web service that listens over [gRPC](https://grpc.io/). Thoth's API is documented in protocol buffer definitions in the GitHub repo [TecharoHQ/thoth-proto](https://github.com/TecharoHQ/thoth-proto).
Thoth is designed to be _informative_, not _authoritative_. Thoth cannot and will not arbitrarily block requests, origins, or other traffic. Thoth is there to inform Anubis and influence the weight of requests so that upstream resources can be protected. Additionally, Anubis aggressively caches data from Thoth such that over time Anubis will not need to request data very often. This makes the fast path for repeat visitors even faster and reduces the amount of data that Thoth is exposed to.
## Thoth features
Thoth is currently in active development. Currently, Thoth provides the following features to Anubis:
- BGP Autonomous System (ASN) based filtering
- GeoIP location based filtering
### ASN-based filtering
When companies link their backbone infrastructure to the Internet, they do so via a [BGP Autonomous System](<https://en.wikipedia.org/wiki/Autonomous_system_(Internet)>), denoted by a number (the Autonomous System Number or ASN). Every IP address on the Internet is owned by an ASN with a 1:1 lookup that does not change very frequently.
Anubis uses Thoth to match IP addresses to BGP Autonomous Systems so that you can either issue arbitrary challenges to individual internet service providers (such as Cloudflare or Huawei Cloud) or, at the administrator's explicit instruction, block them altogether. For example, here's how you add 10 weight points to requests from Cloudflare, Huawei Cloud, and Alibaba Cloud:
```yaml
- name: aggressive-asns-without-functional-abuse-contact
action: WEIGH
asns:
match:
- 13335 # Cloudflare
- 136907 # Huawei Cloud
- 45102 # Alibaba Cloud
weight:
adjust: 10
```
You can look up details for [AS13335](https://bgp.tools/as/13335) or any of these other top offenders on [bgp.tools](https://bgp.tools).
### GeoIP-based filtering
In extreme cases, an administrator may have to take action against an entire country. This is not an ideal circumstance, but sometimes reality forces their hands and the administrators just want to sleep at night.
Anubis uses Thoth to look up the geographic location registered to an IP address. This lookup is not the best and will get better with time, but you ship what you can so you can make it better for next time.
For example, to add 10 weight points to requests from Brazil and China:
```yaml
- name: countries-with-aggressive-scrapers
action: WEIGH
geoip:
countries:
- BR
- CN
weight:
adjust: 10
```
Use this with care.
## Work-in-progress features
This section is a bit aspirational and is where Thoth will end up rather than things you can use today.
In general, a lot of Thoth features are focused on taking the same Anubis you know and love and making it better, smarter, and less paranoid. These include:
- Private rulesets for advanced patterns, current known exploits, and other recognition tactics that need to be kept cloak and dagger for operational security reasons
- Private challenge implementations via WebAssembly, including advanced browser detection logic
- Reputation querying so that Thoth can arbitrarily influence the weight of requests based on the net aggregate pass rate so that the most common browsers can get through with no challenge issued at all
- APIs for trusted administrators to report abusive request fingerprints so that Anubis can react to threats as they evolve
- A way for Anubis to periodically report the pass rate per ASN and other fingerprints so that methodology can be improved

@@ -91,7 +91,7 @@ work valid?"}
## Proof of passing challenges
When a client passes a challenge, Anubis sets an HTTP cookie named `"within.website-x-cmd-anubis-auth"` containing a signed [JWT](https://jwt.io/) (JSON Web Token). This JWT contains the following claims:
When a client passes a challenge, Anubis sets an HTTP cookie named `"techaro.lol-anubis-auth"` containing a signed [JWT](https://jwt.io/) (JSON Web Token). This JWT contains the following claims:
- `challenge`: The challenge string derived from user request metadata
- `nonce`: The nonce / iteration number used to generate the passing response

@@ -19,14 +19,48 @@ title: Anubis
Anubis is brought to you by sponsors and donors like:
[![Distrust](/img/sponsors/distrust-logo.webp)](https://distrust.co?utm_campaign=github&utm_medium=referral&utm_content=anubis)
[![Terminal Trove](/img/sponsors/terminal-trove.webp)](https://terminaltrove.com/?utm_campaign=github&utm_medium=referral&utm_content=anubis&utm_source=abgh)
[![canine.tools](/img/sponsors/caninetools-logo.webp)](https://canine.tools?utm_campaign=github&utm_medium=referral&utm_content=anubis)
[![Weblate](/img/sponsors/weblate-logo.webp)](https://weblate.org/?utm_campaign=github&utm_medium=referral&utm_content=anubis)
### Diamond Tier
<a href="https://www.raptorcs.com/content/base/products.html">
<img
src="/img/sponsors/raptor-computing-logo.webp"
alt="Raptor Computing Systems"
height="64"
/>
</a>
### Gold Tier
<a href="https://distrust.co?utm_campaign=github&utm_medium=referral&utm_content=anubis">
<img src="/img/sponsors/distrust-logo.webp" alt="Distrust" height="64" />
</a>
<a href="https://terminaltrove.com/?utm_campaign=github&utm_medium=referral&utm_content=anubis&utm_source=abgh">
<img
src="/img/sponsors/terminal-trove.webp"
alt="Terminal Trove"
height="64"
/>
</a>
<a href="https://canine.tools?utm_campaign=github&utm_medium=referral&utm_content=anubis">
<img
src="/img/sponsors/caninetools-logo.webp"
alt="canine.tools"
height="64"
/>
</a>
<a href="https://weblate.org/">
<img src="/img/sponsors/weblate-logo.webp" alt="Weblate" height="64" />
</a>
<a href="https://uberspace.de/">
<img src="/img/sponsors/uberspace-logo.webp" alt="Uberspace" height="64" />
</a>
<a href="https://wildbase.xyz/">
<img src="/img/sponsors/wildbase-logo.webp" alt="Wildbase" height="64" />
</a>
## Overview
Anubis [weighs the soul of your connection](https://en.wikipedia.org/wiki/Weighing_of_souls) using a proof-of-work challenge in order to protect upstream resources from scraper bots.
Anubis is a Web AI Firewall Utility that [weighs the soul of your connection](https://en.wikipedia.org/wiki/Weighing_of_souls) using one or more challenges in order to protect upstream resources from scraper bots.
This program is designed to help protect the small internet from the endless storm of requests that flood in from AI companies. Anubis is as lightweight as possible to ensure that everyone can afford to protect the communities closest to them.

@@ -18,3 +18,11 @@ Anubis uses [Web Workers](https://developer.mozilla.org/en-US/docs/Web/API/Web_W
2. Web Workers allow you to do multithreaded execution of JavaScript code. This lets Anubis run its checks in parallel across all your system cores so that the challenge can complete as fast as possible. In the last decade, most CPU advancements have come from making cores and code extremely parallel. Using Web Workers lets Anubis take advantage of your hardware as much as possible so that the challenge finishes as fast as possible.
If you use a browser extension such as [JShelter](https://jshelter.org/), you will need to [modify your JShelter configuration](./known-broken-extensions.md#jshelter) to allow Anubis' proof of work computation to complete.
## Does Anubis mine Bitcoin?
No. Anubis does not mine Bitcoin.
In order to mine Bitcoin, you need to download a copy of the blockchain (so you have the state required to do mining) and also broadcast your mined blocks to the network should you reach a hash with the right number of leading zeroes. You also need to continuously read newly broadcast transactions so you can batch them into a block. This requires gigabytes of data to be transferred from the server to the client.
Anubis transfers a two-digit number of kilobytes from the server to the client (which you can independently verify with your browser's Developer Tools feature). This is orders of magnitude below what is required to mine Bitcoin.

@@ -40,13 +40,25 @@ This page contains a non-exhaustive list with all websites using Anubis.
- https://openwrt.org/
- https://minihoot.site
- https://catgirl.click/
- https://wiki.dolphin-emu.org/
- https://squirreljme.cc/
- https://gitlab.postmarketos.org/
- https://wiki.koha-community.org/
- https://extensions.typo3.org/
- <details>
<summary>FreeCAD</summary>
- https://forum.freecad.org/
- https://wiki.freecad.org/
</details>
- <details>
<summary>ReactOS</summary>
- https://reactos.org/forum
- https://reactos.org/wiki
- https://git.reactos.org
</details>
- <details>
<summary>ScummVM</summary>
- https://bugs.scummvm.org/
- https://forums.scummvm.org/
- https://wiki.scummvm.org/
</details>
@@ -63,3 +75,10 @@ This page contains a non-exhaustive list with all websites using Anubis.
<summary>The United Nations</summary>
- https://policytoolbox.iiep.unesco.org/
</details>
- <details>
<summary>hebis (Alliance of Hessian Libraries)</summary>
- https://ubmr.hds.hebis.de/
- https://tufind.hds.hebis.de/
- https://karla.hds.hebis.de/
- and many more (see https://www.hebis.de/dienste/hebis-discovery-system/)
</details>

@@ -47,21 +47,21 @@ const config: Config = {
editUrl:
'https://github.com/TecharoHQ/anubis/tree/main/docs/',
},
// blog: {
// showReadingTime: true,
// feedOptions: {
// type: ['rss', 'atom', "json"],
// xslt: true,
// },
// // Please change this to your repo.
// // Remove this to remove the "edit this page" links.
// editUrl:
// 'https://github.com/facebook/docusaurus/tree/main/packages/create-docusaurus/templates/shared/',
// // Useful options to enforce blogging best practices
// onInlineTags: 'warn',
// onInlineAuthors: 'warn',
// onUntruncatedBlogPosts: 'warn',
// },
blog: {
showReadingTime: true,
feedOptions: {
type: ['rss', 'atom', "json"],
xslt: true,
},
// Please change this to your repo.
// Remove this to remove the "edit this page" links.
editUrl:
'https://github.com/facebook/docusaurus/tree/main/packages/create-docusaurus/templates/shared/',
// Useful options to enforce blogging best practices
onInlineTags: 'warn',
onInlineAuthors: 'warn',
onUntruncatedBlogPosts: 'warn',
},
theme: {
customCss: './src/css/custom.css',
},
@@ -86,9 +86,14 @@ const config: Config = {
type: 'docSidebar',
sidebarId: 'tutorialSidebar',
position: 'left',
label: 'Tutorial',
label: 'Docs',
},
{ to: '/blog', label: 'Blog', position: 'left' },
{
href: 'https://github.com/sponsors/Xe',
label: "Sponsorship",
position: 'left'
},
// { to: '/blog', label: 'Blog', position: 'left' },
{
href: 'https://github.com/TecharoHQ/anubis',
label: 'GitHub',
@@ -128,6 +133,10 @@ const config: Config = {
{
title: 'More',
items: [
{
label: 'Blog',
to: '/blog',
},
{
label: 'GitHub',
href: 'https://github.com/TecharoHQ/anubis',

@@ -0,0 +1,6 @@
apiVersion: onepassword.com/v1
kind: OnePasswordItem
metadata:
name: anubis-docs-thoth
spec:
itemPath: "vaults/lc5zo4zjz3if3mkeuhufjmgmui/items/pwguumqcmtxvqbeb7y4gj7l36i"

@@ -0,0 +1,79 @@
## Anubis has the ability to let you import snippets of configuration into the main
## configuration file. This allows you to break up your config into smaller parts
## that get logically assembled into one big file.
##
## Of note, a bot rule can either have inline bot configuration or import a
## bot config snippet. You cannot do both in a single bot rule.
##
## Import paths can either be prefixed with (data) to import from the common/shared
## rules in the data folder in the Anubis source tree, or point to absolute/relative
## paths on your filesystem. If you don't have access to the Anubis source tree, check
## /usr/share/docs/anubis/data or the tarball you extracted Anubis from.
bots:
# Pathological bots to deny
- # This corresponds to data/bots/deny-pathological.yaml in the source tree
# https://github.com/TecharoHQ/anubis/blob/main/data/bots/deny-pathological.yaml
import: (data)/bots/_deny-pathological.yaml
- import: (data)/bots/aggressive-brazilian-scrapers.yaml
# Aggressively block AI/LLM related bots/agents by default
- import: (data)/meta/ai-block-aggressive.yaml
# Consider replacing the aggressive AI policy with more selective policies:
# - import: (data)/meta/ai-block-moderate.yaml
# - import: (data)/meta/ai-block-permissive.yaml
# Search engine crawlers to allow, defaults to:
# - Google (so they don't try to bypass Anubis)
# - Apple
# - Bing
# - DuckDuckGo
# - Qwant
# - The Internet Archive
# - Kagi
# - Marginalia
# - Mojeek
- import: (data)/crawlers/_allow-good.yaml
# Challenge Firefox AI previews
- import: (data)/clients/x-firefox-ai.yaml
# Allow common "keeping the internet working" routes (well-known, favicon, robots.txt)
- import: (data)/common/keep-internet-working.yaml
# # Punish any bot with "bot" in the user-agent string
# # This is known to have a high false-positive rate, use at your own risk
# - name: generic-bot-catchall
# user_agent_regex: (?i:bot|crawler)
# action: CHALLENGE
# challenge:
# difficulty: 16 # impossible
# report_as: 4 # lie to the operator
# algorithm: slow # intentionally waste CPU cycles and time
- name: rss-feed-blog
action: ALLOW
expression:
any:
- path.startsWith("/blog/atom.")
- path.startsWith("/blog/rss.")
# Generic catchall rule
- name: generic-browser
user_agent_regex: >-
Mozilla|Opera
action: CHALLENGE
challenge:
difficulty: 1 # Number of seconds to wait before refreshing the page
report_as: 4 # Unused by this challenge method
algorithm: metarefresh # Specify a non-JS challenge method
dnsbl: false
# By default, send HTTP 200 back to clients that either get issued a challenge
# or a denial. This seems weird, but it is load-bearing: the most aggressive
# scraper bots seem to really, really want an HTTP 200 and will stop sending
# requests once they get it.
status_codes:
CHALLENGE: 200
DENY: 200
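
As a rough illustration of what the `rss-feed-blog` rule above expresses: an `any` block holds when at least one of its conditions is true. The sketch below mirrors the two `path.startsWith(...)` checks with plain `strings.HasPrefix` in Go. It is not how Anubis evaluates its expressions, and `matchesAny` and `feedPrefixes` are hypothetical names introduced only for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// feedPrefixes mirrors the two path.startsWith(...) conditions in the
// rss-feed-blog rule. The variable name is hypothetical.
var feedPrefixes = []string{"/blog/atom.", "/blog/rss."}

// matchesAny reports whether path starts with at least one of the given
// prefixes, which is the meaning of an `any` block over startsWith checks.
func matchesAny(path string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(path, p) {
			return true
		}
	}
	return false
}

func main() {
	// Print the decision for a few sample request paths.
	for _, p := range []string{"/blog/atom.xml", "/blog/rss.xml", "/blog/feed.json", "/docs/"} {
		fmt.Printf("%-16s allow=%v\n", p, matchesAny(p, feedPrefixes))
	}
}
```

In this sketch a path such as `/blog/feed.json` matches neither prefix, so it would fall through to whatever rule comes next.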


@@ -11,48 +11,63 @@ spec:
labels:
app: anubis-docs
spec:
volumes:
- name: anubis
configMap:
name: anubis-cfg
containers:
- name: anubis-docs
image: ghcr.io/techarohq/anubis/docs:main
imagePullPolicy: Always
resources:
limits:
memory: "128Mi"
cpu: "500m"
ports:
- containerPort: 80
- name: anubis
image: ghcr.io/techarohq/anubis:main
imagePullPolicy: Always
env:
- name: "BIND"
value: ":8081"
- name: "DIFFICULTY"
value: "4"
- name: "METRICS_BIND"
value: ":9090"
- name: "POLICY_FNAME"
value: ""
- name: "SERVE_ROBOTS_TXT"
value: "false"
- name: "TARGET"
value: "http://localhost:80"
# - name: "SLOG_LEVEL"
# value: "debug"
resources:
limits:
cpu: 500m
memory: 128Mi
requests:
cpu: 250m
memory: 128Mi
securityContext:
runAsUser: 1000
runAsGroup: 1000
runAsNonRoot: true
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
seccompProfile:
type: RuntimeDefault
- name: anubis-docs
image: ghcr.io/techarohq/anubis/docs:main
imagePullPolicy: Always
resources:
limits:
memory: "128Mi"
cpu: "500m"
requests:
cpu: 250m
memory: 128Mi
ports:
- containerPort: 80
- name: anubis
image: ghcr.io/techarohq/anubis:main
imagePullPolicy: Always
env:
- name: "BIND"
value: ":8081"
- name: "DIFFICULTY"
value: "4"
- name: "METRICS_BIND"
value: ":9090"
- name: "OG_PASSTHROUGH"
value: "true"
- name: "POLICY_FNAME"
value: "/xe/cfg/anubis/botPolicies.yaml"
- name: "SERVE_ROBOTS_TXT"
value: "false"
- name: "TARGET"
value: "http://localhost:80"
# - name: "SLOG_LEVEL"
# value: "debug"
volumeMounts:
- name: anubis
mountPath: /xe/cfg/anubis
resources:
limits:
cpu: 500m
memory: 128Mi
requests:
cpu: 250m
memory: 128Mi
securityContext:
runAsUser: 1000
runAsGroup: 1000
runAsNonRoot: true
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
seccompProfile:
type: RuntimeDefault
envFrom:
- secretRef:
name: anubis-docs-thoth


@@ -1,5 +1,13 @@
resources:
- 1password.yaml
- deployment.yaml
- ingress.yaml
- onionservice.yaml
- service.yaml
- poddisruptionbudget.yaml
- service.yaml
configMapGenerator:
- name: anubis-cfg
behavior: create
files:
- ./cfg/anubis/botPolicies.yaml


@@ -0,0 +1,9 @@
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: anubis-docs
spec:
minAvailable: 1
selector:
matchLabels:
app: anubis-docs

Binary file not shown. Before: 2.1 KiB. After: 476 B.

Binary file not shown. After: 4.6 KiB.

Binary file not shown. After: 3.7 KiB.

Binary file not shown. After: 4.6 KiB.

go.mod

@@ -3,20 +3,29 @@ module github.com/TecharoHQ/anubis
go 1.24.2
require (
github.com/a-h/templ v0.3.865
github.com/TecharoHQ/thoth-proto v0.4.0
github.com/a-h/templ v0.3.898
github.com/cespare/xxhash/v2 v2.3.0
github.com/facebookgo/flagenv v0.0.0-20160425205200-fcd59fca7456
github.com/gaissmai/bart v0.20.4
github.com/golang-jwt/jwt/v5 v5.2.2
github.com/google/cel-go v0.25.0
github.com/grpc-ecosystem/go-grpc-middleware/providers/prometheus v1.0.1
github.com/grpc-ecosystem/go-grpc-middleware/v2 v2.1.0
github.com/joho/godotenv v1.5.1
github.com/playwright-community/playwright-go v0.5200.0
github.com/prometheus/client_golang v1.22.0
github.com/sebest/xff v0.0.0-20210106013422-671bd2870b3a
github.com/yl2chen/cidranger v1.0.2
golang.org/x/net v0.40.0
golang.org/x/net v0.41.0
google.golang.org/grpc v1.72.2
gopkg.in/yaml.v3 v3.0.1
k8s.io/apimachinery v0.33.1
sigs.k8s.io/yaml v1.4.0
)
require (
al.essio.dev/pkg/shellescape v1.6.0 // indirect
buf.build/gen/go/bufbuild/protovalidate/protocolbuffers/go v1.36.6-20250425153114-8976f5be98c1.1 // indirect
cel.dev/expr v0.23.1 // indirect
dario.cat/mergo v1.0.2 // indirect
github.com/AlekSi/pointer v1.2.0 // indirect
@@ -35,12 +44,11 @@ require (
github.com/blakesmith/ar v0.0.0-20190502131153-809d4375e1fb // indirect
github.com/cavaliergopher/cpio v1.0.1 // indirect
github.com/cenkalti/backoff/v4 v4.3.0 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/cli/browser v1.3.0 // indirect
github.com/cli/go-gh v0.1.0 // indirect
github.com/cloudflare/circl v1.6.0 // indirect
github.com/cloudflare/circl v1.6.1 // indirect
github.com/cyphar/filepath-securejoin v0.4.1 // indirect
github.com/deckarep/golang-set/v2 v2.7.0 // indirect
github.com/deckarep/golang-set/v2 v2.8.0 // indirect
github.com/dlclark/regexp2 v1.11.4 // indirect
github.com/dop251/goja v0.0.0-20250309171923-bcd7cc6bf64c // indirect
github.com/emirpasic/gods v1.18.1 // indirect
@@ -84,31 +92,29 @@ require (
github.com/shopspring/decimal v1.4.0 // indirect
github.com/skeema/knownhosts v1.3.1 // indirect
github.com/spf13/cast v1.7.1 // indirect
github.com/stoewer/go-strcase v1.2.0 // indirect
github.com/stoewer/go-strcase v1.3.0 // indirect
github.com/ulikunitz/xz v0.5.12 // indirect
github.com/xanzy/ssh-agent v0.3.3 // indirect
gitlab.com/digitalxero/go-conventional-commit v1.0.7 // indirect
golang.org/x/crypto v0.38.0 // indirect
golang.org/x/crypto v0.39.0 // indirect
golang.org/x/exp v0.0.0-20240719175910-8a7402abbf56 // indirect
golang.org/x/exp/typeparams v0.0.0-20231108232855-2478ac86f678 // indirect
golang.org/x/mod v0.24.0 // indirect
golang.org/x/sync v0.14.0 // indirect
golang.org/x/mod v0.25.0 // indirect
golang.org/x/sync v0.15.0 // indirect
golang.org/x/sys v0.33.0 // indirect
golang.org/x/telemetry v0.0.0-20240522233618-39ace7a40ae7 // indirect
golang.org/x/term v0.32.0 // indirect
golang.org/x/text v0.25.0 // indirect
golang.org/x/tools v0.32.0 // indirect
golang.org/x/text v0.26.0 // indirect
golang.org/x/tools v0.33.0 // indirect
golang.org/x/vuln v1.1.4 // indirect
golang.org/x/xerrors v0.0.0-20240716161551-93cc26a95ae9 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20240826202546-f6391c0de4c7 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20240826202546-f6391c0de4c7 // indirect
google.golang.org/protobuf v1.36.5 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20250218202821-56aae31c358a // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20250218202821-56aae31c358a // indirect
google.golang.org/protobuf v1.36.6 // indirect
gopkg.in/warnings.v0 v0.1.2 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
honnef.co/go/tools v0.6.1 // indirect
mvdan.cc/sh/v3 v3.11.0 // indirect
sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3 // indirect
sigs.k8s.io/yaml v1.4.0 // indirect
)
tool (

go.sum

@@ -1,9 +1,9 @@
al.essio.dev/pkg/shellescape v1.6.0 h1:NxFcEqzFSEVCGN2yq7Huv/9hyCEGVa/TncnOOBBeXHA=
al.essio.dev/pkg/shellescape v1.6.0/go.mod h1:6sIqp7X2P6mThCQ7twERpZTuigpr6KbZWtls1U8I890=
buf.build/gen/go/bufbuild/protovalidate/protocolbuffers/go v1.36.6-20250425153114-8976f5be98c1.1 h1:YhMSc48s25kr7kv31Z8vf7sPUIq5YJva9z1mn/hAt0M=
buf.build/gen/go/bufbuild/protovalidate/protocolbuffers/go v1.36.6-20250425153114-8976f5be98c1.1/go.mod h1:avRlCjnFzl98VPaeCtJ24RrV/wwHFzB8sWXhj26+n/U=
cel.dev/expr v0.23.1 h1:K4KOtPCJQjVggkARsjG9RWXP6O4R73aHeJMa/dmCQQg=
cel.dev/expr v0.23.1/go.mod h1:hLPLo1W4QUmuYdA72RBX06QTs6MXw941piREPl3Yfiw=
dario.cat/mergo v1.0.1 h1:Ra4+bf83h2ztPIQYNP99R6m+Y7KfnARDfID+a+vLl4s=
dario.cat/mergo v1.0.1/go.mod h1:uNxQE+84aUszobStD9th8a29P2fMDhsBdgRYvZOxGmk=
dario.cat/mergo v1.0.2 h1:85+piFYR1tMbRrLcDwR18y4UKJ3aH1Tbzi24VRW1TK8=
dario.cat/mergo v1.0.2/go.mod h1:E/hbnu0NxMFBjpMIE34DRGLWqDy0g5FuKDhCb31ngxA=
github.com/AlekSi/pointer v1.2.0 h1:glcy/gc4h8HnG2Z3ZECSzZ1IX1x2JxRVuDzaJwQE0+w=
@@ -22,8 +22,6 @@ github.com/Masterminds/sprig/v3 v3.3.0/go.mod h1:Zy1iXRYNqNLUolqCpL4uhk6SHUMAOSC
github.com/Microsoft/go-winio v0.5.2/go.mod h1:WpS1mjBmmwHBEWmogvA2mj8546UReBk4v8QkMxJ6pZY=
github.com/Microsoft/go-winio v0.6.2 h1:F2VQgta7ecxGYO8k3ZZz3RS8fVIXVxONVUPlNERoyfY=
github.com/Microsoft/go-winio v0.6.2/go.mod h1:yd8OoFMLzJbo9gZq8j5qaps8bJ9aShtEA8Ipt1oGCvU=
github.com/ProtonMail/go-crypto v1.1.6 h1:ZcV+Ropw6Qn0AX9brlQLAUXfqLBc7Bl+f/DmNxpLfdw=
github.com/ProtonMail/go-crypto v1.1.6/go.mod h1:rA3QumHc/FZ8pAHreoekgiAbzpNsfQAosU5td4SnOrE=
github.com/ProtonMail/go-crypto v1.2.0 h1:+PhXXn4SPGd+qk76TlEePBfOfivE0zkWFenhGhFLzWs=
github.com/ProtonMail/go-crypto v1.2.0/go.mod h1:9whxjD8Rbs29b4XWbB8irEcE8KHMqaR2e7GWU1R+/PE=
github.com/ProtonMail/go-mime v0.0.0-20230322103455-7d82a3887f2f h1:tCbYj7/299ekTTXpdwKYF8eBlsYsDVoggDAuAjoK66k=
@@ -32,14 +30,14 @@ github.com/ProtonMail/gopenpgp/v2 v2.7.1 h1:Awsg7MPc2gD3I7IFac2qE3Gdls0lZW8SzrFZ
github.com/ProtonMail/gopenpgp/v2 v2.7.1/go.mod h1:/BU5gfAVwqyd8EfC3Eu7zmuhwYQpKs+cGD8M//iiaxs=
github.com/Songmu/gitconfig v0.2.0 h1:pX2++u4KUq+K2k/ZCzGXLtkD3ceCqIdi0tDyb+IbSyo=
github.com/Songmu/gitconfig v0.2.0/go.mod h1:cB5bYJer+pl7W8g6RHFwL/0X6aJROVrYuHlvc7PT+hE=
github.com/TecharoHQ/yeet v0.2.3 h1:Pcsnq5HTnk4Xntlu/FNEidH7x55bIx+f5Mk1hpVIngs=
github.com/TecharoHQ/yeet v0.2.3/go.mod h1:avLiwxZpNY37A/o35XledvdmGnTkm3G7+Oskxca6Z7Y=
github.com/TecharoHQ/thoth-proto v0.4.0 h1:UbkvfgCku0Dm1R6O4ug3HOsJNnE6F3wB8x+Dpw2lzFI=
github.com/TecharoHQ/thoth-proto v0.4.0/go.mod h1:IcGnZt3iYUZQVEa0Lwk5l4ix0hCeXlWUV1TJMZvbWx0=
github.com/TecharoHQ/yeet v0.6.0 h1:RCBAjr7wIlllsgy0tpvWpLX7jsZgu2tiuBY3RrprcR0=
github.com/TecharoHQ/yeet v0.6.0/go.mod h1:bj2V4Fg8qKQXoiuPZa3HuawrE8g+LsOQv/9q2WyGSsA=
github.com/a-h/parse v0.0.0-20250122154542-74294addb73e h1:HjVbSQHy+dnlS6C3XajZ69NYAb5jbGNfHanvm1+iYlo=
github.com/a-h/parse v0.0.0-20250122154542-74294addb73e/go.mod h1:3mnrkvGpurZ4ZrTDbYU84xhwXW2TjTKShSwjRi2ihfQ=
github.com/a-h/templ v0.3.865 h1:nYn5EWm9EiXaDgWcMQaKiKvrydqgxDUtT1+4zU2C43A=
github.com/a-h/templ v0.3.865/go.mod h1:oLBbZVQ6//Q6zpvSMPTuBK0F3qOtBdFBcGRspcT+VNQ=
github.com/a-h/templ v0.3.898 h1:g9oxL/dmM6tvwRe2egJS8hBDQTncokbMoOFk1oJMX7s=
github.com/a-h/templ v0.3.898/go.mod h1:oLBbZVQ6//Q6zpvSMPTuBK0F3qOtBdFBcGRspcT+VNQ=
github.com/andybalholm/brotli v1.1.0 h1:eLKJA0d02Lf0mVpIDgYnqXcUn0GqVmEFny3VuID1U3M=
github.com/andybalholm/brotli v1.1.0/go.mod h1:sms7XGricyQI9K10gOSf56VKKWS4oLer58Q+mhRPtnY=
github.com/anmitsu/go-shlex v0.0.0-20200514113438-38f4b401e2be h1:9AeTilPcZAjCFIImctFaOjnTIavg87rW78vTPkQqLI8=
@@ -69,16 +67,18 @@ github.com/cli/go-gh v0.1.0 h1:kMqFmC3ECBrV2UKzlOHjNOTTchExVc5tjNHtCqk/zYk=
github.com/cli/go-gh v0.1.0/go.mod h1:eTGWl99EMZ+3Iau5C6dHyGAJRRia65MtdBtuhWc+84o=
github.com/cli/safeexec v1.0.0/go.mod h1:Z/D4tTN8Vs5gXYHDCbaM1S/anmEDnJb1iW0+EJ5zx3Q=
github.com/cli/shurcooL-graphql v0.0.1/go.mod h1:U7gCSuMZP/Qy7kbqkk5PrqXEeDgtfG5K+W+u8weorps=
github.com/cloudflare/circl v1.6.0 h1:cr5JKic4HI+LkINy2lg3W2jF8sHCVTBncJr5gIIq7qk=
github.com/cloudflare/circl v1.6.0/go.mod h1:uddAzsPgqdMAYatqJ0lsjX1oECcQLIlRpzZh3pJrofs=
github.com/cloudflare/circl v1.6.1 h1:zqIqSPIndyBh1bjLVVDHMPpVKqp8Su/V+6MeDzzQBQ0=
github.com/cloudflare/circl v1.6.1/go.mod h1:uddAzsPgqdMAYatqJ0lsjX1oECcQLIlRpzZh3pJrofs=
github.com/creack/pty v1.1.24 h1:bJrF4RRfyJnbTJqzRLHzcGaZK1NeM5kTC9jGgovnR1s=
github.com/creack/pty v1.1.24/go.mod h1:08sCNb52WyoAwi2QDyzUCTgcvVFhUzewun7wtTfvcwE=
github.com/cyphar/filepath-securejoin v0.4.1 h1:JyxxyPEaktOD+GAnqIqTf9A8tHyAG22rowi7HkoSU1s=
github.com/cyphar/filepath-securejoin v0.4.1/go.mod h1:Sdj7gXlvMcPZsbhwhQ33GguGLDGQL7h7bg04C/+u9jI=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc h1:U9qPSI2PIWSS1VwoXQT9A3Wy9MM3WgvqSxFWenqJduM=
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/deckarep/golang-set/v2 v2.7.0 h1:gIloKvD7yH2oip4VLhsv3JyLLFnC0Y2mlusgcvJYW5k=
github.com/deckarep/golang-set/v2 v2.7.0/go.mod h1:VAky9rY/yGXJOLEDv3OMci+7wtDpOF4IN+y82NBOac4=
github.com/deckarep/golang-set/v2 v2.8.0 h1:swm0rlPCmdWn9mESxKOjWk8hXSqoxOp+ZlfuyaAdFlQ=
github.com/deckarep/golang-set/v2 v2.8.0/go.mod h1:VAky9rY/yGXJOLEDv3OMci+7wtDpOF4IN+y82NBOac4=
github.com/dlclark/regexp2 v1.11.4 h1:rPYF9/LECdNymJufQKmri9gV604RvvABwgOA8un7yAo=
github.com/dlclark/regexp2 v1.11.4/go.mod h1:DHkYz0B9wPfa6wondMfaivmHpzrQ3v9q8cnmRbL6yW8=
github.com/dop251/goja v0.0.0-20250309171923-bcd7cc6bf64c h1:mxWGS0YyquJ/ikZOjSrRjjFIbUqIP9ojyYQ+QZTU3Rg=
@@ -103,6 +103,8 @@ github.com/frankban/quicktest v1.14.6 h1:7Xjx+VpznH+oBnejlPUj8oUpdxnVs4f8XU8WnHk
github.com/frankban/quicktest v1.14.6/go.mod h1:4ptaffx2x8+WTWXmUCuVU6aPUX1/Mz7zb5vbUoiM6w0=
github.com/fsnotify/fsnotify v1.8.0 h1:dAwr6QBTBZIkG8roQaJjGof0pp0EeF+tNV7YBP3F/8M=
github.com/fsnotify/fsnotify v1.8.0/go.mod h1:8jBTzvmWwFyi3Pb8djgCCO5IBqzKJ/Jwo8TRcHyHii0=
github.com/gaissmai/bart v0.20.4 h1:Ik47r1fy3jRVU+1eYzKSW3ho2UgBVTVnUS8O993584U=
github.com/gaissmai/bart v0.20.4/go.mod h1:cEed+ge8dalcbpi8wtS9x9m2hn/fNJH5suhdGQOHnYk=
github.com/gliderlabs/ssh v0.3.8 h1:a4YXD1V7xMF9g5nTkdfnja3Sxy1PVDCj1Zg4Wb8vY6c=
github.com/gliderlabs/ssh v0.3.8/go.mod h1:xYoytBv1sV0aL3CavoDuJIQNURXkkfPA/wxQ1pL1fAU=
github.com/go-git/gcfg v1.5.1-0.20230307220236-3a3c6141e376 h1:+zs/tPmkDkHx3U66DAb0lQFJrpS6731Oaa12ikc+DiI=
@@ -115,6 +117,10 @@ github.com/go-git/go-git/v5 v5.14.0 h1:/MD3lCrGjCen5WfEAzKg00MJJffKhC8gzS80ycmCi
github.com/go-git/go-git/v5 v5.14.0/go.mod h1:Z5Xhoia5PcWA3NF8vRLURn9E5FRhSl7dGj9ItW3Wk5k=
github.com/go-jose/go-jose/v3 v3.0.4 h1:Wp5HA7bLQcKnf6YYao/4kpRpVMp/yf6+pJKV8WFSaNY=
github.com/go-jose/go-jose/v3 v3.0.4/go.mod h1:5b+7YgP7ZICgJDBdfjZaIt+H/9L9T/YQrVfLAMboGkQ=
github.com/go-logr/logr v1.4.2 h1:6pFjapn8bFcIbiKo3XT4j/BhANplGihG6tvd+8rYgrY=
github.com/go-logr/logr v1.4.2/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY=
github.com/go-logr/stdr v1.2.2 h1:hSWxHoqTgW2S2qGc0LTAI563KZ5YKYRhT3MFKZMbjag=
github.com/go-logr/stdr v1.2.2/go.mod h1:mMo/vtBO5dYbehREoey6XUKy/eSumjCCveDpRre4VKE=
github.com/go-playground/assert/v2 v2.0.1/go.mod h1:VDjEfimB/XKnb+ZQfWdccd7VUvScMdVu0Titje2rxJ4=
github.com/go-playground/locales v0.13.0 h1:HyWk6mgj5qFqCT5fjGBuRArbVDfE4hi8+e8ceBS/t7Q=
github.com/go-playground/locales v0.13.0/go.mod h1:taPMhCMXrRLJO55olJkUXHZBHCxTMfnGwq/HNwmWNS8=
@@ -123,6 +129,8 @@ github.com/go-playground/universal-translator v0.17.0/go.mod h1:UkSxE5sNxxRwHyU+
github.com/go-playground/validator/v10 v10.4.1/go.mod h1:nlOn6nFhuKACm19sB/8EGNn9GlaMV7XkbRSipzJ0Ii4=
github.com/go-playground/validator/v10 v10.10.0 h1:I7mrTYv78z8k8VXa/qJlOlEXn/nBh+BF8dHX5nt/dr0=
github.com/go-playground/validator/v10 v10.10.0/go.mod h1:74x4gJWsvQexRdW8Pn3dXSGrTK4nAUsbPlLADvpJkos=
github.com/go-quicktest/qt v1.101.0 h1:O1K29Txy5P2OK0dGo59b7b0LR6wKfIhttaAhHUyn7eI=
github.com/go-quicktest/qt v1.101.0/go.mod h1:14Bz/f7NwaXPtdYEgzsx46kqSxVwTbzVZsDC26tQJow=
github.com/go-sourcemap/sourcemap v2.1.3+incompatible h1:W1iEw64niKVGogNgBN3ePyLFfuisuzeidWPMPWmECqU=
github.com/go-sourcemap/sourcemap v2.1.3+incompatible/go.mod h1:F8jJfvm2KbVjc5NqelyYJmf/v5J0dwNLS2mL4sNA1Jg=
github.com/go-stack/stack v1.8.1 h1:ntEHSVwIt7PNXNpgPmVfMrNhLtgjlmnZha2kOpuRiDw=
@@ -136,6 +144,8 @@ github.com/golang-jwt/jwt/v5 v5.2.2 h1:Rl4B7itRWVtYIHFrSNd7vhTiz9UpLdi6gZhZ3wEeD
github.com/golang-jwt/jwt/v5 v5.2.2/go.mod h1:pqrtFR0X4osieyHYxtmOUWsAWrfe1Q5UVIyoH402zdk=
github.com/golang/groupcache v0.0.0-20241129210726-2c02b8208cf8 h1:f+oWsMOmNPc8JmEHVZIycC7hBoQxHH9pNKQORJNozsQ=
github.com/golang/groupcache v0.0.0-20241129210726-2c02b8208cf8/go.mod h1:wcDNUvekVysuuOpQKo3191zZyTpiI6se1N1ULghS0sw=
github.com/golang/protobuf v1.5.4 h1:i7eJL8qZTpSEXOPTxNKhASYpMn+8e5Q6AdndVa1dWek=
github.com/golang/protobuf v1.5.4/go.mod h1:lnTiLA8Wa4RWRcIUkrtSVa5nRhsEGBg48fD6rSs7xps=
github.com/google/cel-go v0.25.0 h1:jsFw9Fhn+3y2kBbltZR4VEz5xKkcIFRPDnuEzAGv5GY=
github.com/google/cel-go v0.25.0/go.mod h1:hjEb6r5SuOSlhCHmFoLzu8HGCERvIsDAbxDAyNU/MmI=
github.com/google/go-cmdtest v0.4.1-0.20220921163831-55ab3332a786 h1:rcv+Ippz6RAtvaGgKxc+8FQIpxHgsF+HBzPyYL2cyVU=
@@ -147,8 +157,6 @@ github.com/google/pprof v0.0.0-20230207041349-798e818bf904 h1:4/hN5RUoecvl+RmJRE
github.com/google/pprof v0.0.0-20230207041349-798e818bf904/go.mod h1:uglQLonpP8qtYCYyzA+8c/9qtqgA3qsXGYqCPKARAFg=
github.com/google/renameio v0.1.0 h1:GOZbcHa3HfsPKPlmyPyN2KEohoMXOhdMbHrvbpl2QaA=
github.com/google/renameio v0.1.0/go.mod h1:KWCgfxg9yswjAJkECMjeO8J8rahYeXnNhOm40UhjYkI=
github.com/google/rpmpack v0.6.1-0.20240329070804-c2247cbb881a h1:JJBdjSfqSy3mnDT0940ASQFghwcZ4y4cb6ttjAoXqwE=
github.com/google/rpmpack v0.6.1-0.20240329070804-c2247cbb881a/go.mod h1:uqVAUVQLq8UY2hCDfmJ/+rtO3aw7qyhc90rCVEabEfI=
github.com/google/rpmpack v0.6.1-0.20250405124433-758cc6896cbc h1:qES+d3PvR9CN+zARQQH/bNXH0ybzmdjNMHICrBwXD28=
github.com/google/rpmpack v0.6.1-0.20250405124433-758cc6896cbc/go.mod h1:uqVAUVQLq8UY2hCDfmJ/+rtO3aw7qyhc90rCVEabEfI=
github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510 h1:El6M4kTTCOh6aBiKaUGG7oYTSPP8MxqL4YI3kZKwcP4=
@@ -161,16 +169,20 @@ github.com/goreleaser/chglog v0.7.0 h1:/KzXWAeg4DrEz4r3OI6K2Yb8RAsVGeInCUfLWFXL9
github.com/goreleaser/chglog v0.7.0/go.mod h1:2h/yyq9xvTUeM9tOoucBP+jri8Dj28splx+SjlYkklc=
github.com/goreleaser/fileglob v1.3.0 h1:/X6J7U8lbDpQtBvGcwwPS6OpzkNVlVEsFUVRx9+k+7I=
github.com/goreleaser/fileglob v1.3.0/go.mod h1:Jx6BoXv3mbYkEzwm9THo7xbr5egkAraxkGorbJb4RxU=
github.com/goreleaser/nfpm/v2 v2.42.0 h1:7BW4WQWyvZDrT0C7SyWop+J8rtqFyTB17Sb2/j/NxMI=
github.com/goreleaser/nfpm/v2 v2.42.0/go.mod h1:DtNL+nKpfB8sMFZp+X7Xu3W64atyZYtTnYe8O925/mg=
github.com/goreleaser/nfpm/v2 v2.42.1 h1:xu2pLRgQuz2ab+YZFoeIzwU/M5jjjCKDGwv1lRbVGvk=
github.com/goreleaser/nfpm/v2 v2.42.1/go.mod h1:dY53KWYKebkOocxgkmpM7SRX0Nv5hU+jEu2kIaM4/LI=
github.com/grpc-ecosystem/go-grpc-middleware/providers/prometheus v1.0.1 h1:qnpSQwGEnkcRpTqNOIR6bJbR0gAorgP9CSALpRcKoAA=
github.com/grpc-ecosystem/go-grpc-middleware/providers/prometheus v1.0.1/go.mod h1:lXGCsh6c22WGtjr+qGHj1otzZpV/1kwTMAqkwZsnWRU=
github.com/grpc-ecosystem/go-grpc-middleware/v2 v2.1.0 h1:pRhl55Yx1eC7BZ1N+BBWwnKaMyD8uC+34TLdndZMAKk=
github.com/grpc-ecosystem/go-grpc-middleware/v2 v2.1.0/go.mod h1:XKMd7iuf/RGPSMJ/U4HP0zS2Z9Fh8Ps9a+6X26m/tmI=
github.com/h2non/parth v0.0.0-20190131123155-b4df798d6542/go.mod h1:Ow0tF8D4Kplbc8s8sSb3V2oUCygFHVp8gC3Dn6U4MNI=
github.com/henvic/httpretty v0.0.6/go.mod h1:X38wLjWXHkXT7r2+uK8LjCMne9rsuNaBLJ+5cU2/Pmo=
github.com/huandu/xstrings v1.5.0 h1:2ag3IFq9ZDANvthTwTiqSSZLjDc+BedvHPAp5tJy2TI=
github.com/huandu/xstrings v1.5.0/go.mod h1:y5/lhBue+AyNmUVz9RLU9xbLR0o4KIIExikq4ovT0aE=
github.com/jbenet/go-context v0.0.0-20150711004518-d14ea06fba99 h1:BQSFePA1RWJOlocH6Fxy8MmwDt+yVQYULKfN0RoTN8A=
github.com/jbenet/go-context v0.0.0-20150711004518-d14ea06fba99/go.mod h1:1lJo3i6rXxKeerYnT8Nvf0QmHCRC1n8sfWVwXF2Frvo=
github.com/joho/godotenv v1.5.1 h1:7eLL/+HRGLY0ldzfGMeQkb7vMd0as4CfYvUVzLqw0N0=
github.com/joho/godotenv v1.5.1/go.mod h1:f4LDr5Voq0i2e/R5DDNOoa2zzDfwtkZa6DnEwAbqwq4=
github.com/jtolds/gls v4.20.0+incompatible h1:xdiiI2gbIgH/gLH7ADydsJ1uDOEzR8yvV7C0MuV77Wo=
github.com/jtolds/gls v4.20.0+incompatible/go.mod h1:QJZ7F/aHp+rZTRtaJ1ow/lLfFfVYBRgL+9YlvaHOwJU=
github.com/kevinburke/ssh_config v1.2.0 h1:x584FjTGwHzMwvHx18PXxbBVzfnxogHaAReU4gf13a4=
@@ -254,13 +266,17 @@ github.com/smartystreets/goconvey v1.8.1 h1:qGjIddxOk4grTu9JPOU31tVfq3cNdBlNa5sS
github.com/smartystreets/goconvey v1.8.1/go.mod h1:+/u4qLyY6x1jReYOp7GOM2FSt8aP9CzCZL03bI28W60=
github.com/spf13/cast v1.7.1 h1:cuNEagBQEHWN1FnbGEjCXL2szYEXqfJPbP2HNUaca9Y=
github.com/spf13/cast v1.7.1/go.mod h1:ancEpBxwJDODSW/UG4rDrAqiKolqNNh2DX3mk86cAdo=
github.com/stoewer/go-strcase v1.2.0 h1:Z2iHWqGXH00XYgqDmNgQbIBxf3wrNq0F3feEy0ainaU=
github.com/stoewer/go-strcase v1.2.0/go.mod h1:IBiWB2sKIp3wVVQ3Y035++gc+knqhUQag1KpM8ahLw8=
github.com/stoewer/go-strcase v1.3.0 h1:g0eASXYtp+yvN9fK8sH94oCIk0fau9uV1/ZdJ0AVEzs=
github.com/stoewer/go-strcase v1.3.0/go.mod h1:fAH5hQ5pehh+j3nZfvwdk2RgEgQjAoM8wodgtPmh1xo=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw=
github.com/stretchr/objx v0.5.0/go.mod h1:Yh+to48EsGEfYuaHDzXPcE3xhTkx73EhmCGUpEOglKo=
github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs=
github.com/stretchr/testify v1.4.0/go.mod h1:j7eGeouHqKxXV5pUuKE4zz7dFj8WfuZ+81PSLYec5m4=
github.com/stretchr/testify v1.5.1/go.mod h1:5W2xD1RspED5o8YsWQXVCued0rvSQ+mT+I5cxcmMvtA=
github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU=
github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
github.com/stretchr/testify v1.10.0 h1:Xv5erBjTwe/5IxqUQTdXv5kgmIvbHo3QQyRwhJsOfJA=
github.com/stretchr/testify v1.10.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
github.com/thlib/go-timezone-local v0.0.0-20210907160436-ef149e42d28e/go.mod h1:/Tnicc6m/lsJE0irFMA0LfIwTBo4QP7A8IfyIv4zZKI=
@@ -270,26 +286,36 @@ github.com/xanzy/ssh-agent v0.3.3 h1:+/15pJfg/RsTxqYcX6fHqOXZwwMP+2VyYWJeWM2qQFM
github.com/xanzy/ssh-agent v0.3.3/go.mod h1:6dzNDKs0J9rVPHPhaGCukekBHKqfl+L3KghI1Bc68Uw=
github.com/xi2/xz v0.0.0-20171230120015-48954b6210f8 h1:nIPpBwaJSVYIxUFsDv3M8ofmx9yWTog9BfvIu0q41lo=
github.com/xi2/xz v0.0.0-20171230120015-48954b6210f8/go.mod h1:HUYIGzjTL3rfEspMxjDjgmT5uz5wzYJKVo23qUhYTos=
github.com/yl2chen/cidranger v1.0.2 h1:lbOWZVCG1tCRX4u24kuM1Tb4nHqWkDxwLdoS+SevawU=
github.com/yl2chen/cidranger v1.0.2/go.mod h1:9U1yz7WPYDwf0vpNWFaeRh0bjwz5RVgRy/9UEQfHl0g=
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
gitlab.com/digitalxero/go-conventional-commit v1.0.7 h1:8/dO6WWG+98PMhlZowt/YjuiKhqhGlOCwlIV8SqqGh8=
gitlab.com/digitalxero/go-conventional-commit v1.0.7/go.mod h1:05Xc2BFsSyC5tKhK0y+P3bs0AwUtNuTp+mTpbCU/DZ0=
go.opentelemetry.io/auto/sdk v1.1.0 h1:cH53jehLUN6UFLY71z+NDOiNJqDdPRaXzTel0sJySYA=
go.opentelemetry.io/auto/sdk v1.1.0/go.mod h1:3wSPjt5PWp2RhlCcmmOial7AvC4DQqZb7a7wCow3W8A=
go.opentelemetry.io/otel v1.34.0 h1:zRLXxLCgL1WyKsPVrgbSdMN4c0FMkDAskSTQP+0hdUY=
go.opentelemetry.io/otel v1.34.0/go.mod h1:OWFPOQ+h4G8xpyjgqo4SxJYdDQ/qmRH+wivy7zzx9oI=
go.opentelemetry.io/otel/metric v1.34.0 h1:+eTR3U0MyfWjRDhmFMxe2SsW64QrZ84AOhvqS7Y+PoQ=
go.opentelemetry.io/otel/metric v1.34.0/go.mod h1:CEDrp0fy2D0MvkXE+dPV7cMi8tWZwX3dmaIhwPOaqHE=
go.opentelemetry.io/otel/sdk v1.34.0 h1:95zS4k/2GOy069d321O8jWgYsW3MzVV+KuSPKp7Wr1A=
go.opentelemetry.io/otel/sdk v1.34.0/go.mod h1:0e/pNiaMAqaykJGKbi+tSjWfNNHMTxoC9qANsCzbyxU=
go.opentelemetry.io/otel/sdk/metric v1.34.0 h1:5CeK9ujjbFVL5c1PhLuStg1wxA7vQv7ce1EK0Gyvahk=
go.opentelemetry.io/otel/sdk/metric v1.34.0/go.mod h1:jQ/r8Ze28zRKoNRdkjCZxfs6YvBTG1+YIqyFVFYec5w=
go.opentelemetry.io/otel/trace v1.34.0 h1:+ouXS2V8Rd4hp4580a8q23bg0azF2nI8cqLYnC8mh/k=
go.opentelemetry.io/otel/trace v1.34.0/go.mod h1:Svm7lSjQD7kG7KJ/MUHPVXSDGz2OX4h0M2jHBhmSfRE=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto=
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
golang.org/x/crypto v0.0.0-20220622213112-05595931fe9d/go.mod h1:IxCIyHEi3zRg3s0A5j5BB6A9Jmi73HwBIUl50j+osU4=
golang.org/x/crypto v0.19.0/go.mod h1:Iy9bg/ha4yyC70EfRS8jz+B6ybOBKMaSxLj6P6oBDfU=
golang.org/x/crypto v0.38.0 h1:jt+WWG8IZlBnVbomuhg2Mdq0+BBQaHbtqHEFEigjUV8=
golang.org/x/crypto v0.38.0/go.mod h1:MvrbAqul58NNYPKnOra203SB9vpuZW0e+RRZV+Ggqjw=
golang.org/x/crypto v0.39.0 h1:SHs+kF4LP+f+p14esP5jAoDpHU8Gu/v9lFRK6IT5imM=
golang.org/x/crypto v0.39.0/go.mod h1:L+Xg3Wf6HoL4Bn4238Z6ft6KfEpN0tJGo53AAPC632U=
golang.org/x/exp v0.0.0-20240719175910-8a7402abbf56 h1:2dVuKD2vS7b0QIHQbpyTISPd0LeHDbnYEryqj5Q1ug8=
golang.org/x/exp v0.0.0-20240719175910-8a7402abbf56/go.mod h1:M4RDyNAINzryxdtnbRXRL/OHtkFuWGRjvuhBJpk2IlY=
golang.org/x/exp/typeparams v0.0.0-20231108232855-2478ac86f678 h1:1P7xPZEwZMoBoz0Yze5Nx2/4pxj6nw9ZqHWXqP0iRgQ=
golang.org/x/exp/typeparams v0.0.0-20231108232855-2478ac86f678/go.mod h1:AbB0pIl9nAr9wVwH+Z2ZpaocVmF5I4GyWCDIsVjR0bk=
golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4/go.mod h1:jJ57K6gSWd91VN4djpZkiMVwK6gcyfeH4XE8wZrZaV4=
golang.org/x/mod v0.8.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
golang.org/x/mod v0.24.0 h1:ZfthKaKaT4NrhGVZHO1/WDTwGES4De8KtWO0SIbNJMU=
golang.org/x/mod v0.24.0/go.mod h1:IXM97Txy2VM4PJ3gI61r1YEk/gAj6zAHN3AdZt6S9Ww=
golang.org/x/mod v0.25.0 h1:n7a+ZbQKQA/Ysbyb0/6IbB1H/X41mKgbhfv7AfG/44w=
golang.org/x/mod v0.25.0/go.mod h1:IXM97Txy2VM4PJ3gI61r1YEk/gAj6zAHN3AdZt6S9Ww=
golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
@@ -297,13 +323,13 @@ golang.org/x/net v0.0.0-20211112202133-69e39bad7dc2/go.mod h1:9nx3DQGgdP8bBQD5qx
golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug+ECip1KBveYUHfp+8e9klMJ9c=
golang.org/x/net v0.6.0/go.mod h1:2Tu9+aMcznHK/AK1HMvgo6xiTLG5rD5rZLDS+rp2Bjs=
golang.org/x/net v0.10.0/go.mod h1:0qNGK6F8kojg2nk9dLZ2mShWaEBan6FAoqfSigmmuDg=
golang.org/x/net v0.40.0 h1:79Xs7wF06Gbdcg4kdCCIQArK11Z1hr5POQ6+fIYHNuY=
golang.org/x/net v0.40.0/go.mod h1:y0hY0exeL2Pku80/zKK7tpntoX23cqL3Oa6njdgRtds=
golang.org/x/net v0.41.0 h1:vBTly1HeNPEn3wtREYfy4GZ/NECgw2Cnl+nK6Nz3uvw=
golang.org/x/net v0.41.0/go.mod h1:B/K4NNqkfmg07DQYrbwvSluqCJOOXwUjeb/5lOisjbA=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.1.0/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.14.0 h1:woo0S4Yywslg6hp4eUFjTVOyKt0RookbpAHG4c1HmhQ=
golang.org/x/sync v0.14.0/go.mod h1:1dzgHSNfp02xaA81J2MS99Qcpr2w7fw1gpm99rleRqA=
golang.org/x/sync v0.15.0 h1:KWH3jNZsfyT6xfAfKiz6MRNmd46ByHDYaZ7KSkCtdW8=
golang.org/x/sync v0.15.0/go.mod h1:1dzgHSNfp02xaA81J2MS99Qcpr2w7fw1gpm99rleRqA=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20191026070338-33540a1f6037/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
@@ -344,14 +370,14 @@ golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ=
golang.org/x/text v0.7.0/go.mod h1:mrYo+phRRbMaCq/xk9113O4dZlRixOauAjOtrjsXDZ8=
golang.org/x/text v0.9.0/go.mod h1:e1OnstbJyHTd6l/uOt8jFFHp6TRDWZR/bV3emEE/zU8=
golang.org/x/text v0.14.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
golang.org/x/text v0.25.0 h1:qVyWApTSYLk/drJRO5mDlNYskwQznZmkpV2c8q9zls4=
golang.org/x/text v0.25.0/go.mod h1:WEdwpYrmk1qmdHvhkSTNPm3app7v4rsT8F2UD6+VHIA=
golang.org/x/text v0.26.0 h1:P42AVeLghgTYr4+xUnTRKDMqpar+PtX7KWuNQL21L8M=
golang.org/x/text v0.26.0/go.mod h1:QK15LZJUUQVJxhz7wXgxSy/CJaTFjd0G+YLonydOVQA=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc=
golang.org/x/tools v0.6.0/go.mod h1:Xwgl3UAJ/d3gWutnCtw505GrjyAbvKui8lOU390QaIU=
golang.org/x/tools v0.32.0 h1:Q7N1vhpkQv7ybVzLFtTjvQya2ewbwNDZzUgfXGqtMWU=
golang.org/x/tools v0.32.0/go.mod h1:ZxrU41P/wAbZD8EDa6dDCa6XfpkhJ7HFMjHJXfBDu8s=
golang.org/x/tools v0.33.0 h1:4qz2S3zmRxbGIhDIAgjxvFutSvH5EfnsYrRBj0UI0bc=
golang.org/x/tools v0.33.0/go.mod h1:CIJMaWEY88juyUfo7UbgPqbC8rU2OqfAV1h2Qp0oMYI=
golang.org/x/vuln v1.1.4 h1:Ju8QsuyhX3Hk8ma3CesTbO8vfJD9EvUBgHvkxHBzj0I=
golang.org/x/vuln v1.1.4/go.mod h1:F+45wmU18ym/ca5PLTPLsSzr2KppzswxPP603ldA67s=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
@@ -359,12 +385,14 @@ golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8T
golang.org/x/xerrors v0.0.0-20220609144429-65e65417b02f/go.mod h1:K8+ghG5WaK9qNqU5K3HdILfMLy1f3aNYFI/wnl100a8=
golang.org/x/xerrors v0.0.0-20240716161551-93cc26a95ae9 h1:LLhsEBxRTBLuKlQxFBYUOU8xyFgXv6cOTp2HASDlsDk=
golang.org/x/xerrors v0.0.0-20240716161551-93cc26a95ae9/go.mod h1:NDW/Ps6MPRej6fsCIbMTohpP40sJ/P/vI1MoTEGwX90=
google.golang.org/genproto/googleapis/api v0.0.0-20240826202546-f6391c0de4c7 h1:YcyjlL1PRr2Q17/I0dPk2JmYS5CDXfcdb2Z3YRioEbw=
google.golang.org/genproto/googleapis/api v0.0.0-20240826202546-f6391c0de4c7/go.mod h1:OCdP9MfskevB/rbYvHTsXTtKC+3bHWajPdoKgjcYkfo=
google.golang.org/genproto/googleapis/rpc v0.0.0-20240826202546-f6391c0de4c7 h1:2035KHhUv+EpyB+hWgJnaWKJOdX1E95w2S8Rr4uWKTs=
google.golang.org/genproto/googleapis/rpc v0.0.0-20240826202546-f6391c0de4c7/go.mod h1:UqMtugtsSgubUsoxbuAoiCXvqvErP7Gf0so0mK9tHxU=
google.golang.org/protobuf v1.36.5 h1:tPhr+woSbjfYvY6/GPufUoYizxw1cF/yFoxJ2fmpwlM=
google.golang.org/protobuf v1.36.5/go.mod h1:9fA7Ob0pmnwhb644+1+CVWFRbNajQ6iRojtC/QF5bRE=
google.golang.org/genproto/googleapis/api v0.0.0-20250218202821-56aae31c358a h1:nwKuGPlUAt+aR+pcrkfFRrTU1BVrSmYyYMxYbUIVHr0=
google.golang.org/genproto/googleapis/api v0.0.0-20250218202821-56aae31c358a/go.mod h1:3kWAYMk1I75K4vykHtKt2ycnOgpA6974V7bREqbsenU=
google.golang.org/genproto/googleapis/rpc v0.0.0-20250218202821-56aae31c358a h1:51aaUVRocpvUOSQKM6Q7VuoaktNIaMCLuhZB6DKksq4=
google.golang.org/genproto/googleapis/rpc v0.0.0-20250218202821-56aae31c358a/go.mod h1:uRxBH1mhmO8PGhU89cMcHaXKZqO+OfakD8QQO0oYwlQ=
google.golang.org/grpc v1.72.2 h1:TdbGzwb82ty4OusHWepvFWGLgIbNo1/SUynEN0ssqv8=
google.golang.org/grpc v1.72.2/go.mod h1:wH5Aktxcg25y1I3w7H69nHfXdOG3UiadoBtjh3izSDM=
google.golang.org/protobuf v1.36.6 h1:z1NpPI8ku2WgiWnf+t9wTPsn6eP1L7ksHUlkfLvd9xY=
google.golang.org/protobuf v1.36.6/go.mod h1:jduwjTPXsFjZGTmRluh+L6NjiWu7pchiJ2/5YcXBHnY=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/check.v1 v1.0.0-20180628173108-788fd7840127/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=


@@ -3,10 +3,23 @@ package internal
import (
"crypto/sha256"
"encoding/hex"
"strconv"
"github.com/cespare/xxhash/v2"
)
// SHA256sum computes a cryptographic hash. Still used for proof-of-work challenges
// where we need the security properties of a cryptographic hash function.
func SHA256sum(text string) string {
hash := sha256.New()
hash.Write([]byte(text))
return hex.EncodeToString(hash.Sum(nil))
}
// FastHash is a high-performance non-cryptographic hash function suitable for
// internal caching, policy rule identification, and other performance-critical
// use cases where cryptographic security is not required.
func FastHash(text string) string {
h := xxhash.Sum64String(text)
return strconv.FormatUint(h, 16)
}
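
A rough illustration of the key format `FastHash` produces: `strconv.FormatUint(h, 16)` does not zero-pad, so a key can be shorter than 16 characters. The sketch below uses FNV-1a from the standard library as a stand-in for xxhash, an assumption made only to avoid a third-party dependency. `hexKey`, `paddedHexKey`, and `standInHash` are hypothetical names, not part of the change.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strconv"
)

// hexKey mirrors the formatting step in FastHash: lowercase hex, no padding.
func hexKey(h uint64) string {
	return strconv.FormatUint(h, 16)
}

// paddedHexKey is a hypothetical fixed-width variant (always 16 characters).
func paddedHexKey(h uint64) string {
	return fmt.Sprintf("%016x", h)
}

// standInHash uses FNV-1a (stdlib) in place of xxhash; this is an assumption
// for illustration and produces different values than xxhash would.
func standInHash(text string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(text)) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

func main() {
	fmt.Println(hexKey(0x1f))       // 1f
	fmt.Println(paddedHexKey(0x1f)) // 000000000000001f
	fmt.Println(len(paddedHexKey(standInHash("/robots.txt")))) // 16
}
```

Variable-length keys work as map keys. The padded form would only matter if some consumer assumed a fixed width.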

internal/hash_bench_test.go Normal file

@@ -0,0 +1,261 @@
package internal
import (
"fmt"
"strings"
"testing"
)
// XXHash64sum is a test alias for FastHash to benchmark against SHA256
func XXHash64sum(text string) string {
return FastHash(text)
}
// Test data that matches real usage patterns in the codebase
var (
// Typical policy checker inputs
policyInputs = []string{
"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
"User-Agent: bot/1.0",
"User-Agent: GoogleBot/2.1",
"/robots.txt",
"/api/.*",
"10.0.0.0/8",
"192.168.1.0/24",
"172.16.0.0/12",
}
// Challenge data from challengeFor function
challengeInputs = []string{
"Accept-Language=en-US,X-Real-IP=192.168.1.100,User-Agent=Mozilla/5.0,WeekTime=2025-06-16T00:00:00Z,Fingerprint=abc123,Difficulty=5",
"Accept-Language=fr-FR,X-Real-IP=10.0.0.50,User-Agent=Chrome/91.0,WeekTime=2025-06-16T00:00:00Z,Fingerprint=def456,Difficulty=3",
"Accept-Language=es-ES,X-Real-IP=172.16.1.1,User-Agent=Safari/14.0,WeekTime=2025-06-16T00:00:00Z,Fingerprint=ghi789,Difficulty=7",
}
// Bot rule patterns
botRuleInputs = []string{
"GoogleBot::path:/robots.txt",
"BingBot::useragent:Mozilla/5.0 (compatible; bingbot/2.0)",
"FacebookBot::headers:Accept-Language,User-Agent",
"TwitterBot::cidr:192.168.1.0/24",
}
// CEL expressions from policy rules
celInputs = []string{
`request.headers["User-Agent"].contains("bot")`,
`request.path.startsWith("/api/") && request.method == "POST"`,
`request.remoteAddress in ["192.168.1.0/24", "10.0.0.0/8"]`,
`request.userAgent.matches(".*[Bb]ot.*") || request.userAgent.matches(".*[Cc]rawler.*")`,
}
// Thoth ASN checker inputs
asnInputs = []string{
"ASNChecker\nAS 15169\nAS 8075\nAS 32934",
"ASNChecker\nAS 13335\nAS 16509\nAS 14061",
"ASNChecker\nAS 36351\nAS 20940\nAS 8100",
}
)
func BenchmarkSHA256_PolicyInputs(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
input := policyInputs[i%len(policyInputs)]
_ = SHA256sum(input)
}
}
func BenchmarkXXHash_PolicyInputs(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
input := policyInputs[i%len(policyInputs)]
_ = XXHash64sum(input)
}
}
func BenchmarkSHA256_ChallengeInputs(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
input := challengeInputs[i%len(challengeInputs)]
_ = SHA256sum(input)
}
}
func BenchmarkXXHash_ChallengeInputs(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
input := challengeInputs[i%len(challengeInputs)]
_ = XXHash64sum(input)
}
}
func BenchmarkSHA256_BotRuleInputs(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
input := botRuleInputs[i%len(botRuleInputs)]
_ = SHA256sum(input)
}
}
func BenchmarkXXHash_BotRuleInputs(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
input := botRuleInputs[i%len(botRuleInputs)]
_ = XXHash64sum(input)
}
}
func BenchmarkSHA256_CELInputs(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
input := celInputs[i%len(celInputs)]
_ = SHA256sum(input)
}
}
func BenchmarkXXHash_CELInputs(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
input := celInputs[i%len(celInputs)]
_ = XXHash64sum(input)
}
}
func BenchmarkSHA256_ASNInputs(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
input := asnInputs[i%len(asnInputs)]
_ = SHA256sum(input)
}
}
func BenchmarkXXHash_ASNInputs(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
input := asnInputs[i%len(asnInputs)]
_ = XXHash64sum(input)
}
}
// Benchmark the policy list hashing used in checker.go
func BenchmarkSHA256_PolicyList(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
var sb strings.Builder
for _, input := range policyInputs {
fmt.Fprintln(&sb, SHA256sum(input))
}
_ = SHA256sum(sb.String())
}
}
func BenchmarkXXHash_PolicyList(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
var sb strings.Builder
for _, input := range policyInputs {
fmt.Fprintln(&sb, XXHash64sum(input))
}
_ = XXHash64sum(sb.String())
}
}
// Tests that xxhash doesn't have collisions in realistic scenarios
func TestHashCollisions(t *testing.T) {
allInputs := append(append(append(append(policyInputs, challengeInputs...), botRuleInputs...), celInputs...), asnInputs...)
// Start with realistic inputs from actual usage
xxhashHashes := make(map[string]string)
for _, input := range allInputs {
hash := XXHash64sum(input)
if existing, exists := xxhashHashes[hash]; exists {
t.Errorf("XXHash collision detected: %q and %q both hash to %s", input, existing, hash)
}
xxhashHashes[hash] = input
}
t.Logf("Basic test: %d realistic inputs, no collisions", len(allInputs))
// Test similar strings that might cause hash collisions
prefixes := []string{"User-Agent: ", "X-Real-IP: ", "Accept-Language: ", "Host: "}
suffixes := []string{"bot", "crawler", "spider", "scraper", "Mozilla", "Chrome", "Safari", "Firefox"}
variations := []string{"", "/1.0", "/2.0", " (compatible)", " (Windows)", " (Linux)", " (Mac)"}
stressCount := 0
for _, prefix := range prefixes {
for _, suffix := range suffixes {
for _, variation := range variations {
for i := 0; i < 100; i++ {
input := fmt.Sprintf("%s%s%s-%d", prefix, suffix, variation, i)
hash := XXHash64sum(input)
if existing, exists := xxhashHashes[hash]; exists {
t.Errorf("XXHash collision in stress test: %q and %q both hash to %s", input, existing, hash)
}
xxhashHashes[hash] = input
stressCount++
}
}
}
}
t.Logf("Stress test 1: %d similar string variations, no collisions", stressCount)
// Test sequential patterns that might be problematic
patterns := []string{
"192.168.1.%d",
"10.0.0.%d",
"172.16.%d.1",
"challenge-%d",
"bot-rule-%d",
"policy-%016x",
"session-%016x",
}
seqCount := 0
for _, pattern := range patterns {
for i := 0; i < 10000; i++ {
input := fmt.Sprintf(pattern, i)
hash := XXHash64sum(input)
if existing, exists := xxhashHashes[hash]; exists {
t.Errorf("XXHash collision in sequential test: %q and %q both hash to %s", input, existing, hash)
}
xxhashHashes[hash] = input
seqCount++
}
}
t.Logf("Stress test 2: %d sequential patterns, no collisions", seqCount)
totalInputs := len(allInputs) + stressCount + seqCount
t.Logf("TOTAL: Tested %d inputs across realistic scenarios - NO COLLISIONS", totalInputs)
}
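
For a sense of scale, the test above hashes n = 22 + 22,400 + 70,000 = 92,422 distinct inputs. Assuming the 64-bit outputs behave like uniformly random values (an idealization, not something the test establishes), the birthday approximation gives the chance of seeing any collision:

```latex
P(\text{collision}) \approx \frac{n(n-1)}{2 \cdot 2^{64}}
  \approx \frac{(9.24 \times 10^{4})^{2}}{2^{65}}
  \approx \frac{8.54 \times 10^{9}}{3.69 \times 10^{19}}
  \approx 2.3 \times 10^{-10}
```

Under that assumption a pass is the expected outcome, so the test works as a sanity check against gross defects rather than as evidence of collision resistance.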
// Verify xxhash output works as cache keys
func TestXXHashFormat(t *testing.T) {
testCases := []string{
"short",
"",
"very long string with lots of content that might be used in policy checking and other internal hashing scenarios",
"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
}
for _, input := range testCases {
hash := XXHash64sum(input)
// Check it's valid hex
if len(hash) == 0 {
t.Errorf("Empty hash for input %q", input)
}
// xxhash is 64-bit so max 16 hex chars
if len(hash) > 16 {
t.Errorf("Hash too long for input %q: %s (length %d)", input, hash, len(hash))
}
// Make sure it's all hex characters
for _, char := range hash {
if !((char >= '0' && char <= '9') || (char >= 'a' && char <= 'f')) {
t.Errorf("Non-hex character %c in hash %s for input %q", char, hash, input)
}
}
t.Logf("Input: %q -> Hash: %s", input, hash)
}
}


@@ -81,12 +81,12 @@ func XForwardedForToXRealIP(next http.Handler) http.Handler {
// XForwardedForUpdate sets or updates the X-Forwarded-For header, adding
// the known remote address to an existing chain if present
func XForwardedForUpdate(next http.Handler) http.Handler {
func XForwardedForUpdate(stripPrivate bool, next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
defer next.ServeHTTP(w, r)
pref := XFFComputePreferences{
StripPrivate: true,
StripPrivate: stripPrivate,
StripLoopback: true,
StripCGNAT: true,
Flatten: true,


@@ -13,6 +13,10 @@ func (c *OGTagCache) GetOGTags(url *url.URL, originalHost string) (map[string]st
return nil, errors.New("nil URL provided, cannot fetch OG tags")
}
if len(c.ogOverride) != 0 {
return c.ogOverride, nil
}
target := c.getTarget(url)
cacheKey := c.generateCacheKey(target, originalHost)
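
The override check added in this hunk can be sketched in isolation. This is a minimal illustration, not the project's code: `tagSource`, its `fetch` field, and `tags` are hypothetical names standing in for `OGTagCache` and its fetch path.

```go
package main

import (
	"errors"
	"fmt"
)

// tagSource is a hypothetical stand-in for OGTagCache.
type tagSource struct {
	override map[string]string
	fetch    func(target string) (map[string]string, error)
}

// tags returns the configured override when one is set and only falls
// back to fetching when the override is empty or nil.
func (s *tagSource) tags(target string) (map[string]string, error) {
	if len(s.override) != 0 {
		return s.override, nil
	}
	if s.fetch == nil {
		return nil, errors.New("no fetcher configured")
	}
	return s.fetch(target)
}

func main() {
	fetched := false
	s := &tagSource{
		override: map[string]string{"og:title": "Foo bar"},
		fetch: func(string) (map[string]string, error) {
			fetched = true
			return map[string]string{}, nil
		},
	}
	got, err := s.tags("http://example.com/")
	fmt.Println(got["og:title"], err, fetched) // Foo bar <nil> false
}
```

As in the hunk, the same map is handed to every caller, so callers should treat it as read-only.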


@@ -7,10 +7,49 @@ import (
"reflect"
"testing"
"time"
"github.com/TecharoHQ/anubis/lib/policy/config"
)
func TestCacheReturnsDefault(t *testing.T) {
want := map[string]string{
"og:title": "Foo bar",
"og:description": "The best website ever made!!!1!",
}
cache := NewOGTagCache("", config.OpenGraph{
Enabled: true,
TimeToLive: time.Minute,
ConsiderHost: false,
Override: want,
})
u, err := url.Parse("https://anubis.techaro.lol")
if err != nil {
t.Fatal(err)
}
result, err := cache.GetOGTags(u, "anubis.techaro.lol")
if err != nil {
t.Fatal(err)
}
for k, v := range want {
t.Run(k, func(t *testing.T) {
if got := result[k]; got != v {
t.Logf("want: tags[%q] = %q", k, v)
t.Logf("got: tags[%q] = %q", k, got)
t.Error("invalid result from function")
}
})
}
}
func TestCheckCache(t *testing.T) {
cache := NewOGTagCache("http://example.com", true, time.Minute, false)
cache := NewOGTagCache("http://example.com", config.OpenGraph{
Enabled: true,
TimeToLive: time.Minute,
ConsiderHost: false,
})
// Set up test data
urlStr := "http://example.com/page"
@@ -69,7 +108,11 @@ func TestGetOGTags(t *testing.T) {
defer ts.Close()
// Create an instance of OGTagCache with a short TTL for testing
cache := NewOGTagCache(ts.URL, true, 1*time.Minute, false)
cache := NewOGTagCache(ts.URL, config.OpenGraph{
Enabled: true,
TimeToLive: time.Minute,
ConsiderHost: false,
})
// Parse the test server URL
parsedURL, err := url.Parse(ts.URL)
@@ -216,7 +259,11 @@ func TestGetOGTagsWithHostConsideration(t *testing.T) {
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
loadCount = 0 // Reset load count for each test case
cache := NewOGTagCache(ts.URL, true, 1*time.Minute, tc.ogCacheConsiderHost)
cache := NewOGTagCache(ts.URL, config.OpenGraph{
Enabled: true,
TimeToLive: time.Minute,
ConsiderHost: tc.ogCacheConsiderHost,
})
for i, req := range tc.requests {
ogTags, err := cache.GetOGTags(parsedURL, req.host)


@@ -10,6 +10,7 @@ import (
"testing"
"time"
"github.com/TecharoHQ/anubis/lib/policy/config"
"golang.org/x/net/html"
)
@@ -80,7 +81,11 @@ func TestFetchHTMLDocument(t *testing.T) {
}))
defer ts.Close()
cache := NewOGTagCache("", true, time.Minute, false)
cache := NewOGTagCache("", config.OpenGraph{
Enabled: true,
TimeToLive: time.Minute,
ConsiderHost: false,
})
doc, err := cache.fetchHTMLDocument(ts.URL, "anything")
if tt.expectError {
@@ -107,7 +112,11 @@ func TestFetchHTMLDocumentInvalidURL(t *testing.T) {
t.Skip("test requires theoretical network egress")
}
cache := NewOGTagCache("", true, time.Minute, false)
cache := NewOGTagCache("", config.OpenGraph{
Enabled: true,
TimeToLive: time.Minute,
ConsiderHost: false,
})
doc, err := cache.fetchHTMLDocument("http://invalid.url.that.doesnt.exist.example", "anything")


@@ -6,6 +6,8 @@ import (
"net/url"
"testing"
"time"
"github.com/TecharoHQ/anubis/lib/policy/config"
)
func TestIntegrationGetOGTags(t *testing.T) {
@@ -104,7 +106,11 @@ func TestIntegrationGetOGTags(t *testing.T) {
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
// Create cache instance
cache := NewOGTagCache(ts.URL, true, 1*time.Minute, false)
cache := NewOGTagCache(ts.URL, config.OpenGraph{
Enabled: true,
TimeToLive: time.Minute,
ConsiderHost: false,
})
// Create URL for test
testURL, _ := url.Parse(ts.URL)

internal/ogtags/mem_test.go Normal file

@@ -0,0 +1,150 @@
package ogtags
import (
"net/url"
"runtime"
"strings"
"testing"
"github.com/TecharoHQ/anubis/lib/policy/config"
"golang.org/x/net/html"
)
func BenchmarkGetTarget(b *testing.B) {
tests := []struct {
name string
target string
paths []string
}{
{
name: "HTTP",
target: "http://example.com",
paths: []string{"/", "/path", "/path/to/resource", "/path?query=1&foo=bar"},
},
{
name: "Unix",
target: "unix:///var/run/app.sock",
paths: []string{"/", "/api/endpoint", "/api/endpoint?param=value"},
},
}
for _, tt := range tests {
b.Run(tt.name, func(b *testing.B) {
cache := NewOGTagCache(tt.target, config.OpenGraph{})
urls := make([]*url.URL, len(tt.paths))
for i, path := range tt.paths {
u, _ := url.Parse(path)
urls[i] = u
}
b.ResetTimer()
b.ReportAllocs()
for i := 0; i < b.N; i++ {
_ = cache.getTarget(urls[i%len(urls)])
}
})
}
}
func BenchmarkExtractOGTags(b *testing.B) {
htmlSamples := []string{
`<html><head>
<meta property="og:title" content="Test Title">
<meta property="og:description" content="Test Description">
<meta name="keywords" content="test,keywords">
</head><body></body></html>`,
`<html><head>
<meta property="og:title" content="Page Title">
<meta property="og:type" content="website">
<meta property="og:url" content="https://example.com">
<meta property="og:image" content="https://example.com/image.jpg">
<meta property="twitter:card" content="summary_large_image">
<meta property="twitter:title" content="Twitter Title">
<meta name="description" content="Page description">
<meta name="author" content="John Doe">
</head><body><div><p>Content</p></div></body></html>`,
}
cache := NewOGTagCache("http://example.com", config.OpenGraph{})
docs := make([]*html.Node, len(htmlSamples))
for i, sample := range htmlSamples {
doc, _ := html.Parse(strings.NewReader(sample))
docs[i] = doc
}
b.ResetTimer()
b.ReportAllocs()
for i := 0; i < b.N; i++ {
_ = cache.extractOGTags(docs[i%len(docs)])
}
}
// Memory usage test
func TestMemoryUsage(t *testing.T) {
cache := NewOGTagCache("http://example.com", config.OpenGraph{})
// Force GC and wait for it to complete
runtime.GC()
var m1 runtime.MemStats
runtime.ReadMemStats(&m1)
// Run getTarget many times
u, _ := url.Parse("/path/to/resource?query=1&foo=bar&baz=qux")
for i := 0; i < 10000; i++ {
_ = cache.getTarget(u)
}
// Force GC after operations
runtime.GC()
var m2 runtime.MemStats
runtime.ReadMemStats(&m2)
allocatedBytes := int64(m2.TotalAlloc) - int64(m1.TotalAlloc)
allocatedKB := float64(allocatedBytes) / 1024.0
allocatedPerOp := float64(allocatedBytes) / 10000.0
t.Logf("Memory allocated for 10k getTarget calls:")
t.Logf(" Total: %.2f KB (%.2f MB)", allocatedKB, allocatedKB/1024.0)
t.Logf(" Per operation: %.2f bytes", allocatedPerOp)
// Test extractOGTags memory usage
htmlDoc := `<html><head>
<meta property="og:title" content="Test Title">
<meta property="og:description" content="Test Description">
<meta property="og:image" content="https://example.com/image.jpg">
<meta property="twitter:card" content="summary">
<meta name="keywords" content="test,keywords,example">
<meta name="author" content="Test Author">
<meta property="unknown:tag" content="Should be ignored">
</head><body></body></html>`
doc, _ := html.Parse(strings.NewReader(htmlDoc))
runtime.GC()
runtime.ReadMemStats(&m1)
for i := 0; i < 1000; i++ {
_ = cache.extractOGTags(doc)
}
runtime.GC()
runtime.ReadMemStats(&m2)
allocatedBytes = int64(m2.TotalAlloc) - int64(m1.TotalAlloc)
allocatedKB = float64(allocatedBytes) / 1024.0
allocatedPerOp = float64(allocatedBytes) / 1000.0
t.Logf("Memory allocated for 1k extractOGTags calls:")
t.Logf(" Total: %.2f KB (%.2f MB)", allocatedKB, allocatedKB/1024.0)
t.Logf(" Per operation: %.2f bytes", allocatedPerOp)
// Sanity checks
if allocatedPerOp > 10000 {
t.Errorf("extractOGTags allocating too much memory per operation: %.2f bytes", allocatedPerOp)
}
}


@@ -10,27 +10,34 @@ import (
"time"
"github.com/TecharoHQ/anubis/decaymap"
"github.com/TecharoHQ/anubis/lib/policy/config"
)
const (
maxContentLength = 16 << 20 // 16 MiB in bytes, if there is a reasonable reason that you need more than this...Why?
maxContentLength = 8 << 20 // 8 MiB is enough for anyone
httpTimeout = 5 * time.Second /*todo: make this configurable?*/
schemeSeparatorLength = 3 // Length of "://"
querySeparatorLength = 1 // Length of "?" for query strings
)
type OGTagCache struct {
cache *decaymap.Impl[string, map[string]string]
targetURL *url.URL
client *http.Client
cache *decaymap.Impl[string, map[string]string]
targetURL *url.URL
client *http.Client
// Pre-built strings for optimization
unixPrefix string // "http://unix"
approvedTags []string
approvedPrefixes []string
ogTimeToLive time.Duration
ogCacheConsiderHost bool
ogPassthrough bool
ogOverride map[string]string
}
func NewOGTagCache(target string, ogPassthrough bool, ogTimeToLive time.Duration, ogTagsConsiderHost bool) *OGTagCache {
func NewOGTagCache(target string, conf config.OpenGraph) *OGTagCache {
// Predefined approved tags and prefixes
// In the future, these could come from configuration
defaultApprovedTags := []string{"description", "keywords", "author"}
defaultApprovedPrefixes := []string{"og:", "twitter:", "fediverse:"}
@@ -71,37 +78,51 @@ func NewOGTagCache(target string, ogPassthrough bool, ogTimeToLive time.Duration
return &OGTagCache{
cache: decaymap.New[string, map[string]string](),
targetURL: parsedTargetURL, // Store the parsed URL
ogPassthrough: ogPassthrough,
ogTimeToLive: ogTimeToLive,
ogCacheConsiderHost: ogTagsConsiderHost, // todo: refactor to be a separate struct
targetURL: parsedTargetURL,
ogPassthrough: conf.Enabled,
ogTimeToLive: conf.TimeToLive,
ogCacheConsiderHost: conf.ConsiderHost,
ogOverride: conf.Override,
approvedTags: defaultApprovedTags,
approvedPrefixes: defaultApprovedPrefixes,
client: client,
unixPrefix: "http://unix",
}
}
// getTarget constructs the target URL string for fetching OG tags.
// For Unix sockets, it creates a "fake" HTTP URL that the custom dialer understands.
// Optimized to minimize allocations by building strings directly.
func (c *OGTagCache) getTarget(u *url.URL) string {
var escapedPath = u.EscapedPath() // will cause an allocation if path contains special characters
if c.targetURL.Scheme == "unix" {
// The custom dialer ignores the host, but we need a valid http URL structure.
// Use "unix" as a placeholder host. Path and Query from original request are appended.
fakeURL := &url.URL{
Scheme: "http", // Scheme must be http/https for client.Get
Host: "unix", // Arbitrary host, ignored by custom dialer
Path: u.Path,
RawQuery: u.RawQuery,
// Build URL string directly without creating intermediate URL object
var sb strings.Builder
sb.Grow(len(c.unixPrefix) + len(escapedPath) + len(u.RawQuery) + querySeparatorLength) // Pre-allocate
sb.WriteString(c.unixPrefix)
sb.WriteString(escapedPath)
if u.RawQuery != "" {
sb.WriteByte('?')
sb.WriteString(u.RawQuery)
}
return fakeURL.String()
return sb.String()
}
// For regular http/https targets
target := *c.targetURL // Make a copy
target.Path = u.Path
target.RawQuery = u.RawQuery
return target.String()
// For regular http/https targets, build URL string directly
var sb strings.Builder
// Pre-calculate size: scheme + "://" + host + path + "?" + query
estimatedSize := len(c.targetURL.Scheme) + schemeSeparatorLength + len(c.targetURL.Host) + len(escapedPath) + len(u.RawQuery) + querySeparatorLength
sb.Grow(estimatedSize)
sb.WriteString(c.targetURL.Scheme)
sb.WriteString("://")
sb.WriteString(c.targetURL.Host)
sb.WriteString(escapedPath)
if u.RawQuery != "" {
sb.WriteByte('?')
sb.WriteString(u.RawQuery)
}
return sb.String()
}
func (c *OGTagCache) Cleanup() {


@@ -0,0 +1,310 @@
package ogtags
import (
"net/url"
"strings"
"testing"
"unicode/utf8"
"github.com/TecharoHQ/anubis/lib/policy/config"
"golang.org/x/net/html"
)
// FuzzGetTarget tests getTarget with various inputs
func FuzzGetTarget(f *testing.F) {
// Seed corpus with interesting test cases
testCases := []struct {
target string
path string
query string
}{
{"http://example.com", "/", ""},
{"http://example.com", "/path", "q=1"},
{"unix:///tmp/socket", "/api", "key=value"},
{"https://example.com:8080", "/path/to/resource", "a=1&b=2"},
{"http://example.com", "/path with spaces", "q=hello world"},
{"http://example.com", "/path/❤️/emoji", "emoji=🎉"},
{"http://example.com", "/path/../../../etc/passwd", ""},
{"http://example.com", "/path%2F%2E%2E%2F", "q=%3Cscript%3E"},
{"unix:///var/run/app.sock", "/../../etc/passwd", ""},
{"http://[::1]:8080", "/ipv6", "test=1"},
{"http://example.com", strings.Repeat("/very/long/path", 100), strings.Repeat("param=value&", 100)},
{"http://example.com", "/path%20with%20encoded", "q=%20encoded%20"},
{"http://example.com", "/пример/кириллица", "q=тест"},
{"http://example.com", "/中文/路径", "查询=值"},
{"", "/path", "q=1"}, // Empty target
}
for _, tc := range testCases {
f.Add(tc.target, tc.path, tc.query)
}
f.Fuzz(func(t *testing.T, target, path, query string) {
// Skip invalid UTF-8 to focus on realistic inputs
if !utf8.ValidString(target) || !utf8.ValidString(path) || !utf8.ValidString(query) {
t.Skip()
}
// Create cache - should not panic
cache := NewOGTagCache(target, config.OpenGraph{})
// Create URL
u := &url.URL{
Path: path,
RawQuery: query,
}
// Call getTarget - should not panic
result := cache.getTarget(u)
// Basic validation
if result == "" {
t.Errorf("getTarget returned empty string for target=%q, path=%q, query=%q", target, path, query)
}
// Verify result is a valid URL (for non-empty targets)
if target != "" {
parsedResult, err := url.Parse(result)
if err != nil {
t.Errorf("getTarget produced invalid URL %q: %v", result, err)
} else {
// For unix sockets, verify the scheme is http
if strings.HasPrefix(target, "unix:") && parsedResult.Scheme != "http" {
t.Errorf("Unix socket URL should have http scheme, got %q", parsedResult.Scheme)
}
}
}
// Ensure no memory corruption by calling multiple times
for i := 0; i < 3; i++ {
result2 := cache.getTarget(u)
if result != result2 {
t.Errorf("getTarget not deterministic: %q != %q", result, result2)
}
}
})
}
// FuzzExtractOGTags tests extractOGTags with various HTML inputs
func FuzzExtractOGTags(f *testing.F) {
// Seed corpus with interesting HTML cases
htmlCases := []string{
`<html><head><meta property="og:title" content="Test"></head></html>`,
`<meta property="og:title" content="No HTML tags">`,
`<html><head>` + strings.Repeat(`<meta property="og:title" content="Many tags">`, 1000) + `</head></html>`,
`<html><head><meta property="og:title" content="<script>alert('xss')</script>"></head></html>`,
`<html><head><meta property="og:title" content="Line1&#10;Line2"></head></html>`,
`<html><head><meta property="og:emoji" content="❤️🎉🎊"></head></html>`,
`<html><head><meta property="og:title" content="` + strings.Repeat("A", 10000) + `"></head></html>`,
`<html><head><meta property="og:title" content='Single quotes'></head></html>`,
`<html><head><meta property=og:title content=no-quotes></head></html>`,
`<html><head><meta name="keywords" content="test,keywords"></head></html>`,
`<html><head><meta property="unknown:tag" content="Should be ignored"></head></html>`,
`<html><head><meta property="` + strings.Repeat("og:", 100) + `title" content="Nested prefixes"></head></html>`,
`<html>` + strings.Repeat(`<div>`, 1000) + `<meta property="og:title" content="Deep nesting">` + strings.Repeat(`</div>`, 1000) + `</html>`,
`<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml"><head><meta property="og:title" content="With doctype"/></head></html>`,
`<html><head><meta property="" content="Empty property"></head></html>`,
`<html><head><meta content="Content only"></head></html>`,
`<html><head><meta property="og:title"></head></html>`, // No content
``, // Empty HTML
`<html><head><meta property="og:title" content="Кириллица"></head></html>`,
`<html><head><meta property="og:title" content="中文内容"></head></html>`,
`<html><head><!--<meta property="og:title" content="Commented out">--></head></html>`,
`<html><head><META PROPERTY="OG:TITLE" CONTENT="UPPERCASE"></head></html>`,
}
for _, htmlc := range htmlCases {
f.Add(htmlc)
}
f.Fuzz(func(t *testing.T, htmlContent string) {
// Skip invalid UTF-8
if !utf8.ValidString(htmlContent) {
t.Skip()
}
// Parse HTML - may fail on invalid input
doc, err := html.Parse(strings.NewReader(htmlContent))
if err != nil {
// This is expected for malformed HTML
return
}
cache := NewOGTagCache("http://example.com", config.OpenGraph{})
// Should not panic
tags := cache.extractOGTags(doc)
// Validate results
for property, content := range tags {
// Ensure property is approved
approved := false
for _, prefix := range cache.approvedPrefixes {
if strings.HasPrefix(property, prefix) {
approved = true
break
}
}
if !approved {
for _, tag := range cache.approvedTags {
if property == tag {
approved = true
break
}
}
}
if !approved {
t.Errorf("Unapproved property %q was extracted", property)
}
// Ensure content is valid string
if !utf8.ValidString(content) {
t.Errorf("Invalid UTF-8 in content for property %q", property)
}
}
// Test determinism
tags2 := cache.extractOGTags(doc)
if len(tags) != len(tags2) {
t.Errorf("extractOGTags not deterministic: different lengths %d != %d", len(tags), len(tags2))
}
for k, v := range tags {
if tags2[k] != v {
t.Errorf("extractOGTags not deterministic: %q=%q != %q=%q", k, v, k, tags2[k])
}
}
})
}
// FuzzGetTargetRoundTrip tests that getTarget produces valid URLs that can be parsed back
func FuzzGetTargetRoundTrip(f *testing.F) {
f.Add("http://example.com", "/path/to/resource", "key=value&foo=bar")
f.Add("unix:///tmp/socket", "/api/endpoint", "param=test")
f.Fuzz(func(t *testing.T, target, path, query string) {
if !utf8.ValidString(target) || !utf8.ValidString(path) || !utf8.ValidString(query) {
t.Skip()
}
cache := NewOGTagCache(target, config.OpenGraph{})
u := &url.URL{Path: path, RawQuery: query}
result := cache.getTarget(u)
if result == "" {
return
}
// Parse the result back
parsed, err := url.Parse(result)
if err != nil {
t.Errorf("getTarget produced unparseable URL: %v", err)
return
}
// For non-unix targets, verify path preservation (accounting for encoding)
if !strings.HasPrefix(target, "unix:") && target != "" {
// The paths should match after normalization
expectedPath := u.EscapedPath()
if parsed.EscapedPath() != expectedPath {
t.Errorf("Path not preserved: want %q, got %q", expectedPath, parsed.EscapedPath())
}
// Query should be preserved exactly
if parsed.RawQuery != query {
t.Errorf("Query not preserved: want %q, got %q", query, parsed.RawQuery)
}
}
})
}
// FuzzExtractMetaTagInfo tests the extractMetaTagInfo function directly
func FuzzExtractMetaTagInfo(f *testing.F) {
// Seed with various attribute combinations
f.Add("og:title", "Test Title", "property")
f.Add("keywords", "test,keywords", "name")
f.Add("og:description", "A description with \"quotes\"", "property")
f.Add("twitter:card", "summary", "property")
f.Add("unknown:tag", "Should be filtered", "property")
f.Add("", "Content without property", "property")
f.Add("og:title", "", "property") // Property without content
f.Fuzz(func(t *testing.T, propertyValue, contentValue, propertyKey string) {
if !utf8.ValidString(propertyValue) || !utf8.ValidString(contentValue) || !utf8.ValidString(propertyKey) {
t.Skip()
}
// Create a meta node
node := &html.Node{
Type: html.ElementNode,
Data: "meta",
Attr: []html.Attribute{
{Key: propertyKey, Val: propertyValue},
{Key: "content", Val: contentValue},
},
}
cache := NewOGTagCache("http://example.com", config.OpenGraph{})
// Should not panic
property, content := cache.extractMetaTagInfo(node)
// If property is returned, it must be approved
if property != "" {
approved := false
for _, prefix := range cache.approvedPrefixes {
if strings.HasPrefix(property, prefix) {
approved = true
break
}
}
if !approved {
for _, tag := range cache.approvedTags {
if property == tag {
approved = true
break
}
}
}
if !approved {
t.Errorf("extractMetaTagInfo returned unapproved property: %q", property)
}
}
// Content should match input if property is approved
if property != "" && content != contentValue {
t.Errorf("Content mismatch: want %q, got %q", contentValue, content)
}
})
}
// Benchmark comparison for the fuzzed scenarios
func BenchmarkFuzzedGetTarget(b *testing.B) {
// Test with various challenging inputs found during fuzzing
inputs := []struct {
name string
target string
path string
query string
}{
{"Simple", "http://example.com", "/api", "k=v"},
{"LongPath", "http://example.com", strings.Repeat("/segment", 50), ""},
{"LongQuery", "http://example.com", "/", strings.Repeat("param=value&", 50)},
{"Unicode", "http://example.com", "/путь/路径/path", "q=значение"},
{"Encoded", "http://example.com", "/path%20with%20spaces", "q=%3Cscript%3E"},
{"Unix", "unix:///tmp/socket.sock", "/api/v1/resource", "id=123&format=json"},
}
for _, input := range inputs {
b.Run(input.name, func(b *testing.B) {
cache := NewOGTagCache(input.target, config.OpenGraph{})
u := &url.URL{Path: input.path, RawQuery: input.query}
b.ResetTimer()
b.ReportAllocs()
for i := 0; i < b.N; i++ {
_ = cache.getTarget(u)
}
})
}
}


@@ -13,6 +13,8 @@ import (
"strings"
"testing"
"time"
"github.com/TecharoHQ/anubis/lib/policy/config"
)
func TestNewOGTagCache(t *testing.T) {
@@ -38,7 +40,11 @@ func TestNewOGTagCache(t *testing.T) {
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
cache := NewOGTagCache(tt.target, tt.ogPassthrough, tt.ogTimeToLive, false)
cache := NewOGTagCache(tt.target, config.OpenGraph{
Enabled: tt.ogPassthrough,
TimeToLive: tt.ogTimeToLive,
ConsiderHost: false,
})
if cache == nil {
t.Fatal("expected non-nil cache, got nil")
@@ -74,7 +80,11 @@ func TestNewOGTagCache_UnixSocket(t *testing.T) {
socketPath := filepath.Join(tempDir, "test.sock")
target := "unix://" + socketPath
cache := NewOGTagCache(target, true, 5*time.Minute, false)
cache := NewOGTagCache(target, config.OpenGraph{
Enabled: true,
TimeToLive: 5 * time.Minute,
ConsiderHost: false,
})
if cache == nil {
t.Fatal("expected non-nil cache, got nil")
@@ -155,7 +165,11 @@ func TestGetTarget(t *testing.T) {
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
cache := NewOGTagCache(tt.target, false, time.Minute, false)
cache := NewOGTagCache(tt.target, config.OpenGraph{
Enabled: true,
TimeToLive: time.Minute,
ConsiderHost: false,
})
u := &url.URL{
Path: tt.path,
@@ -175,7 +189,9 @@ func TestGetTarget(t *testing.T) {
func TestIntegrationGetOGTags_UnixSocket(t *testing.T) {
tempDir := t.TempDir()
socketPath := filepath.Join(tempDir, "anubis-test.sock")
// XXX(Xe): if this is named longer, macOS fails with `bind: invalid argument`
// because the unix socket path is too long. I love computers.
socketPath := filepath.Join(tempDir, "t")
// Ensure the socket does not exist initially
_ = os.Remove(socketPath)
@@ -222,7 +238,11 @@ func TestIntegrationGetOGTags_UnixSocket(t *testing.T) {
// Create cache instance pointing to the Unix socket
targetURL := "unix://" + socketPath
cache := NewOGTagCache(targetURL, true, 1*time.Minute, false)
cache := NewOGTagCache(targetURL, config.OpenGraph{
Enabled: true,
TimeToLive: time.Minute,
ConsiderHost: false,
})
// Create a dummy URL for the request (path and query matter)
testReqURL, _ := url.Parse("/some/page?query=1")


@@ -12,15 +12,12 @@ func (c *OGTagCache) extractOGTags(doc *html.Node) map[string]string {
var traverseNodes func(*html.Node)
traverseNodes = func(n *html.Node) {
// isOGMetaTag only checks if it's a <meta> tag.
// The actual filtering happens in extractMetaTagInfo now.
if isOGMetaTag(n) {
property, content := c.extractMetaTagInfo(n)
if property != "" {
ogTags[property] = content
}
}
for child := n.FirstChild; child != nil; child = child.NextSibling {
traverseNodes(child)
}
@@ -39,43 +36,40 @@ func isOGMetaTag(n *html.Node) bool {
}
// extractMetaTagInfo extracts property and content from a meta tag
// *and* checks if the property is approved.
// Returns empty property string if the tag is not approved.
func (c *OGTagCache) extractMetaTagInfo(n *html.Node) (property, content string) {
var rawProperty string // Store the property found before approval check
var propertyKey string
// Single pass through attributes, using range to avoid bounds checking
for _, attr := range n.Attr {
if attr.Key == "property" || attr.Key == "name" {
rawProperty = attr.Val
}
if attr.Key == "content" {
switch attr.Key {
case "property", "name":
propertyKey = attr.Val
case "content":
content = attr.Val
}
}
// Check if the rawProperty is approved
isApproved := false
for _, prefix := range c.approvedPrefixes {
if strings.HasPrefix(rawProperty, prefix) {
isApproved = true
// Early exit if we have both
if propertyKey != "" && content != "" {
break
}
}
// Check exact approved tags if not already approved by prefix
if !isApproved {
for _, tag := range c.approvedTags {
if rawProperty == tag {
isApproved = true
break
}
if propertyKey == "" {
return "", content
}
// Check prefixes first (more common case)
for _, prefix := range c.approvedPrefixes {
if strings.HasPrefix(propertyKey, prefix) {
return propertyKey, content
}
}
// Only return the property if it's approved
if isApproved {
property = rawProperty
// Check exact matches
for _, tag := range c.approvedTags {
if propertyKey == tag {
return propertyKey, content
}
}
// Content is returned regardless, but property will be "" if not approved
return property, content
return "", content
}
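
The approval step in the rewritten `extractMetaTagInfo` is an allow-list check: a property is kept if it starts with an approved prefix or equals an approved tag exactly. A minimal standalone sketch of that check follows. The name `approvedProperty` and its signature are hypothetical and not part of this change; the sample values mirror the ones set in the tests (`og:` and `description`).

```go
package main

import (
	"fmt"
	"strings"
)

// approvedProperty reports whether key passes the allow-list: it is kept when
// it starts with one of prefixes or equals one of tags exactly.
// Hypothetical helper for illustration; the diff inlines this logic instead.
func approvedProperty(key string, prefixes, tags []string) bool {
	if key == "" {
		return false
	}
	// Prefixes are checked first, matching the order used in the diff.
	for _, p := range prefixes {
		if strings.HasPrefix(key, p) {
			return true
		}
	}
	for _, t := range tags {
		if key == t {
			return true
		}
	}
	return false
}

func main() {
	prefixes := []string{"og:"}
	tags := []string{"description"}
	for _, key := range []string{"og:title", "description", "twitter:card", ""} {
		fmt.Printf("%q approved: %v\n", key, approvedProperty(key, prefixes, tags))
	}
}
```

Returning a boolean rather than an empty string keeps "no property found" and "property not approved" distinguishable at the call site, which the two-value return in the diff does not attempt to do.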


@@ -6,13 +6,18 @@ import (
"testing"
"time"
"github.com/TecharoHQ/anubis/lib/policy/config"
"golang.org/x/net/html"
)
// TestExtractOGTags updated with correct expectations based on filtering logic
func TestExtractOGTags(t *testing.T) {
// Use a cache instance that reflects the default approved lists
testCache := NewOGTagCache("", false, time.Minute, false)
testCache := NewOGTagCache("", config.OpenGraph{
Enabled: false,
ConsiderHost: false,
TimeToLive: time.Minute,
})
// Manually set approved tags/prefixes based on the user request for clarity
testCache.approvedTags = []string{"description"}
testCache.approvedPrefixes = []string{"og:"}
@@ -189,7 +194,11 @@ func TestIsOGMetaTag(t *testing.T) {
func TestExtractMetaTagInfo(t *testing.T) {
// Use a cache instance that reflects the default approved lists
testCache := NewOGTagCache("", false, time.Minute, false)
testCache := NewOGTagCache("", config.OpenGraph{
Enabled: false,
ConsiderHost: false,
TimeToLive: time.Minute,
})
testCache.approvedTags = []string{"description"}
testCache.approvedPrefixes = []string{"og:"}


@@ -595,7 +595,7 @@ func spawnAnubisWithOptions(t *testing.T, basePrefix string) string {
fmt.Fprintf(w, "<html><body><span id=anubis-test>%d</span></body></html>", time.Now().Unix())
})
policy, err := libanubis.LoadPoliciesOrDefault("", anubis.DefaultDifficulty)
policy, err := libanubis.LoadPoliciesOrDefault(t.Context(), "", anubis.DefaultDifficulty)
if err != nil {
t.Fatal(err)
}


@@ -0,0 +1,69 @@
package thoth
import (
"context"
"errors"
"fmt"
"log/slog"
"net/http"
"strings"
"time"
"github.com/TecharoHQ/anubis/internal"
"github.com/TecharoHQ/anubis/lib/policy/checker"
iptoasnv1 "github.com/TecharoHQ/thoth-proto/gen/techaro/thoth/iptoasn/v1"
)
func (c *Client) ASNCheckerFor(asns []uint32) checker.Impl {
asnMap := map[uint32]struct{}{}
var sb strings.Builder
fmt.Fprintln(&sb, "ASNChecker")
for _, asn := range asns {
asnMap[asn] = struct{}{}
fmt.Fprintln(&sb, "AS", asn)
}
return &ASNChecker{
iptoasn: c.IPToASN,
asns: asnMap,
hash: internal.FastHash(sb.String()),
}
}
type ASNChecker struct {
iptoasn iptoasnv1.IpToASNServiceClient
asns map[uint32]struct{}
hash string
}
func (asnc *ASNChecker) Check(r *http.Request) (bool, error) {
ctx, cancel := context.WithTimeout(r.Context(), 500*time.Millisecond)
defer cancel()
ipInfo, err := asnc.iptoasn.Lookup(ctx, &iptoasnv1.LookupRequest{
IpAddress: r.Header.Get("X-Real-Ip"),
})
if err != nil {
switch {
case errors.Is(err, context.DeadlineExceeded):
slog.Debug("error contacting thoth", "err", err, "actionable", false)
return false, nil
default:
slog.Error("error contacting thoth, please contact support", "err", err, "actionable", true)
return false, nil
}
}
// If IP is not publicly announced, return false
if !ipInfo.GetAnnounced() {
return false, nil
}
_, ok := asnc.asns[uint32(ipInfo.GetAsNumber())]
return ok, nil
}
func (asnc *ASNChecker) Hash() string {
return asnc.hash
}
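
`ASNCheckerFor` builds two things from the ASN list: a set for constant-time membership tests and an identity string derived from the list contents. The sketch below reproduces that shape with the standard library only. `asnSet` and `newASNSet` are hypothetical names, and SHA-256 is used as a stand-in because the behaviour of `internal.FastHash` is not shown in this diff.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// asnSet is a hypothetical stand-in for the checker's lookup table.
type asnSet struct {
	members map[uint32]struct{}
	id      string
}

// newASNSet fills the membership map and derives an identity string from the
// same text layout the diff uses ("ASNChecker", then one "AS <n>" line each).
func newASNSet(asns []uint32) asnSet {
	members := make(map[uint32]struct{}, len(asns))
	var sb strings.Builder
	fmt.Fprintln(&sb, "ASNChecker")
	for _, asn := range asns {
		members[asn] = struct{}{}
		fmt.Fprintln(&sb, "AS", asn)
	}
	// Assumption: SHA-256 replaces internal.FastHash for this sketch.
	sum := sha256.Sum256([]byte(sb.String()))
	return asnSet{members: members, id: hex.EncodeToString(sum[:])}
}

// contains reports whether asn is in the set.
func (s asnSet) contains(asn uint32) bool {
	_, ok := s.members[asn]
	return ok
}

func main() {
	set := newASNSet([]uint32{13335})
	fmt.Println("13335 in set:", set.contains(13335))
	fmt.Println("64512 in set:", set.contains(64512))
	fmt.Println("id:", set.id)
}
```

Because the identity is built by iterating the input slice, two lists with the same ASNs in a different order produce different identities. Sorting the input first would make the identity order-independent, if that property is wanted.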


@@ -0,0 +1,81 @@
package thoth_test
import (
"fmt"
"net/http/httptest"
"testing"
"github.com/TecharoHQ/anubis/internal/thoth"
"github.com/TecharoHQ/anubis/lib/policy/checker"
iptoasnv1 "github.com/TecharoHQ/thoth-proto/gen/techaro/thoth/iptoasn/v1"
)
var _ checker.Impl = &thoth.ASNChecker{}
func TestASNChecker(t *testing.T) {
cli := loadSecrets(t)
asnc := cli.ASNCheckerFor([]uint32{13335})
for _, cs := range []struct {
ipAddress string
wantMatch bool
wantError bool
}{
{
ipAddress: "1.1.1.1",
wantMatch: true,
wantError: false,
},
{
ipAddress: "2.2.2.2",
wantMatch: false,
wantError: false,
},
{
ipAddress: "taco",
wantMatch: false,
wantError: false,
},
{
ipAddress: "127.0.0.1",
wantMatch: false,
wantError: false,
},
} {
t.Run(fmt.Sprintf("%v", cs), func(t *testing.T) {
req := httptest.NewRequest("GET", "/", nil)
req.Header.Set("X-Real-Ip", cs.ipAddress)
match, err := asnc.Check(req)
if match != cs.wantMatch {
t.Errorf("Wanted match: %v, got: %v", cs.wantMatch, match)
}
switch {
case err != nil && !cs.wantError:
t.Errorf("Did not want error but got: %v", err)
case err == nil && cs.wantError:
t.Error("Wanted error but got none")
}
})
}
}
func BenchmarkWithCache(b *testing.B) {
cli := loadSecrets(b)
req := &iptoasnv1.LookupRequest{IpAddress: "1.1.1.1"}
_, err := cli.IPToASN.Lookup(b.Context(), req)
if err != nil {
b.Error(err)
}
for b.Loop() {
_, err := cli.IPToASN.Lookup(b.Context(), req)
if err != nil {
b.Error(err)
}
}
}

39
internal/thoth/auth.go Normal file

@@ -0,0 +1,39 @@
package thoth
import (
"context"
"google.golang.org/grpc"
"google.golang.org/grpc/metadata"
)
func authUnaryClientInterceptor(token string) grpc.UnaryClientInterceptor {
return func(
ctx context.Context,
method string,
req interface{},
reply interface{},
cc *grpc.ClientConn,
invoker grpc.UnaryInvoker,
opts ...grpc.CallOption,
) error {
md := metadata.Pairs("authorization", "Bearer "+token)
ctx = metadata.NewOutgoingContext(ctx, md)
return invoker(ctx, method, req, reply, cc, opts...)
}
}
func authStreamClientInterceptor(token string) grpc.StreamClientInterceptor {
return func(
ctx context.Context,
desc *grpc.StreamDesc,
cc *grpc.ClientConn,
method string,
streamer grpc.Streamer,
opts ...grpc.CallOption,
) (grpc.ClientStream, error) {
md := metadata.Pairs("authorization", "Bearer "+token)
ctx = metadata.NewOutgoingContext(ctx, md)
return streamer(ctx, desc, cc, method, opts...)
}
}


@@ -0,0 +1,84 @@
package thoth
import (
"context"
"errors"
"fmt"
"log/slog"
"net/netip"
iptoasnv1 "github.com/TecharoHQ/thoth-proto/gen/techaro/thoth/iptoasn/v1"
"github.com/gaissmai/bart"
"google.golang.org/grpc"
)
type IPToASNWithCache struct {
next iptoasnv1.IpToASNServiceClient
table *bart.Table[*iptoasnv1.LookupResponse]
}
func NewIpToASNWithCache(next iptoasnv1.IpToASNServiceClient) *IPToASNWithCache {
result := &IPToASNWithCache{
next: next,
table: &bart.Table[*iptoasnv1.LookupResponse]{},
}
for _, pfx := range []netip.Prefix{
netip.MustParsePrefix("10.0.0.0/8"), // RFC 1918
netip.MustParsePrefix("172.16.0.0/12"), // RFC 1918
netip.MustParsePrefix("192.168.0.0/16"), // RFC 1918
netip.MustParsePrefix("127.0.0.0/8"), // Loopback
netip.MustParsePrefix("169.254.0.0/16"), // Link-local
netip.MustParsePrefix("100.64.0.0/10"), // CGNAT
netip.MustParsePrefix("192.0.0.0/24"), // Protocol assignments
netip.MustParsePrefix("192.0.2.0/24"), // TEST-NET-1
netip.MustParsePrefix("198.18.0.0/15"), // Benchmarking
netip.MustParsePrefix("198.51.100.0/24"), // TEST-NET-2
netip.MustParsePrefix("203.0.113.0/24"), // TEST-NET-3
netip.MustParsePrefix("240.0.0.0/4"), // Reserved
netip.MustParsePrefix("255.255.255.255/32"), // Broadcast
netip.MustParsePrefix("fc00::/7"), // Unique local address
netip.MustParsePrefix("fe80::/10"), // Link-local
netip.MustParsePrefix("::1/128"), // Loopback
netip.MustParsePrefix("::/128"), // Unspecified
netip.MustParsePrefix("100::/64"), // Discard-only
netip.MustParsePrefix("2001:db8::/32"), // Documentation
} {
result.table.Insert(pfx, &iptoasnv1.LookupResponse{Announced: false})
}
return result
}
func (ip2asn *IPToASNWithCache) Lookup(ctx context.Context, lr *iptoasnv1.LookupRequest, opts ...grpc.CallOption) (*iptoasnv1.LookupResponse, error) {
addr, err := netip.ParseAddr(lr.GetIpAddress())
if err != nil {
return nil, fmt.Errorf("input is not an IP address: %w", err)
}
cachedResponse, ok := ip2asn.table.Lookup(addr)
if ok {
return cachedResponse, nil
}
resp, err := ip2asn.next.Lookup(ctx, lr, opts...)
if err != nil {
return nil, err
}
var errs []error
for _, cidr := range resp.GetCidr() {
pfx, err := netip.ParsePrefix(cidr)
if err != nil {
errs = append(errs, err)
continue
}
ip2asn.table.Insert(pfx, resp)
}
if len(errs) != 0 {
slog.Error("errors parsing IP prefixes", "err", errors.Join(errs...))
}
return resp, nil
}
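
`NewIpToASNWithCache` pre-seeds its table with reserved and private ranges so that lookups for those addresses never reach the network. The idea can be sketched with `net/netip` alone. This version uses a linear scan over a short prefix list rather than the `bart` routing table from the diff, and `isReserved` is a hypothetical helper; only a subset of the prefixes is included to keep it short.

```go
package main

import (
	"fmt"
	"net/netip"
)

// reserved holds a few of the never-announced ranges listed in the diff.
var reserved = []netip.Prefix{
	netip.MustParsePrefix("10.0.0.0/8"),     // RFC 1918
	netip.MustParsePrefix("127.0.0.0/8"),    // Loopback
	netip.MustParsePrefix("192.168.0.0/16"), // RFC 1918
	netip.MustParsePrefix("fc00::/7"),       // Unique local address
	netip.MustParsePrefix("::1/128"),        // Loopback
}

// isReserved parses ip and reports whether it falls inside a reserved range.
// A linear scan is fine for a handful of prefixes; the diff uses a prefix
// table so that cached upstream responses can be looked up the same way.
func isReserved(ip string) (bool, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false, fmt.Errorf("input is not an IP address: %w", err)
	}
	for _, pfx := range reserved {
		if pfx.Contains(addr) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	for _, ip := range []string{"127.0.0.1", "1.1.1.1", "fd00::1", "taco"} {
		ok, err := isReserved(ip)
		fmt.Println(ip, ok, err)
	}
}
```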

14
internal/thoth/context.go Normal file

@@ -0,0 +1,14 @@
package thoth
import "context"
type ctxKey struct{}
func With(ctx context.Context, cli *Client) context.Context {
return context.WithValue(ctx, ctxKey{}, cli)
}
func FromContext(ctx context.Context) (*Client, bool) {
cli, ok := ctx.Value(ctxKey{}).(*Client)
return cli, ok
}


@@ -0,0 +1,68 @@
package thoth
import (
"context"
"errors"
"fmt"
"log/slog"
"net/http"
"strings"
"time"
"github.com/TecharoHQ/anubis/lib/policy/checker"
iptoasnv1 "github.com/TecharoHQ/thoth-proto/gen/techaro/thoth/iptoasn/v1"
)
func (c *Client) GeoIPCheckerFor(countries []string) checker.Impl {
countryMap := map[string]struct{}{}
var sb strings.Builder
fmt.Fprintln(&sb, "GeoIPChecker")
for _, cc := range countries {
countryMap[cc] = struct{}{}
fmt.Fprintln(&sb, cc)
}
return &GeoIPChecker{
IPToASN: c.IPToASN,
Countries: countryMap,
hash: sb.String(),
}
}
type GeoIPChecker struct {
IPToASN iptoasnv1.IpToASNServiceClient
Countries map[string]struct{}
hash string
}
func (gipc *GeoIPChecker) Check(r *http.Request) (bool, error) {
ctx, cancel := context.WithTimeout(r.Context(), 500*time.Millisecond)
defer cancel()
ipInfo, err := gipc.IPToASN.Lookup(ctx, &iptoasnv1.LookupRequest{
IpAddress: r.Header.Get("X-Real-Ip"),
})
if err != nil {
switch {
case errors.Is(err, context.DeadlineExceeded):
slog.Debug("error contacting thoth", "err", err, "actionable", false)
return false, nil
default:
slog.Error("error contacting thoth, please contact support", "err", err, "actionable", true)
return false, nil
}
}
// If IP is not publicly announced, return false
if !ipInfo.GetAnnounced() {
return false, nil
}
_, ok := gipc.Countries[strings.ToLower(ipInfo.GetCountryCode())]
return ok, nil
}
func (gipc *GeoIPChecker) Hash() string {
return gipc.hash
}


@@ -0,0 +1,63 @@
package thoth_test
import (
"fmt"
"net/http/httptest"
"testing"
"github.com/TecharoHQ/anubis/internal/thoth"
"github.com/TecharoHQ/anubis/lib/policy/checker"
)
var _ checker.Impl = &thoth.GeoIPChecker{}
func TestGeoIPChecker(t *testing.T) {
cli := loadSecrets(t)
asnc := cli.GeoIPCheckerFor([]string{"us"})
for _, cs := range []struct {
ipAddress string
wantMatch bool
wantError bool
}{
{
ipAddress: "1.1.1.1",
wantMatch: true,
wantError: false,
},
{
ipAddress: "2.2.2.2",
wantMatch: false,
wantError: false,
},
{
ipAddress: "taco",
wantMatch: false,
wantError: false,
},
{
ipAddress: "127.0.0.1",
wantMatch: false,
wantError: false,
},
} {
t.Run(fmt.Sprintf("%v", cs), func(t *testing.T) {
req := httptest.NewRequest("GET", "/", nil)
req.Header.Set("X-Real-Ip", cs.ipAddress)
match, err := asnc.Check(req)
if match != cs.wantMatch {
t.Errorf("Wanted match: %v, got: %v", cs.wantMatch, match)
}
switch {
case err != nil && !cs.wantError:
t.Errorf("Did not want error but got: %v", err)
case err == nil && cs.wantError:
t.Error("Wanted error but got none")
}
})
}
}

88
internal/thoth/thoth.go Normal file

@@ -0,0 +1,88 @@
package thoth
import (
"context"
"crypto/tls"
"fmt"
"time"
"github.com/TecharoHQ/anubis"
iptoasnv1 "github.com/TecharoHQ/thoth-proto/gen/techaro/thoth/iptoasn/v1"
grpcprom "github.com/grpc-ecosystem/go-grpc-middleware/providers/prometheus"
"github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/timeout"
"github.com/prometheus/client_golang/prometheus"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials"
"google.golang.org/grpc/credentials/insecure"
healthv1 "google.golang.org/grpc/health/grpc_health_v1"
)
type Client struct {
conn *grpc.ClientConn
health healthv1.HealthClient
IPToASN iptoasnv1.IpToASNServiceClient
}
func New(ctx context.Context, thothURL, apiToken string, plaintext bool) (*Client, error) {
clMetrics := grpcprom.NewClientMetrics(
grpcprom.WithClientHandlingTimeHistogram(
grpcprom.WithHistogramBuckets([]float64{0.001, 0.01, 0.1, 0.3, 0.6, 1, 3, 6, 9, 20, 30, 60, 90, 120}),
),
)
prometheus.DefaultRegisterer.Register(clMetrics)
do := []grpc.DialOption{
grpc.WithChainUnaryInterceptor(
timeout.UnaryClientInterceptor(500*time.Millisecond),
clMetrics.UnaryClientInterceptor(),
authUnaryClientInterceptor(apiToken),
),
grpc.WithChainStreamInterceptor(
clMetrics.StreamClientInterceptor(),
authStreamClientInterceptor(apiToken),
),
grpc.WithUserAgent(fmt.Sprint("Techaro/anubis:", anubis.Version)),
}
if plaintext {
do = append(do, grpc.WithTransportCredentials(insecure.NewCredentials()))
} else {
do = append(do, grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{})))
}
conn, err := grpc.NewClient(
thothURL,
do...,
)
if err != nil {
return nil, fmt.Errorf("can't dial thoth at %s: %w", thothURL, err)
}
hc := healthv1.NewHealthClient(conn)
resp, err := hc.Check(ctx, &healthv1.HealthCheckRequest{})
if err != nil {
return nil, fmt.Errorf("can't verify thoth health at %s: %w", thothURL, err)
}
if resp.Status != healthv1.HealthCheckResponse_SERVING {
return nil, fmt.Errorf("thoth is not healthy, wanted %s but got %s", healthv1.HealthCheckResponse_SERVING, resp.Status)
}
return &Client{
conn: conn,
health: hc,
IPToASN: NewIpToASNWithCache(iptoasnv1.NewIpToASNServiceClient(conn)),
}, nil
}
func (c *Client) Close() error {
if c.conn != nil {
return c.conn.Close()
}
return nil
}
func (c *Client) WithIPToASNService(impl iptoasnv1.IpToASNServiceClient) {
c.IPToASN = impl
}


@@ -0,0 +1,36 @@
package thoth_test
import (
"os"
"testing"
"github.com/TecharoHQ/anubis/internal/thoth"
"github.com/TecharoHQ/anubis/internal/thoth/thothmock"
"github.com/joho/godotenv"
)
func loadSecrets(t testing.TB) *thoth.Client {
t.Helper()
if err := godotenv.Load(); err != nil {
t.Log("using mock thoth")
result := &thoth.Client{}
result.WithIPToASNService(thothmock.MockIpToASNService())
return result
}
cli, err := thoth.New(t.Context(), os.Getenv("THOTH_URL"), os.Getenv("THOTH_API_KEY"), false)
if err != nil {
t.Fatal(err)
}
return cli
}
func TestNew(t *testing.T) {
cli := loadSecrets(t)
if err := cli.Close(); err != nil {
t.Fatal(err)
}
}


@@ -0,0 +1,59 @@
package thothmock
import (
"context"
"net/netip"
iptoasnv1 "github.com/TecharoHQ/thoth-proto/gen/techaro/thoth/iptoasn/v1"
"google.golang.org/grpc"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
)
func MockIpToASNService() *IpToASNService {
responses := map[string]*iptoasnv1.LookupResponse{
"127.0.0.1": {Announced: false},
"::1": {Announced: false},
"10.10.10.10": {
Announced: true,
AsNumber: 13335,
Cidr: []string{"1.1.1.0/24"},
CountryCode: "US",
Description: "Cloudflare",
},
"2.2.2.2": {
Announced: true,
AsNumber: 420,
Cidr: []string{"2.2.2.0/24"},
CountryCode: "CA",
Description: "test canada",
},
"1.1.1.1": {
Announced: true,
AsNumber: 13335,
Cidr: []string{"1.1.1.0/24"},
CountryCode: "US",
Description: "Cloudflare",
},
}
return &IpToASNService{Responses: responses}
}
type IpToASNService struct {
iptoasnv1.UnimplementedIpToASNServiceServer
Responses map[string]*iptoasnv1.LookupResponse
}
func (ip2asn *IpToASNService) Lookup(ctx context.Context, lr *iptoasnv1.LookupRequest, opts ...grpc.CallOption) (*iptoasnv1.LookupResponse, error) {
if _, err := netip.ParseAddr(lr.GetIpAddress()); err != nil {
return nil, err
}
resp, ok := ip2asn.Responses[lr.GetIpAddress()]
if !ok {
return nil, status.Error(codes.NotFound, "IP address not found in mock")
}
return resp, nil
}


@@ -0,0 +1,17 @@
package thothmock
import (
"context"
"testing"
"github.com/TecharoHQ/anubis/internal/thoth"
)
func WithMockThoth(t *testing.T) context.Context {
t.Helper()
thothCli := &thoth.Client{}
thothCli.WithIPToASNService(MockIpToASNService())
ctx := thoth.With(t.Context(), thothCli)
return ctx
}

Some files were not shown because too many files have changed in this diff.