Commit Graph

54 Commits

Author SHA1 Message Date
Xe Iaso b794fe00f3 Merge branch 'main' into Xe/proof-of-react
Signed-off-by: Xe Iaso <xe.iaso@techaro.lol>
2025-08-29 15:32:52 -04:00
Xe Iaso e7755a352e feat: add 'proof of React' challenge
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-08-25 23:55:29 +00:00
Xe Iaso b0fa256e3e fix(default-config): also block alibaba cloud (#1005)
* fix(default-config): also block alibaba cloud

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-08-20 23:01:49 +00:00
Xe Iaso ee55d857eb fix(default-config): block Huawei Cloud (#1004)
* fix(default-config): block Huawei Cloud

Closes #978

Huawei Cloud has been egregious about its scraping. All attempts to
contact their abuse team have failed. If you work for Huawei Cloud,
please raise this issue internally and get the scraping to just stop.

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-08-20 22:40:07 +00:00
Dryusdan 237a6a98e2 Bump ai.robots.txt to v1.39 (#982) 2025-08-18 06:52:23 -04:00
Elliot Speck 87651f9506 default pattern fixes (#963)
* feat(checker): allow png/gif/jpg/jpeg/svg favicons as well as ico

* changelog: add updates to keep-internet-working.yaml

* fix(checker): tighten default regex patterns for well-known files

* changelog: add updates to regular expression patterns in keep-internet-working.yaml

---------

Signed-off-by: Elliot Speck <11192354+arcayr@users.noreply.github.com>
2025-08-09 07:40:33 -04:00
Elliot Speck 100005ce70 feat(checker): allow png/gif/jpg/jpeg/svg favicons as well as ico (#961)
* feat(checker): allow png/gif/jpg/jpeg/svg favicons as well as ico

* changelog: add updates to keep-internet-working.yaml

---------

Signed-off-by: Xe Iaso <xe.iaso@techaro.lol>
Co-authored-by: Xe Iaso <xe.iaso@techaro.lol>
2025-08-08 16:53:23 +00:00
Xe Iaso 7c80c23e90 docs: remove JSON examples from policy file docs (#945)
* docs: remove JSON examples from policy file docs

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib): remove mentions of botPolicies.json in the tests

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: update link to challenge methods

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: unbreak links to the challenges category

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-08-03 18:09:26 +00:00
Xe Iaso bca2e87e80 feat(default-rules): add weight to Custom-AsyncHttpClient (#914)
Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Xe Iaso <xe.iaso@techaro.lol>
2025-07-27 00:41:43 +00:00
Marcel Bischoff 5c4d8480e6 Add services folder, add Uptime Robot policy definition (#838)
Uptime Robot is a commonly used service for tracking service
interruptions. Additional policy definitions may be beneficial for
services that do publish their IP addresses in use. The list is
additionally aggregated to slightly shorten it.

Signed-off-by: Marcel Bischoff <marcel@herrbischoff.com>
2025-07-16 09:17:48 -04:00
Xe Iaso 735b2ceb14 fix(default-config): disable system load check by default (#827)
This was causing issues with git clone against highly loaded servers. I
thought that this would be pretty innocuous, but I guess I was wrong.
Oops!

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-07-14 13:06:56 +00:00
Xe Iaso 4ea0add50d feat(lib/policy/expressions): add system load average to bot expression inputs (#766)
* feat(lib/policy/expressions): add system load average to bot expression inputs

This lets Anubis dynamically react to system load in order to
increase and decrease the required level of scrutiny. High load? More
scrutiny required. Low load? Less scrutiny required.

* docs: spell system correctly

Signed-off-by: Xe Iaso <me@xeiaso.net>

* Update metadata

check-spelling run (pull_request) for Xe/load-average

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>

* fix(default-config): don't enable low load average feature by default

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
Signed-off-by: Xe Iaso <xe.iaso@techaro.lol>
2025-07-06 20:13:50 +00:00
Xe Iaso dff2176beb feat(lib): use new challenge creation flow (#749)
* feat(decaymap): add Delete method

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(lib/challenge): refactor Validate to take ValidateInput

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(lib): implement store interface

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(lib/store): all metapackage to import all store implementations

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(policy): import all store backends

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(lib): use new challenge creation flow

Previously Anubis constructed challenge strings from request metadata.
This was a good idea in spirit, but has turned out to be a very bad idea
in practice. This new flow reuses the Store facility to dynamically
create challenge values with completely random data.

This is a fairly big rewrite of how Anubis processes challenges. Right
now it defaults to using the in-memory storage backend, but on-disk
(boltdb) and valkey-based adaptors will come soon.

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(decaymap): fix documentation typo

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(lib): fix SA4004

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(lib/store): make generic storage interface test adaptor

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(decaymap): invert locking process for Delete

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(lib/store): add bbolt store implementation

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: go mod tidy

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(devcontainer): adapt to docker compose, add valkey service

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib): make challenges live for 30 minutes by default

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(lib/store): implement valkey backend

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(lib/store/valkey): disable tests if not using docker

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(lib/policy/config): ensure valkey stores can be loaded

Signed-off-by: Xe Iaso <me@xeiaso.net>

* Update metadata

check-spelling run (pull_request) for Xe/store-interface

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>

* chore(devcontainer): remove port forwards because vs code handles that for you

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs(default-config): add a nudge to the storage backends section of the docs

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(docs): listen on 0.0.0.0 for dev container support

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs(policy): document storage backends

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: update CHANGELOG and internal links

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs(admin/policies): don't start a sentence with as

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: fixes found in review

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
2025-07-04 20:42:28 +00:00
Xe Iaso 7c0996448a chore(default-config): allowlist common crawl (#753)
This may seem strange, but allowlisting common crawl means that scrapers
have less incentive to scrape because they can just grab the data from
common crawl instead of scraping it again.
2025-07-04 00:10:45 +00:00
Martin 6aa17532da fix: Dynamic cookie domain not working (#731)
* Fix cookieDynamicDomain option not being set in Options struct

* Fix using wrong cookie name when using dynamic cookie domains

* Adjust testcases for new cookie option structs

* Add known words to expect.txt and change typo in Zombocom

* Cleanup expect.txt

* Add changes to changelog

* Bump versions of grpc and apimachinery

* Fix testcases and add additional condition for dynamic cookie domain
2025-06-29 15:38:55 -04:00
Xe Iaso 4c74934e9f fix(default-config): Techaro -> Zombocom
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-22 20:04:40 -04:00
Xe Iaso 5870f7072c feat: implement imprint/impressum support (#706)
* feat: implement imprint/impressum support

Closes #362

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(docs/anubis): enable an imprint

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: fix the end of the sentence, comment out a default impressum

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: link back to impressum page

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-22 18:09:37 -04:00
Xe Iaso 3c1d95d61e fix(default-config): off-by-one error in the default thresholds (#701)
I don't know how I missed this in testing.
2025-06-20 11:47:34 -04:00
Xe Iaso 4948036f39 feat: add default OpenGraph tags to configuration file (#694)
* feat(config): opengraph passthrough configuration

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(ogtags): use config.OpenGraph for configuration

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: wire up ogtags config in most of the app

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(ogtags): return default tags if they are supplied

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: make OpenGraph legal so we have some sanity in reviewing

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib): use OpenGraph.Enabled

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(lib): load default config file if one is not specified in spawnAnubis

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(config): fix ST1005

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: document open graph defaults and its new home in the policy file

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs(installation): point to weight threshold new home

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: rename default to override

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(default-config): add off-by-default opengraph settings to bot policy file

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(anubis): make build

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(lib): fix build

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-19 18:00:44 -04:00
Xe Iaso 226cf36bf7 feat(config): custom weight thresholds via CEL (#688)
* feat(config): add Thresholds to the top level config file

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(config): make String() on ExpressionOrList join the component expressions

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(config): ensure unparseable json fails

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(config): if no thresholds are set, use the default thresholds

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(policy): half implement thresholds

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(policy): continue wiring things up

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(lib): wire up thresholds

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(lib): handle behavior from legacy configurations

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: document thresholds

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: update CHANGELOG, refer to threshold configuration

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib): fix build

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(lib): fix U1000

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
Co-authored-by: Jason Cameron <git@jasoncameron.dev>
2025-06-18 16:58:31 -04:00
Dryusdan 1d5fa49eb0 Bump ai.robots.txt to v1.37 (#689)
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
2025-06-18 13:30:53 -04:00
hydrargyrum 244f1c505a fix(geo): correct typo "counties" to "countries" (#678) 2025-06-17 23:50:42 -04:00
Xe Iaso e3826df3ab feat: implement a client for Thoth, the IP reputation database for Anubis (#637)
* feat(internal): add Thoth client and simple ASN checker

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(thoth): cached ip to asn checker

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: go mod tidy

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(thoth): minor testing fixups, ensure ASNChecker is Checker

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(thoth): make ASNChecker instances

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(thoth): add GeoIP checker

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(thoth): store a thoth client in a context

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: refactor Checker type to its own package

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(thoth): add thoth mocking package, ignore context deadline exceeded errors

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(thoth): pre-cache private ranges

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(lib/policy/config): enable thoth ASNs and GeoIP checker parsing

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(thoth): refactor to move checker creation to the checker files

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(policy): enable thoth checks

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(thothmock): test helper function for loading a mock thoth instance

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat: wire up Thoth, make thoth checks part of the default config

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(thoth): mend staticcheck errors

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs(admin): add Thoth docs

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(policy): update Thoth links in error messages

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: update CHANGELOG

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(docs/manifest): enable Thoth

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: add THOTH_INSECURE for contacting Thoth over plain TCP in extreme circumstances

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(thoth): use mock thoth when credentials aren't detected in the environment

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(cmd/anubis): better warnings for half-configured Thoth setups

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs(botpolicies): link to Thoth geoip docs

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-16 11:57:32 -04:00
Xe Iaso c638653172 feat(lib): implement request weight (#621)
* feat(lib): implement request weight

Replaces #608

This is a big one and will be what makes Anubis a generic web
application firewall. This introduces the WEIGH option, allowing
administrators to have facets of request metadata add or remove
"weight", or the level of suspicion. This really makes Anubis weigh
the soul of requests.

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib): maintain legacy challenge behavior

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib): make weight have dedicated checkers for the hashes

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(data): convert some rules over to weight points

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: document request weight

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(CHANGELOG): spelling error

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: fix links to challenge information

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs(policies): fix formatting

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(config): make default weight adjustment 5

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-09 15:25:04 -04:00
Dryusdan 281b6c5c00 Bump ai.robots.txt to v1.34 (#632) 2025-06-08 14:54:47 -04:00
Xe Iaso 51c384eefd fix(data/bots): bring back ai-robots-txt.yaml
Closes #599

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-06-01 17:15:00 -04:00
Corry Haines de7dbfe6d6 Split up AI filtering files (#592)
* Split up AI filtering files

Create aggressive/moderate/permissive policies to allow administrators to choose their AI/LLM stance.

Aggressive policy matches existing default in Anubis.

Removes `Google-Extended` flag from `ai-robots-txt.yaml` as it doesn't exist in requests.

Rename `ai-robots-txt.yaml` to `ai-catchall.yaml` as the file is no longer a copy of the source repo/file.

* chore: spelling

* chore: fix embeds

* chore: fix data includes

* chore: fix file name typo

* chore: Ignore READMEs in configs

* chore(lib/policy/config): go tool goimports -w

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-06-01 20:21:18 +00:00
Corry Haines 0d9ebebff6 Opt-in policies for OpenAI and MistralAI bots (#590)
* Define OpenAI bot ALLOW policies

Allows OpenAI bots to be allowlisted at the choice of the Anubis administrator. None are enabled by default.

* Define MistralAI bot ALLOW policy

* chore: spelling
2025-05-31 16:48:57 -04:00
Corry Haines 68a71c6a99 Add Applebot definition (#589)
* Add Applebot definition

Adds Apple's search indexing bot, and allowlists it by default.

Allowlisted by default because it is equivalent to Googlebot/Bingbot. Remove Applebot from `ai-robots-txt.yaml` for the same reasons.

Remove `Applebot-Extended` from `ai-robots-txt.yaml` as it has no effect.

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-05-31 10:18:32 -04:00
Xe Iaso cd8a7eb2e2 feat(data): add x-firefox-ai default challenge rule (#580)
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-05-28 21:08:39 +00:00
Xe Iaso 22c47f40d1 feat(expressions): add randInt function to allow making rules nondeterministic (#578)
This seems counter-intuitive at first glance, but let me cook.

One of the problems with Anubis is that the rule matching is super
deterministic. This means that attackers can figure out what patterns
they are hitting and change things to bypass them.

The randInt function lets you have rulesets behave nondeterministically.
This is a very easy way to hang yourself, but can be great to
psychologically mess with scraper operators. Consider this rule:

```yaml
- name: deny-lightpanda-sometimes
  action: DENY
  expression:
    all:
      - userAgent.matches("LightPanda")
      - randInt(16) >= 4
```

It would match about 75% of the time.

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-05-28 16:36:27 -04:00
Dryusdan 11081aac08 Bump AI-robots.txt rules to version 1.31 (#538)
* Bump AI-robots.txt rules to version 1.31

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-05-23 16:15:12 +00:00
Dryusdan 9e9982ab5d feat(apps): Make SASL login work on bookstack with Anubis (#502)
* Make SASL login work on bookstack with Anubis

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-05-16 17:01:34 +00:00
Xe Iaso 3b98368aa9 feat(apps): add SearXNG instance tracker policy and Qualys Labs SSL testing rules (#512)
* feat(apps): add SearXNG instance tracker policy

* feat(apps): add Qualys SSL Labs policy

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: hyperdefined <contact@hyper.lol>
2025-05-16 16:59:15 +00:00
Dryusdan 961320540b Bump AI-robots.txt rules to version 1.30 (#509)
Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-05-16 16:40:25 +00:00
Xe Iaso f4298b993f fix(bots/phrik): add IPv6 address for phrik (#494)
Tracks https://gitlab.archlinux.org/archlinux/infrastructure/-/merge_requests/950
2025-05-11 14:04:44 -04:00
Josh Soref 52a6a65cc4 Spelling (#445)
* link: stackoverflow explanation of cookies

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>

* spelling: bazaar

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>

* spelling: enabling

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>

* spelling: expressions

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>

* spelling: implicitly

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>

* spelling: intermediate

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>

* spelling: nonexistent

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>

* spelling: open graph

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>

* spelling: really, really,

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>

* spelling: receive

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>

---------

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2025-05-05 10:52:02 -04:00
Xe Iaso 865d513e35 feat(checker): add CEL for matching complicated expressions (#421)
* feat(lib/policy): add support for CEL checkers

This adds the ability for administrators to use Common Expression
Language[0] (CEL) for more advanced check logic than Anubis previously
offered.

These can be as simple as:

```yaml
- name: allow-api-routes
  action: ALLOW
  expression:
    and:
    - '!(method == "HEAD" || method == "GET")'
    - path.startsWith("/api/")
```

or get as complicated as:

```yaml
- name: allow-git-clients
  action: ALLOW
  expression:
    and:
    - userAgent.startsWith("git/") || userAgent.contains("libgit") || userAgent.startsWith("go-git") || userAgent.startsWith("JGit/") || userAgent.startsWith("JGit-")
    - >
      "Git-Protocol" in headers && headers["Git-Protocol"] == "version=2"
```

Internally these are compiled and evaluated with cel-go[1]. This also
leaves room for extensibility should that be desired in the future. This
will intersect with #338 and eventually intersect with TLS fingerprints
as in #337.

[0]: https://cel.dev/
[1]: https://github.com/google/cel-go

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(data/apps): add API route allow rule for non-HEAD/GET

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: document expression syntax

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix: fixes in review

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-05-03 14:26:54 -04:00
Xe Iaso 6e82373718 feat(config): allow multi-level imports (#402)
* feat(config): allow multi-level imports

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(data): fix spelling of Marginalia

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-05-02 13:57:20 -04:00
Xe Iaso 74d330cec5 feat(config): add ability to customize HTTP status codes Anubis returns (#393)
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-04-29 15:13:44 -04:00
Dryusdan 76514f9f32 Bump AI-robots.txt rules to version 1.29 (#383) 2025-04-27 20:52:08 -04:00
Xe Iaso ef52550e70 fix(config): remove trailing newlines in regexes (#373)
Closes #372

Fun YAML fact of the day:

What is the difference between how these two expressions are parsed?

```yaml
foo: >
  bar
```

```yaml
foo: >-
  bar
```

They are invisible in yaml, but when you evaluate them to JSON the
difference is obvious:

```json
{
  "foo": "bar\n"
}
```

```json
{
  "foo": "bar"
}
```

User-Agent strings, URL path values, and HTTP headers _do_ end in
newlines in HTTP/1.1 wire form, but that newline is usually stripped
before the server actually handles it. Also HTTP/2 is a thing and does
not terminate header values with newlines.

This change makes Anubis more aggressively detect mistaken uses of the
yaml `>` operator and nudges the user into using the yaml `>-` operator
which does not append the trailing newline.

I had honestly forgotten about this YAML behavior because it wasn't
relevant for so long. Oops! Glad I released a beta.

Whenever you get into this state, Anubis will throw a config parsing
error and then give you a message hinting at the folly of your ways.

```
config.Bot: regular expression ends with newline (try >- instead of > in yaml)
```

Big thanks to https://yaml-multiline.info, this helped me realize my
folly instantly.

@aiverson, this is official permission to say "told you so".

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-04-26 14:01:15 +00:00
Igor Brai 54cd99c750 Fix: mojeekbot regex (#351)
* update mojeekbot UA regex

* add fix into changelog

* hack: empty commit to unbreak CI

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-04-24 14:24:41 +00:00
Xe Iaso 74e11505c6 feat: enable loading config fragments (#321)
* feat(config): support importing bot policy snippets

This changes the grammar of the Anubis bot policy config to allow
importing from internal shared rules or external rules on the
filesystem.

This lets you create a file at `/data/policies/block-evilbot.yaml` and
then import it with:

```yaml
bots:
- import: /data/policies/block-evilbot.yaml
```

This also explodes the default policy file into a bunch of composable
snippets.

Thank you @Aibrew for your example gitea Atom / RSS feed rules!

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(data): update botPolicies.json to use imports

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(cmd/anubis): extract bot policies with --extract-resources

This allows a user that doesn't have anything but the Anubis binary to
figure out what the default configuration does.

* docs(data/botPolices.yaml): document import syntax in-line

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib/policy): better test importing from JSON snippets

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs(admin): Add import syntax documentation

This documents the import syntax and is based on the block comment at
the top of the default bot policy file.

* docs(changelog): add note about importing snippets

Signed-off-by: Xe Iaso <me@xeiaso.net>

* style(lib/policy/config): use an error value instead of an inline error

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-04-23 07:01:28 -04:00
Xe Iaso 3f1ce2d7ac data: disable generic-bot-catchall by default (#322)
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-04-22 08:11:45 -04:00
Xe Iaso d40b5cfdab lib: move config to yaml (#307)
* lib: move config to yaml

Signed-off-by: Xe Iaso <me@xeiaso.net>

* web: run go generate

Signed-off-by: Xe Iaso <me@xeiaso.net>

* Add Haiku to known instances (#304)

Signed-off-by: Asmodeus <46908100+AsmodeumX@users.noreply.github.com>

* Add headers bot rule (#300)

* Closes #291: add headers support to bot policy rules

* Fix config validator

* update docs for JSON -> YAML

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: document http header based actions

Signed-off-by: Xe Iaso <me@xeiaso.net>

* lib: add missing test

Signed-off-by: Xe Iaso <me@xeiaso.net>

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Asmodeus <46908100+AsmodeumX@users.noreply.github.com>
Co-authored-by: Asmodeus <46908100+AsmodeumX@users.noreply.github.com>
Co-authored-by: Neur0toxine <pashok9825@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-04-21 00:09:27 +00:00
Dryusdan a40c5e99fc Add more AI user agent in botPolicies.json (#249)
* Add more IA user agent in bot policies

* Update data/botPolicies.json

Signed-off-by: Xe Iaso <me@xeiaso.net>

* Fix trailling pipe that deny all requests

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-04-18 22:28:56 +00:00
Michael Jeanson af831f0d7f Add 'Opera' to 'generic-browser' bot policy rule (#220)
After deploying Anubis bot traffic is drastically reduced but I still
see a lot of requests from User-Agents that claim to be 'Opera' like so:

"Opera/9.90.(Windows NT 6.0; mt-MT) Presto/2.9.173 Version/10.00"
"Opera/8.46.(X11; Linux i686; fo-FO) Presto/2.9.161 Version/11.00"

Add 'Opera' to the generic-browser rule to also challenge them.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-04-18 17:57:02 +00:00
Remilia Da Costa Faro 095e18d0c8 Allow ranges from the Internet Archive (AS7941) (#276)
* Allow ranges from the Internet Archive (AS7941)

* Updated changelog

* Update data/botPolicies.json

Signed-off-by: Xe Iaso <me@xeiaso.net>

* Removed overlapping CIDR for internet-archive in botPolicies.json

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-04-18 04:13:01 +00:00
Maher 81307bcb5c feat: update botPolicies for DuckDuckGo web crawler (#250)
- updates botPolicies with ips from the website
- adds the updated information to the `CHANGELOG.md` file

Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-04-12 02:13:38 +00:00