Compare commits

...

3 Commits

Author SHA1 Message Date
Xe Iaso
da8ebae94e chore(default-config): allowlist common crawl
This may seem strange, but allowlisting common crawl means that scrapers
have less incentive to scrape because they can just grab the data from
common crawl instead of scraping it again.
2025-07-04 00:03:54 +00:00
Xe Iaso
d7a758f805 docs: add BotStopper docs from the git repo (#752)
Signed-off-by: Xe Iaso <me@xeiaso.net>
2025-07-03 23:09:45 +00:00
Martin
c121896f9c feat(localization): Add German language translation (#741)
* Add german translation

* Adjust german localization

* Adjust js_finished_reading in german localization

* Mention this change in CHANGELOG.md

* Add test for German localization

* Update lib/localization/locales/de.json

Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>
Signed-off-by: Martin <31348196+Earl0fPudding@users.noreply.github.com>

* Remove duplicate "leider" in lib/localization/locales/de.json

Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>
Signed-off-by: Martin <31348196+Earl0fPudding@users.noreply.github.com>

* Update lib/localization/locales/de.json

Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>
Signed-off-by: Martin <31348196+Earl0fPudding@users.noreply.github.com>

* Update lib/localization/locales/de.json

Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>
Signed-off-by: Martin <31348196+Earl0fPudding@users.noreply.github.com>

---------

Signed-off-by: Martin <31348196+Earl0fPudding@users.noreply.github.com>
Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-07-03 10:48:17 +00:00
14 changed files with 314 additions and 4 deletions

View File

@@ -20,11 +20,13 @@ bdba
berr
bingbot
bitcoin
bitrate
blogging
Bluesky
blueskybot
boi
botnet
botstopper
BPort
Brightbot
broked
@@ -52,6 +54,7 @@ ckie
cloudflare
Codespaces
confd
connnection
containerbuild
coreutils
Cotoyogi
@@ -93,6 +96,7 @@ facebookgo
Factset
fastcgi
fediverse
ffprobe
finfos
Firecrawl
flagenv
@@ -199,6 +203,7 @@ omgilibot
openai
opengraph
openrc
oswald
pag
palemoon
Pangu
@@ -272,6 +277,8 @@ SVCNAME
tagline
tarballs
tarrif
tbn
tbr
techaro
techarohq
templ

View File

@@ -7,5 +7,5 @@
# Warning: May contain user agents that _must_ be blocked in robots.txt, or the opt-out will have no effect.
- name: "ai-catchall"
user_agent_regex: >-
AI2Bot|Ai2Bot-Dolma|aiHitBot|Amazonbot|anthropic-ai|Brightbot 1.0|Bytespider|CCBot|Claude-Web|cohere-ai|cohere-training-data-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google-CloudVertexBot|GoogleOther|GoogleOther-Image|GoogleOther-Video|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|imgproxy|ISSCyberRiskCrawler|Kangaroo Bot|meta-externalagent|Meta-ExternalAgent|meta-externalfetcher|Meta-ExternalFetcher|NovaAct|omgili|omgilibot|Operator|PanguBot|Perplexity-User|PerplexityBot|PetalBot|QualifiedBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|Sidetrade indexer bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio-Extended|wpbot|YouBot
AI2Bot|Ai2Bot-Dolma|aiHitBot|Amazonbot|anthropic-ai|Brightbot 1.0|Bytespider|Claude-Web|cohere-ai|cohere-training-data-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google-CloudVertexBot|GoogleOther|GoogleOther-Image|GoogleOther-Video|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|imgproxy|ISSCyberRiskCrawler|Kangaroo Bot|meta-externalagent|Meta-ExternalAgent|meta-externalfetcher|Meta-ExternalFetcher|NovaAct|omgili|omgilibot|Operator|PanguBot|Perplexity-User|PerplexityBot|PetalBot|QualifiedBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|Sidetrade indexer bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio-Extended|wpbot|YouBot
action: DENY

View File

@@ -1,6 +1,8 @@
# Warning: Contains user agents that _must_ be blocked in robots.txt, or the opt-out will have no effect.
# Note: Blocks human-directed/non-training user agents
#
# CCBot is allowed because if Common Crawl is allowed, then scrapers don't need to scrape to get the data.
- name: "ai-robots-txt"
user_agent_regex: >-
AI2Bot|Ai2Bot-Dolma|aiHitBot|Amazonbot|Andibot|anthropic-ai|Applebot|Applebot-Extended|bedrockbot|Brightbot 1.0|Bytespider|CCBot|ChatGPT-User|Claude-SearchBot|Claude-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|EchoboxBot|FacebookBot|facebookexternalhit|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google-CloudVertexBot|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo Bot|meta-externalagent|Meta-ExternalAgent|meta-externalfetcher|Meta-ExternalFetcher|MistralAI-User/1.0|MyCentralAIScraperBot|NovaAct|OAI-SearchBot|omgili|omgilibot|Operator|PanguBot|Panscient|panscient.com|Perplexity-User|PerplexityBot|PetalBot|PhindBot|Poseidon Research Crawler|QualifiedBot|QuillBot|quillbot.com|SBIntuitionsBot|Scrapy|SemrushBot|SemrushBot-BA|SemrushBot-CT|SemrushBot-OCOB|SemrushBot-SI|SemrushBot-SWA|Sidetrade indexer bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio-Extended|wpbot|YandexAdditional|YandexAdditionalBot|YouBot
AI2Bot|Ai2Bot-Dolma|aiHitBot|Amazonbot|Andibot|anthropic-ai|Applebot|Applebot-Extended|bedrockbot|Brightbot 1.0|Bytespider|ChatGPT-User|Claude-SearchBot|Claude-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|EchoboxBot|FacebookBot|facebookexternalhit|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google-CloudVertexBot|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo Bot|meta-externalagent|Meta-ExternalAgent|meta-externalfetcher|Meta-ExternalFetcher|MistralAI-User/1.0|MyCentralAIScraperBot|NovaAct|OAI-SearchBot|omgili|omgilibot|Operator|PanguBot|Panscient|panscient.com|Perplexity-User|PerplexityBot|PetalBot|PhindBot|Poseidon Research Crawler|QualifiedBot|QuillBot|quillbot.com|SBIntuitionsBot|Scrapy|SemrushBot|SemrushBot-BA|SemrushBot-CT|SemrushBot-OCOB|SemrushBot-SI|SemrushBot-SWA|Sidetrade indexer bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio-Extended|wpbot|YandexAdditional|YandexAdditionalBot|YouBot
action: DENY

View File

@@ -6,4 +6,5 @@
- import: (data)/crawlers/internet-archive.yaml
- import: (data)/crawlers/kagibot.yaml
- import: (data)/crawlers/marginalia.yaml
- import: (data)/crawlers/mojeekbot.yaml
- import: (data)/crawlers/mojeekbot.yaml
- import: (data)/crawlers/commoncrawl.yaml

View File

@@ -0,0 +1,12 @@
- name: common-crawl
  user_agent_regex: CCBot
  action: ALLOW
  # https://index.commoncrawl.org/ccbot.json
  remote_addresses:
    [
      "2600:1f28:365:80b0::/60",
      "18.97.9.168/29",
      "18.97.14.80/29",
      "18.97.14.88/30",
      "98.85.178.216/32",
    ]

View File

@@ -20,8 +20,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Implement a [development container](https://containers.dev/) manifest to make contributions easier.
- Fix dynamic cookie domains functionality ([#731](https://github.com/TecharoHQ/anubis/pull/731))
- Add option for custom cookie prefix ([#732](https://github.com/TecharoHQ/anubis/pull/732))
- Add translation for German language ([#741](https://github.com/TecharoHQ/anubis/pull/741))
- Remove the "Success" interstitial after a proof of work challenge is concluded.
- Add option for forcing a specific language ([#742](https://github.com/TecharoHQ/anubis/pull/742))
- Allow [Common Crawl](https://commoncrawl.org/) by default so scrapers have less incentive to scrape
### Potentially breaking changes

View File

@@ -0,0 +1,215 @@
---
title: "Commercial support and an unbranded version"
---
If you want to use Anubis but organizational policies prevent you from using the branding that the open source project ships, we offer a commercial version of Anubis named BotStopper. BotStopper builds off of the open source core of Anubis and offers organizations more control over the branding, including but not limited to:
- Custom images for different states of the challenge process (in process, success, failure)
- Custom CSS and fonts
- Custom titles for the challenge and error pages
- "Anubis" replaced with "BotStopper" across the UI
- A private bug tracker for issues
In the near future this will expand to:
- A private challenge implementation that does advanced fingerprinting to check if the client is a genuine browser or not
- Advanced fingerprinting via [Thoth-based advanced checks](./thoth.mdx)
In order to sign up for BotStopper, please do one of the following:
- Sign up [on GitHub Sponsors](https://github.com/sponsors/Xe) at the $50 per month tier or higher
- Email [sales@techaro.lol](mailto:sales@techaro.lol) with your requirements for invoicing; note that custom invoicing costs more than going through GitHub Sponsors, for understandable overhead reasons
## Installation
Install BotStopper like you would Anubis, but replace the image reference. For example:
```diff
-ghcr.io/techarohq/anubis:latest
+ghcr.io/techarohq/botstopper/anubis:latest
```
### Binary packages
Binary packages are available [on the GitHub Releases page](https://github.com/TecharoHQ/botstopper/releases). The main differences are that the package name is `techaro-botstopper`, the systemd service is `techaro-botstopper@your-instance.service`, the binary is `/usr/bin/botstopper`, and the configuration lives in `/etc/techaro-botstopper`. All other instructions in the [native package install guide](./native-install.mdx) apply.
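As a rough sketch, running an instance on a systemd host could look like the following; the instance name `gitea` is only an example, and the env file path follows from the configuration directory above:
```text
# hypothetical instance named "gitea", configured via /etc/techaro-botstopper/gitea.env
sudo systemctl enable --now techaro-botstopper@gitea.service
```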
### Docker / Podman
In order to pull the BotStopper image, you need to [authenticate with GitHub's Container Registry](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#authenticating-to-the-container-registry).
```text
# $GITHUB_TOKEN is a placeholder for a personal access token with the read:packages scope
echo "$GITHUB_TOKEN" | docker login ghcr.io -u your-username --password-stdin
```
Then you can use the image as normal.
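Once logged in, pulling the image should work like any other registry, for example:
```text
docker pull ghcr.io/techarohq/botstopper/anubis:latest
```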
### Kubernetes
If you are using Kubernetes, you will need to create an image pull secret:
```text
kubectl create secret docker-registry \
techarohq-botstopper \
--docker-server ghcr.io \
--docker-username your-username \
--docker-password your-access-token \
--docker-email your@email.address
```
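You can sanity-check that the secret exists before referencing it:
```text
kubectl get secret techarohq-botstopper
```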
Then attach it to your Deployment:
```diff
 spec:
   securityContext:
     fsGroup: 1000
+  imagePullSecrets:
+    - name: techarohq-botstopper
```
## Configuration
### Docker compose
Follow [the upstream Docker compose directions](https://anubis.techaro.lol/docs/admin/environments/docker-compose) with the following additional options:
```diff
 anubis:
   image: ghcr.io/techarohq/botstopper/anubis:latest
   environment:
     BIND: ":8080"
     DIFFICULTY: "4"
     METRICS_BIND: ":9090"
     SERVE_ROBOTS_TXT: "true"
     TARGET: "http://nginx"
     OG_PASSTHROUGH: "true"
     OG_EXPIRY_TIME: "24h"
+    # botstopper config here
+    CHALLENGE_TITLE: "Doing math for your connnection!"
+    ERROR_TITLE: "Something went wrong!"
+    OVERLAY_FOLDER: /assets
+  volumes:
+    - "./your_folder:/assets"
```
#### Example
There is an example in [docker-compose.yaml](https://github.com/TecharoHQ/botstopper/blob/main/docker-compose.yaml). Start the example in the background with `docker compose up -d`:
```text
docker compose up -d
```
And then open [https://botstopper.local.cetacean.club:8443](https://botstopper.local.cetacean.club:8443) in your browser.
> [!NOTE]
> This uses locally signed sacrificial TLS certificates stored in `./demo/pki`. Your browser will rightly reject these. Here is what the example looks like:
>
> ![](/img/botstopper/example-screenshot.webp)
## Custom images and CSS
Anubis uses an internal filesystem that contains CSS, JavaScript, and images. The BotStopper variant of Anubis lets you specify an overlay folder with the environment variable `OVERLAY_FOLDER`. The contents of this folder will be overlaid on top of Anubis' internal filesystem, allowing you to easily customize the images and CSS.
Your directory tree should look like this, assuming your data is in `./your_folder`:
```text
./your_folder
└── static
├── css
│ └── custom.css
└── img
├── happy.webp
├── pensive.webp
└── reject.webp
```
For an example directory tree using some off-the-shelf images from the Tango icon set, see the [testdata](https://github.com/TecharoHQ/botstopper/tree/main/testdata/static/img) folder.
### Custom CSS
CSS customization is done mainly with CSS variables. View [the example custom CSS file](https://github.com/TecharoHQ/botstopper/blob/main/testdata/static/css/custom.css) for more information about what can be customized.
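As a minimal sketch (the variable names are the ones used in the fonts section below; the font values here are placeholders), a `static/css/custom.css` could start with something like:
```css
/* sketch: override theme variables; see the linked custom.css for the full list */
:root {
  --body-sans-font: "Atkinson Hyperlegible", sans-serif;
  --body-title-font: Georgia, serif;
}
```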
### Custom fonts
If you want to add custom fonts, copy the `woff2` files alongside your `custom.css` file and then include them with the [`@font-face` CSS at-rule](https://developer.mozilla.org/en-US/docs/Web/CSS/@font-face):
```css
@font-face {
font-family: "Oswald";
font-style: normal;
font-weight: 200 900;
font-display: swap;
src: url("./fonts/oswald.woff2") format("woff2");
}
```
Then adjust your CSS variables accordingly:
```css
:root {
--body-sans-font: Oswald, sans-serif;
--body-preformatted-font: monospace;
--body-title-font: serif;
}
```
To convert `.ttf` fonts to [Web-optimized woff2 fonts](https://www.w3.org/TR/WOFF2/), use the `woff2_compress` command from the `woff2` or `woff2-tools` package:
```console
$ woff2_compress oswald.ttf
Processing oswald.ttf => oswald.woff2
Compressed 159517 to 70469.
```
Then you can import and use it as normal.
### Customizing images
Anubis uses three images to visually communicate the state of the program. These are:
| Image name | Intended message | Example |
| :------------- | :----------------------------------------------- | :-------------------------------- |
| `happy.webp` | You have passed validation, all is good | ![](/img/botstopper/happy.webp) |
| `pensive.webp` | Checking is running, hold steady until it's done | ![](/img/botstopper/pensive.webp) |
| `reject.webp` | Something went wrong, this is a terminal state | ![](/img/botstopper/reject.webp) |
To make your own images at the optimal quality, use the following ffmpeg command:
```text
ffmpeg -i /path/to/image -vf scale=-1:384 happy.webp
```
`ffprobe` should report something like this on the generated images:
```text
Input #0, webp_pipe, from 'happy.webp':
Duration: N/A, bitrate: N/A
Stream #0:0: Video: webp, none, 25 fps, 25 tbr, 25 tbn
```
In testing, 384 by 384 pixels gives the best balance between file size, quality, and clarity.
```text
$ du -hs *
4.0K happy.webp
12K pensive.webp
8.0K reject.webp
```
## Customizing messages
You can customize messages using the following environment variables:
| Message | Environment variable | Default |
| :------------------- | :------------------- | :----------------------------------------- |
| Challenge page title | `CHALLENGE_TITLE` | `Ensuring the security of your connection` |
| Error page title | `ERROR_TITLE` | `Error` |
For example:
```sh
# /etc/techaro-botstopper/gitea.env
CHALLENGE_TITLE="Wait a moment please!"
ERROR_TITLE="Client error"
```

Binary file not shown.

After

Width:  |  Height:  |  Size: 29 KiB

BIN
docs/static/img/botstopper/happy.webp vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.5 KiB

BIN
docs/static/img/botstopper/pensive.webp vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 9.0 KiB

BIN
docs/static/img/botstopper/reject.webp vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 7.1 KiB

View File

@@ -0,0 +1,63 @@
{
"loading": "Ladevorgang...",
"why_am_i_seeing": "Warum sehe ich diese Seite?",
"protected_by": "Geschützt durch",
"made_with": "Mit ❤️ gemacht in 🇨🇦",
"mascot_design": "Maskottchen erstellt von",
"ai_companies_explanation": "Diese Seite wird angezeigt, da der Betreiber der Webseite Anubis eingerichtet hat, um sie vor aggressiven KI-Website-Scrapern zu schützen. Diese können Ausfälle der Webseite verursachen, wodurch die Webseite für jeden nicht erreichbar ist.",
"anubis_compromise": "Anubis ist eine Art Kompromiss. Es verwendet die sogenannte Proof-of-Work Methode nach Hashcash, ein Mechanismus, der ursprünglich zur E-Mail-Spam-Bekämpfung entwickelt wurde. Die Idee dahinter ist, dass ein einziger User nur eine kleine Verzögerung hat, auf die Webseite zu gelangen; bei Scrapern kann das allerdings große Auswirkungen haben.",
"hack_purpose": "Man könnte dies als eine Lösung bezeichnen, die gut genug ist, einem etwas Zeit zu verschaffen für Fingerprinting und dem Identifizieren von Headless Browsern, sodass im besten Fall normale User diese Seite garnicht erst zu sehen bekommen.",
"jshelter_note": "Anubis benötigt moderne JavaScript-Features, welche von Plugins wie zB JShelter deaktiviert werden. Bitte deaktiviere also JShelter oder ähnliche Plugins für diese Domain.",
"version_info": "Diese Webseite läuft mit Anubis version",
"try_again": "Nochmal probieren",
"go_home": "Zur Hauptseite",
"contact_webmaster": "oder wenn es sich hier um einen Fehler handelt, kontaktiere bitte den Administrator der Webseite unter",
"connection_security": "Bitte warte einen Moment während wir sicherstellen, dass eine sichere Verbindung verwendet wird.",
"javascript_required": "Es muss leider JavaScript aktiviert werden, um den Check durchführen zu können. Dies ist leider notwendig weil Firmen im KI-Sektor die sozialen Verhältnisse geändert haben, wie Website-Hosting funktioniert. Eine Lösung ohne JavaScript ist in Entwicklung.",
"benchmark_requires_js": "Das Benchmark-Tool benötigt das Aktivieren von JavaScript.",
"difficulty": "Schwierigkeit:",
"algorithm": "Algorithmus:",
"compare": "Vergleich:",
"time": "Zeit",
"iters": "Iterationen",
"time_a": "Zeit A",
"iters_a": "Iterationen A",
"time_b": "Zeit B",
"iters_b": "Iterationen B",
"static_check_endpoint": "Dies ist nur ein Check-Endpunkt, der von beispielsweise einem Reverse-Proxy geprüft werden kann.",
"authorization_required": "Zugriffserlaubnis benötigt",
"cookies_disabled": "Cookies sind in Ihrem Browser deaktiviert. Anubis benötigt Cookies um sicherzustellen, dass es sich hierbei um einen validen Zugriff handelt. Bitte aktiviere Cookies für diese Domain.",
"access_denied": "Zugriff verweigert: Fehlercode",
"dronebl_entry": "Eintrag in DroneBL",
"see_dronebl_lookup": "anzeigen",
"internal_server_error": "Interner Server Error: Misskonfiguration von Anubis. Bitte kontatkiere den Administrator damit dieser die Logs prüfen kann.",
"invalid_redirect": "Ungültige Weiterleitung",
"redirect_not_parseable": "URL der Weiterleitung kann nicht verarbeitet werden",
"redirect_domain_not_allowed": "Domain der Weiterleitung nicht erlaubt",
"failed_to_sign_jwt": "Signierung des JWT fehlgeschlagen",
"invalid_invocation": "Aufrufen von MakeChallenge ungültig",
"client_error_browser": "Client Error: Bitte stelle sicher, dass der Browser aktuell ist und probiere es später erneut.",
"oh_noes": "Vermaledeit!",
"benchmarking_anubis": "Benchmark wird durchgeführt!",
"you_are_not_a_bot": "Sie sind kein Bot!",
"making_sure_not_bot": "Ihr Browser wird geprüft!",
"celphase": "CELPHASE",
"js_web_crypto_error": "Ihr Browser hat leider kein funktionierendes web.crypto Element. Wird eine sichere Verbindung verwendet?",
"js_web_workers_error": "Ihr Browser unterstützt keine Web-Worker (Anubis verwendet diese, damit der Browser nicht unresponsive wird). Ist eventuell ein Plugin wie zB JShelter installiert?",
"js_cookies_error": "Ihr Browser speichert keine Cookies. Anubis verwendet Cookies um ein gültiges Token zu speichern damit es wissen kann, welche Browser bereits geprüft wurden. Bitte aktiviere Cookies für diese Domain. Die Cookie-Namen von Anubis könnten sich jederzeit ändern. Cookie-Namen sind kein Teil der öffentlichen API.",
"js_context_not_secure": "Diese Verbindung ist nicht sicher!",
"js_context_not_secure_msg": "Bitte probiere, dich via HTTPS zu verbinden und lass den Webseiten-Administrator wissen, sauber HTTPS einzurichten. Mehr Informationen unter: <a href=\"https://developer.mozilla.org/en-US/docs/Web/Security/Secure_Contexts#when_is_a_context_considered_secure\">MDN</a>.",
"js_calculating": "Berechnung wird durchgeführt...",
"js_missing_feature": "Fehlendes Feature",
"js_challenge_error": "Fehler während des Checks!",
"js_challenge_error_msg": "Der Check-Algorithmus konnte nicht geladen werden. Bitte lade diese Seite erneut.",
"js_calculating_difficulty": "Berechnung wird durchgeführt...<br/>Schwierigkeit:",
"js_speed": "Geschwindigkeit:",
"js_verification_longer": "Der Check benötigt länger als erwartet. Bitte bleibe auf der Seite.",
"js_success": "Erfolgreich!",
"js_done_took": "Fertig! Dauer:",
"js_iterations": "Iterationen",
"js_finished_reading": "Fertig gelesen, weiter zur Seite →",
"js_calculation_error": "Fehler bei der Berechnung!",
"js_calculation_error_msg": "Fehler bei der Berechnung des Checks:"
}

View File

@@ -1,3 +1,3 @@
{
"supportedLanguages": ["en", "fr", "es", "pt-BR"]
"supportedLanguages": ["en", "fr", "es", "pt-BR", "de"]
}

View File

@@ -27,6 +27,14 @@ func TestLocalizationService(t *testing.T) {
		}
	})

	t.Run("German localization", func(t *testing.T) {
		localizer := service.GetLocalizer("de")
		result := localizer.MustLocalize(&i18n.LocalizeConfig{MessageID: "loading"})
		if result != "Ladevorgang..." {
			t.Errorf("Expected 'Ladevorgang...', got '%s'", result)
		}
	})

	t.Run("All required keys exist in English", func(t *testing.T) {
		localizer := service.GetLocalizer("en")
		requiredKeys := []string{