mirror of
https://github.com/TecharoHQ/anubis.git
synced 2026-04-05 16:28:17 +00:00
Compare commits
154 Commits
v1.18.0
...
Xe/fix-anu
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
953f85ec74 | ||
|
|
94ed2cb1b7 | ||
|
|
0e43138324 | ||
|
|
c981c23f7e | ||
|
|
9f0c5e974e | ||
|
|
292c470ada | ||
|
|
12453fdc00 | ||
|
|
f5b3bf81bc | ||
|
|
1820649987 | ||
|
|
14eeeb56d6 | ||
|
|
d9e0fbe905 | ||
|
|
6aa17532da | ||
|
|
b1edf84a7c | ||
|
|
d47a3406db | ||
|
|
ff5991b5cf | ||
|
|
19f78f37ad | ||
|
|
b0b0a5c08a | ||
|
|
261306dc63 | ||
|
|
3520421757 | ||
|
|
ad5430612f | ||
|
|
c2423d0688 | ||
|
|
a1b7d2ccda | ||
|
|
7cf6ac5de6 | ||
|
|
59f5b07281 | ||
|
|
1562f88c35 | ||
|
|
15bd9b6a44 | ||
|
|
1ca531b930 | ||
|
|
f9259299b9 | ||
|
|
16a4e04027 | ||
|
|
8c79870edb | ||
|
|
060b10ea2d | ||
|
|
4c74934e9f | ||
|
|
5870f7072c | ||
|
|
3c1d95d61e | ||
|
|
ab801a3597 | ||
|
|
ecc716940e | ||
|
|
4948036f39 | ||
|
|
7aa732c700 | ||
|
|
226cf36bf7 | ||
|
|
1d5fa49eb0 | ||
|
|
97c1d4f353 | ||
|
|
244f1c505a | ||
|
|
ae4d3b0ce5 | ||
|
|
e60c43cdd2 | ||
|
|
b2b2679bae | ||
|
|
e2b46fc5e7 | ||
|
|
3437e575d4 | ||
|
|
ae064be710 | ||
|
|
e3826df3ab | ||
|
|
823d1be5d1 | ||
|
|
0c6a820372 | ||
|
|
81f6380dd4 | ||
|
|
e5455c02d8 | ||
|
|
1d8033d69e | ||
|
|
e0781e4560 | ||
|
|
7a195f1595 | ||
|
|
2904ff974b | ||
|
|
3b3080d497 | ||
|
|
60ba8e9557 | ||
|
|
14c80483a9 | ||
|
|
d1452b6d39 | ||
|
|
5e95da6b6c | ||
|
|
988fc0941b | ||
|
|
f5140ae57b | ||
|
|
bbdee34f37 | ||
|
|
6e2eeb9e65 | ||
|
|
c638653172 | ||
|
|
0fe46b48cf | ||
|
|
d6e5561768 | ||
|
|
6594ae0eef | ||
|
|
ad09f82c3c | ||
|
|
372b797f64 | ||
|
|
6eaf0e13a2 | ||
|
|
281b6c5c00 | ||
|
|
9539668049 | ||
|
|
8eff57fcb6 | ||
|
|
4ac59c3a79 | ||
|
|
bee1c22b96 | ||
|
|
5a7499ea3b | ||
|
|
5f3861ab37 | ||
|
|
9f1d791991 | ||
|
|
76fa3e01a5 | ||
|
|
f2db43ad4b | ||
|
|
ba4412c907 | ||
|
|
f184cd81e7 | ||
|
|
59bfced8bf | ||
|
|
780a935cb8 | ||
|
|
f4bc1df797 | ||
|
|
b496c90e86 | ||
|
|
ec73bcbaf1 | ||
|
|
8d19eed200 | ||
|
|
ec733e93a5 | ||
|
|
51c384eefd | ||
|
|
44d5ec0b6e | ||
|
|
3bc9040a96 | ||
|
|
de7dbfe6d6 | ||
|
|
77e0bbbce9 | ||
|
|
b4b5d2f82e | ||
|
|
988fff77f1 | ||
|
|
0d9ebebff6 | ||
|
|
ba00cdacd2 | ||
|
|
68a71c6a99 | ||
|
|
fbbab5a035 | ||
|
|
28ab29389c | ||
|
|
497005ce3e | ||
|
|
669eb4ba4b | ||
|
|
6c4e739b0b | ||
|
|
c8635357dc | ||
|
|
0ed905fd4e | ||
|
|
cd8a7eb2e2 | ||
|
|
22c47f40d1 | ||
|
|
669671bd46 | ||
|
|
6c247cdec8 | ||
|
|
eeae28f459 | ||
|
|
9ba10262e3 | ||
|
|
a28a3d155a | ||
|
|
086f43e3ca | ||
|
|
fa1f2355ea | ||
|
|
0a56194825 | ||
|
|
93e2447ba2 | ||
|
|
51f875ff6f | ||
|
|
555a188dc3 | ||
|
|
6f08bcb481 | ||
|
|
11081aac08 | ||
|
|
c78d830ecb | ||
|
|
5e7bfa5ec2 | ||
|
|
7b8953303d | ||
|
|
a6045d6698 | ||
|
|
e31e1ca5e7 | ||
|
|
50e030d17e | ||
|
|
b640c567da | ||
|
|
9e9982ab5d | ||
|
|
3b98368aa9 | ||
|
|
76849531cd | ||
|
|
961320540b | ||
|
|
91c21fbb4b | ||
|
|
caf69be97b | ||
|
|
6a12efee08 | ||
|
|
5e1abdd31c | ||
|
|
cb3bbbd4c8 | ||
|
|
d51b7ec0aa | ||
|
|
b164048dcf | ||
|
|
6c0ff3f4d5 | ||
|
|
9009596ded | ||
|
|
f4298b993f | ||
|
|
659b577e0e | ||
|
|
2b103a9ec7 | ||
|
|
a0805cad16 | ||
|
|
22ada6251f | ||
|
|
092b80ba55 | ||
|
|
3bd2e4a584 | ||
|
|
39dc3c0317 | ||
|
|
624b935ecc | ||
|
|
529f65674e |
@@ -9,4 +9,4 @@ exclude_dir = ["var", "vendor", "docs", "node_modules"]
|
||||
|
||||
[logger]
|
||||
time = true
|
||||
# to change flags at runtime, prepend with -- e.g. $ air -- --target http://localhost:3000 --difficulty 20 --use-remote-address
|
||||
# to change flags at runtime, prepend with -- e.g. $ air -- --target http://localhost:3000 --difficulty 20 --use-remote-address
|
||||
|
||||
12
.devcontainer/Dockerfile
Normal file
12
.devcontainer/Dockerfile
Normal file
@@ -0,0 +1,12 @@
|
||||
FROM ghcr.io/xe/devcontainer-base/pre/go
|
||||
|
||||
WORKDIR /app
|
||||
|
||||
COPY go.mod go.sum package.json package-lock.json ./
|
||||
RUN go install github.com/a-h/templ/cmd/templ \
|
||||
&& npx --yes playwright@1.52.0 install --with-deps\
|
||||
&& apt-get update \
|
||||
&& apt-get -y install zstd brotli \
|
||||
&& mkdir -p /home/vscode/.local/share/fish \
|
||||
&& chown -R vscode:vscode /home/vscode/.local/share/fish \
|
||||
&& chown -R vscode:vscode /go
|
||||
13
.devcontainer/README.md
Normal file
13
.devcontainer/README.md
Normal file
@@ -0,0 +1,13 @@
|
||||
# Anubis Dev Container
|
||||
|
||||
Anubis offers a [development container](https://containers.dev/) image in order to make it easier to contribute to the project. This image is based on [Xe/devcontainer-base/go](https://github.com/Xe/devcontainer-base/tree/main/src/go), which is based on Debian Bookworm with the following customizations:
|
||||
|
||||
- [Fish](https://fishshell.com/) as the shell complete with a custom theme
|
||||
- [Go](https://go.dev) at the most recent stable version
|
||||
- [Node.js](https://nodejs.org/en) at the most recent stable version
|
||||
- [Atuin](https://atuin.sh/) to sync shell history between your host OS and the development container
|
||||
- [Docker](https://docker.com) to manage and build Anubis container images from inside the development container
|
||||
- [Ko](https://ko.build/) to build production-ready Anubis container images
|
||||
- [Neovim](https://neovim.io/) for use with Git
|
||||
|
||||
This development container is tested and known to work with [Visual Studio Code](https://code.visualstudio.com/). If you run into problems with it outside of VS Code, please file an issue and let us know what editor you are using.
|
||||
34
.devcontainer/devcontainer.json
Normal file
34
.devcontainer/devcontainer.json
Normal file
@@ -0,0 +1,34 @@
|
||||
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
|
||||
// README at: https://github.com/devcontainers/templates/tree/main/src/debian
|
||||
{
|
||||
"name": "Dev",
|
||||
// Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
|
||||
"build": {
|
||||
"dockerfile": "./Dockerfile",
|
||||
"context": "..",
|
||||
"cacheFrom": [
|
||||
"type=registry,ref=ghcr.io/techarohq/anubis/devcontainer"
|
||||
]
|
||||
},
|
||||
"postStartCommand": "npm ci && go mod download",
|
||||
"features": {
|
||||
"ghcr.io/xe/devcontainer-features/ko:1.1.0": {}
|
||||
},
|
||||
"initializeCommand": "mkdir -p ${localEnv:HOME}${localEnv:USERPROFILE}/.local/share/atuin",
|
||||
"customizations": {
|
||||
"vscode": {
|
||||
"extensions": [
|
||||
"esbenp.prettier-vscode",
|
||||
"ms-azuretools.vscode-containers",
|
||||
"golang.go",
|
||||
"unifiedjs.vscode-mdx",
|
||||
"a-h.templ",
|
||||
"redhat.vscode-yaml"
|
||||
]
|
||||
}
|
||||
},
|
||||
"forwardPorts": [
|
||||
8923,
|
||||
3000
|
||||
]
|
||||
}
|
||||
2
.gitattributes
vendored
2
.gitattributes
vendored
@@ -1 +1 @@
|
||||
web/index_templ.go linguist-generated
|
||||
**/*_templ.go linguist-generated=true
|
||||
|
||||
17
.github/actions/spelling/README.md
vendored
Normal file
17
.github/actions/spelling/README.md
vendored
Normal file
@@ -0,0 +1,17 @@
|
||||
# check-spelling/check-spelling configuration
|
||||
|
||||
File | Purpose | Format | Info
|
||||
-|-|-|-
|
||||
[dictionary.txt](dictionary.txt) | Replacement dictionary (creating this file will override the default dictionary) | one word per line | [dictionary](https://github.com/check-spelling/check-spelling/wiki/Configuration#dictionary)
|
||||
[allow.txt](allow.txt) | Add words to the dictionary | one word per line (only letters and `'`s allowed) | [allow](https://github.com/check-spelling/check-spelling/wiki/Configuration#allow)
|
||||
[reject.txt](reject.txt) | Remove words from the dictionary (after allow) | grep pattern matching whole dictionary words | [reject](https://github.com/check-spelling/check-spelling/wiki/Configuration-Examples%3A-reject)
|
||||
[excludes.txt](excludes.txt) | Files to ignore entirely | perl regular expression | [excludes](https://github.com/check-spelling/check-spelling/wiki/Configuration-Examples%3A-excludes)
|
||||
[only.txt](only.txt) | Only check matching files (applied after excludes) | perl regular expression | [only](https://github.com/check-spelling/check-spelling/wiki/Configuration-Examples%3A-only)
|
||||
[patterns.txt](patterns.txt) | Patterns to ignore from checked lines | perl regular expression (order matters, first match wins) | [patterns](https://github.com/check-spelling/check-spelling/wiki/Configuration-Examples%3A-patterns)
|
||||
[candidate.patterns](candidate.patterns) | Patterns that might be worth adding to [patterns.txt](patterns.txt) | perl regular expression with optional comment block introductions (all matches will be suggested) | [candidates](https://github.com/check-spelling/check-spelling/wiki/Feature:-Suggest-patterns)
|
||||
[line_forbidden.patterns](line_forbidden.patterns) | Patterns to flag in checked lines | perl regular expression (order matters, first match wins) | [patterns](https://github.com/check-spelling/check-spelling/wiki/Configuration-Examples%3A-patterns)
|
||||
[expect.txt](expect.txt) | Expected words that aren't in the dictionary | one word per line (sorted, alphabetically) | [expect](https://github.com/check-spelling/check-spelling/wiki/Configuration#expect)
|
||||
[advice.md](advice.md) | Supplement for GitHub comment when unrecognized words are found | GitHub Markdown | [advice](https://github.com/check-spelling/check-spelling/wiki/Configuration-Examples%3A-advice)
|
||||
|
||||
Note: you can replace any of these files with a directory by the same name (minus the suffix)
|
||||
and then include multiple files inside that directory (with that suffix) to merge multiple files together.
|
||||
31
.github/actions/spelling/advice.md
vendored
Normal file
31
.github/actions/spelling/advice.md
vendored
Normal file
@@ -0,0 +1,31 @@
|
||||
<!-- See https://github.com/check-spelling/check-spelling/wiki/Configuration-Examples%3A-advice --> <!-- markdownlint-disable MD033 MD041 -->
|
||||
<details><summary>If the flagged items are :exploding_head: false positives</summary>
|
||||
|
||||
If items relate to a ...
|
||||
* binary file (or some other file you wouldn't want to check at all).
|
||||
|
||||
Please add a file path to the `excludes.txt` file matching the containing file.
|
||||
|
||||
File paths are Perl 5 Regular Expressions - you can [test](
|
||||
https://www.regexplanet.com/advanced/perl/) yours before committing to verify it will match your files.
|
||||
|
||||
`^` refers to the file's path from the root of the repository, so `^README\.md$` would exclude [README.md](
|
||||
../tree/HEAD/README.md) (on whichever branch you're using).
|
||||
|
||||
* well-formed pattern.
|
||||
|
||||
If you can write a [pattern](
|
||||
https://github.com/check-spelling/check-spelling/wiki/Configuration-Examples:-patterns
|
||||
) that would match it,
|
||||
try adding it to the `patterns.txt` file.
|
||||
|
||||
Patterns are Perl 5 Regular Expressions - you can [test](
|
||||
https://www.regexplanet.com/advanced/perl/) yours before committing to verify it will match your lines.
|
||||
|
||||
Note that patterns can't match multiline strings.
|
||||
|
||||
</details>
|
||||
|
||||
<!-- adoption information-->
|
||||
:steam_locomotive: If you're seeing this message and your PR is from a branch that doesn't have check-spelling,
|
||||
please merge to your PR's base branch to get the version configured for your repository.
|
||||
5
.github/actions/spelling/allow.txt
vendored
Normal file
5
.github/actions/spelling/allow.txt
vendored
Normal file
@@ -0,0 +1,5 @@
|
||||
github
|
||||
https
|
||||
ssh
|
||||
ubuntu
|
||||
workarounds
|
||||
779
.github/actions/spelling/candidate.patterns
vendored
Normal file
779
.github/actions/spelling/candidate.patterns
vendored
Normal file
@@ -0,0 +1,779 @@
|
||||
# Repeated letters
|
||||
#\b([a-z])\g{-1}{2,}\b
|
||||
|
||||
# marker to ignore all code on line
|
||||
^.*/\* #no-spell-check-line \*/.*$
|
||||
# marker to ignore all code on line
|
||||
^.*\bno-spell-check(?:-line|)(?:\s.*|)$
|
||||
|
||||
# https://cspell.org/configuration/document-settings/
|
||||
# cspell inline
|
||||
^.*\b[Cc][Ss][Pp][Ee][Ll]{2}:\s*[Dd][Ii][Ss][Aa][Bb][Ll][Ee]-[Ll][Ii][Nn][Ee]\b
|
||||
|
||||
# copyright
|
||||
Copyright (?:\([Cc]\)|)(?:[-\d, ]|and)+(?: [A-Z][a-z]+ [A-Z][a-z]+,?)+
|
||||
|
||||
# patch hunk comments
|
||||
^@@ -\d+(?:,\d+|) \+\d+(?:,\d+|) @@ .*
|
||||
# git index header
|
||||
index (?:[0-9a-z]{7,40},|)[0-9a-z]{7,40}\.\.[0-9a-z]{7,40}
|
||||
|
||||
# file permissions
|
||||
['"`\s][-bcdLlpsw](?:[-r][-w][-Ssx]){2}[-r][-w][-SsTtx]\+?['"`\s]
|
||||
|
||||
# css fonts
|
||||
\bfont(?:-family|):[^;}]+
|
||||
|
||||
# css url wrappings
|
||||
\burl\([^)]+\)
|
||||
|
||||
# cid urls
|
||||
(['"])cid:.*?\g{-1}
|
||||
|
||||
# data url in parens
|
||||
\(data:(?:[^) ][^)]*?|)(?:[A-Z]{3,}|[A-Z][a-z]{2,}|[a-z]{3,})[^)]*\)
|
||||
# data url in quotes
|
||||
([`'"])data:(?:[^ `'"].*?|)(?:[A-Z]{3,}|[A-Z][a-z]{2,}|[a-z]{3,}).*\g{-1}
|
||||
# data url
|
||||
\bdata:[-a-zA-Z=;:/0-9+]*,\S*
|
||||
|
||||
# https/http/file urls
|
||||
(?:\b(?:https?|ftp|file)://)[-A-Za-z0-9+&@#/*%?=~_|!:,.;]+[-A-Za-z0-9+&@#/*%=~_|]
|
||||
|
||||
# mailto urls
|
||||
mailto:[-a-zA-Z=;:/?%&0-9+@._]{3,}
|
||||
|
||||
# magnet urls
|
||||
magnet:[?=:\w]+
|
||||
|
||||
# magnet urls
|
||||
"magnet:[^"]+"
|
||||
|
||||
# obs:
|
||||
"obs:[^"]*"
|
||||
|
||||
# The `\b` here means a break, it's the fancy way to handle urls, but it makes things harder to read
|
||||
# In this examples content, I'm using a number of different ways to match things to show various approaches
|
||||
# asciinema
|
||||
\basciinema\.org/a/[0-9a-zA-Z]+
|
||||
|
||||
# asciinema v2
|
||||
^\[\d+\.\d+, "[io]", ".*"\]$
|
||||
|
||||
# apple
|
||||
\bdeveloper\.apple\.com/[-\w?=/]+
|
||||
# Apple music
|
||||
\bembed\.music\.apple\.com/fr/playlist/usr-share/[-\w.]+
|
||||
|
||||
# appveyor api
|
||||
\bci\.appveyor\.com/api/projects/status/[0-9a-z]+
|
||||
# appveyor project
|
||||
\bci\.appveyor\.com/project/(?:[^/\s"]*/){2}builds?/\d+/job/[0-9a-z]+
|
||||
|
||||
# Amazon
|
||||
|
||||
# Amazon
|
||||
\bamazon\.com/[-\w]+/(?:dp/[0-9A-Z]+|)
|
||||
# AWS ARN
|
||||
arn:aws:[-/:\w]+
|
||||
# AWS S3
|
||||
\b\w*\.s3[^.]*\.amazonaws\.com/[-\w/&#%_?:=]*
|
||||
# AWS execute-api
|
||||
\b[0-9a-z]{10}\.execute-api\.[-0-9a-z]+\.amazonaws\.com\b
|
||||
# AWS ELB
|
||||
\b\w+\.[-0-9a-z]+\.elb\.amazonaws\.com\b
|
||||
# AWS SNS
|
||||
\bsns\.[-0-9a-z]+.amazonaws\.com/[-\w/&#%_?:=]*
|
||||
# AWS VPC
|
||||
vpc-\w+
|
||||
|
||||
# While you could try to match `http://` and `https://` by using `s?` in `https?://`, sometimes there
|
||||
# YouTube url
|
||||
\b(?:(?:www\.|)youtube\.com|youtu.be)/(?:channel/|embed/|user/|playlist\?list=|watch\?v=|v/|)[-a-zA-Z0-9?&=_%]*
|
||||
# YouTube music
|
||||
\bmusic\.youtube\.com/youtubei/v1/browse(?:[?&]\w+=[-a-zA-Z0-9?&=_]*)
|
||||
# YouTube tag
|
||||
<\s*youtube\s+id=['"][-a-zA-Z0-9?_]*['"]
|
||||
# YouTube image
|
||||
\bimg\.youtube\.com/vi/[-a-zA-Z0-9?&=_]*
|
||||
# Google Accounts
|
||||
\baccounts.google.com/[-_/?=.:;+%&0-9a-zA-Z]*
|
||||
# Google Analytics
|
||||
\bgoogle-analytics\.com/collect.[-0-9a-zA-Z?%=&_.~]*
|
||||
# Google APIs
|
||||
\bgoogleapis\.(?:com|dev)/[a-z]+/(?:v\d+/|)[a-z]+/[-@:./?=\w+|&]+
|
||||
# Google Artifact Registry
|
||||
\.pkg\.dev(?:/[-\w]+)+(?::[-\w]+|)
|
||||
# Google Storage
|
||||
\b[-a-zA-Z0-9.]*\bstorage\d*\.googleapis\.com(?:/\S*|)
|
||||
# Google Calendar
|
||||
\bcalendar\.google\.com/calendar(?:/u/\d+|)/embed\?src=[@./?=\w&%]+
|
||||
\w+\@group\.calendar\.google\.com\b
|
||||
# Google DataStudio
|
||||
\bdatastudio\.google\.com/(?:(?:c/|)u/\d+/|)(?:embed/|)(?:open|reporting|datasources|s)/[-0-9a-zA-Z]+(?:/page/[-0-9a-zA-Z]+|)
|
||||
# The leading `/` here is as opposed to the `\b` above
|
||||
# ... a short way to match `https://` or `http://` since most urls have one of those prefixes
|
||||
# Google Docs
|
||||
/docs\.google\.com/[a-z]+/(?:ccc\?key=\w+|(?:u/\d+|d/(?:e/|)[0-9a-zA-Z_-]+/)?(?:edit\?[-\w=#.]*|/\?[\w=&]*|))
|
||||
# Google Drive
|
||||
\bdrive\.google\.com/(?:file/d/|open)[-0-9a-zA-Z_?=]*
|
||||
# Google Groups
|
||||
\bgroups\.google\.com(?:/[a-z]+/(?:#!|)[^/\s"]+)*
|
||||
# Google Maps
|
||||
\bmaps\.google\.com/maps\?[\w&;=]*
|
||||
# Google themes
|
||||
themes\.googleusercontent\.com/static/fonts/[^/\s"]+/v\d+/[^.]+.
|
||||
# Google CDN
|
||||
\bclients2\.google(?:usercontent|)\.com[-0-9a-zA-Z/.]*
|
||||
# Goo.gl
|
||||
/goo\.gl/[a-zA-Z0-9]+
|
||||
# Google Chrome Store
|
||||
\bchrome\.google\.com/webstore/detail/[-\w]*(?:/\w*|)
|
||||
# Google Books
|
||||
\bgoogle\.(?:\w{2,4})/books(?:/\w+)*\?[-\w\d=&#.]*
|
||||
# Google Fonts
|
||||
\bfonts\.(?:googleapis|gstatic)\.com/[-/?=:;+&0-9a-zA-Z]*
|
||||
# Google Forms
|
||||
\bforms\.gle/\w+
|
||||
# Google Scholar
|
||||
\bscholar\.google\.com/citations\?user=[A-Za-z0-9_]+
|
||||
# Google Colab Research Drive
|
||||
\bcolab\.research\.google\.com/drive/[-0-9a-zA-Z_?=]*
|
||||
# Google Cloud regions
|
||||
(?:us|(?:north|south)america|europe|asia|australia|me|africa)-(?:north|south|east|west|central){1,2}\d+
|
||||
|
||||
# GitHub SHAs (api)
|
||||
\bapi.github\.com/repos(?:/[^/\s"]+){3}/[0-9a-f]+\b
|
||||
# GitHub SHAs (markdown)
|
||||
(?:\[`?[0-9a-f]+`?\]\(https:/|)/(?:www\.|)github\.com(?:/[^/\s"]+){2,}(?:/[^/\s")]+)(?:[0-9a-f]+(?:[-0-9a-zA-Z/#.]*|)\b|)
|
||||
# GitHub SHAs
|
||||
\bgithub\.com(?:/[^/\s"]+){2}[@#][0-9a-f]+\b
|
||||
# GitHub SHA refs
|
||||
\[([0-9a-f]+)\]\(https://(?:www\.|)github.com/[-\w]+/[-\w]+/commit/\g{-1}[0-9a-f]*
|
||||
# GitHub wiki
|
||||
\bgithub\.com/(?:[^/]+/){2}wiki/(?:(?:[^/]+/|)_history|[^/]+(?:/_compare|)/[0-9a-f.]{40,})\b
|
||||
# githubusercontent
|
||||
/[-a-z0-9]+\.githubusercontent\.com/[-a-zA-Z0-9?&=_\/.]*
|
||||
# githubassets
|
||||
\bgithubassets.com/[0-9a-f]+(?:[-/\w.]+)
|
||||
# gist github
|
||||
\bgist\.github\.com/[^/\s"]+/[0-9a-f]+
|
||||
# git.io
|
||||
\bgit\.io/[0-9a-zA-Z]+
|
||||
# GitHub JSON
|
||||
"node_id": "[-a-zA-Z=;:/0-9+_]*"
|
||||
# Contributor
|
||||
\[[^\]]+\]\(https://github\.com/[^/\s"]+/?\)
|
||||
# GHSA
|
||||
GHSA(?:-[0-9a-z]{4}){3}
|
||||
|
||||
# GitHub actions
|
||||
\buses:\s+[-\w.]+/[-\w./]+@[-\w.]+
|
||||
|
||||
# GitLab commit
|
||||
\bgitlab\.[^/\s"]*/\S+/\S+/commit/[0-9a-f]{7,16}#[0-9a-f]{40}\b
|
||||
# GitLab merge requests
|
||||
\bgitlab\.[^/\s"]*/\S+/\S+/-/merge_requests/\d+/diffs#[0-9a-f]{40}\b
|
||||
# GitLab uploads
|
||||
\bgitlab\.[^/\s"]*/uploads/[-a-zA-Z=;:/0-9+]*
|
||||
# GitLab commits
|
||||
\bgitlab\.[^/\s"]*/(?:[^/\s"]+/){2}commits?/[0-9a-f]+\b
|
||||
|
||||
# #includes
|
||||
^\s*#include\s*(?:<.*?>|".*?")
|
||||
|
||||
# #pragma lib
|
||||
^\s*#pragma comment\(lib, ".*?"\)
|
||||
|
||||
# binance
|
||||
accounts\.binance\.com/[a-z/]*oauth/authorize\?[-0-9a-zA-Z&%]*
|
||||
|
||||
# bitbucket diff
|
||||
\bapi\.bitbucket\.org/\d+\.\d+/repositories/(?:[^/\s"]+/){2}diff(?:stat|)(?:/[^/\s"]+){2}:[0-9a-f]+
|
||||
# bitbucket repositories commits
|
||||
\bapi\.bitbucket\.org/\d+\.\d+/repositories/(?:[^/\s"]+/){2}commits?/[0-9a-f]+
|
||||
# bitbucket commits
|
||||
\bbitbucket\.org/(?:[^/\s"]+/){2}commits?/[0-9a-f]+
|
||||
|
||||
# bit.ly
|
||||
\bbit\.ly/\w+
|
||||
|
||||
# bitrise
|
||||
\bapp\.bitrise\.io/app/[0-9a-f]*/[\w.?=&]*
|
||||
|
||||
# bootstrapcdn.com
|
||||
\bbootstrapcdn\.com/[-./\w]+
|
||||
|
||||
# cdn.cloudflare.com
|
||||
\bcdnjs\.cloudflare\.com/[./\w]+
|
||||
|
||||
# circleci
|
||||
\bcircleci\.com/gh(?:/[^/\s"]+){1,5}.[a-z]+\?[-0-9a-zA-Z=&]+
|
||||
|
||||
# gitter
|
||||
\bgitter\.im(?:/[^/\s"]+){2}\?at=[0-9a-f]+
|
||||
|
||||
# gravatar
|
||||
\bgravatar\.com/avatar/[0-9a-f]+
|
||||
|
||||
# ibm
|
||||
[a-z.]*ibm\.com/[-_#=:%!?~.\\/\d\w]*
|
||||
|
||||
# imgur
|
||||
\bimgur\.com/[^.]+
|
||||
|
||||
# Internet Archive
|
||||
\barchive\.org/web/\d+/(?:[-\w.?,'/\\+&%$#_:]*)
|
||||
|
||||
# discord
|
||||
/discord(?:app\.com|\.gg)/(?:invite/)?[a-zA-Z0-9]{7,}
|
||||
|
||||
# Disqus
|
||||
\bdisqus\.com/[-\w/%.()!?&=_]*
|
||||
|
||||
# medium link
|
||||
\blink\.medium\.com/[a-zA-Z0-9]+
|
||||
# medium
|
||||
\bmedium\.com/@?[^/\s"]+/[-\w]+
|
||||
|
||||
# microsoft
|
||||
\b(?:https?://|)(?:(?:(?:blogs|download\.visualstudio|docs|msdn2?|research)\.|)microsoft|blogs\.msdn)\.co(?:m|\.\w\w)/[-_a-zA-Z0-9()=./%]*
|
||||
# powerbi
|
||||
\bapp\.powerbi\.com/reportEmbed/[^"' ]*
|
||||
# vs devops
|
||||
\bvisualstudio.com(?::443|)/[-\w/?=%&.]*
|
||||
# microsoft store
|
||||
\bmicrosoft\.com/store/apps/\w+
|
||||
|
||||
# mvnrepository.com
|
||||
\bmvnrepository\.com/[-0-9a-z./]+
|
||||
|
||||
# now.sh
|
||||
/[0-9a-z-.]+\.now\.sh\b
|
||||
|
||||
# oracle
|
||||
\bdocs\.oracle\.com/[-0-9a-zA-Z./_?#&=]*
|
||||
|
||||
# chromatic.com
|
||||
/\S+.chromatic.com\S*[")]
|
||||
|
||||
# codacy
|
||||
\bapi\.codacy\.com/project/badge/Grade/[0-9a-f]+
|
||||
|
||||
# compai
|
||||
\bcompai\.pub/v1/png/[0-9a-f]+
|
||||
|
||||
# mailgun api
|
||||
\.api\.mailgun\.net/v3/domains/[0-9a-z]+\.mailgun.org/messages/[0-9a-zA-Z=@]*
|
||||
# mailgun
|
||||
\b[0-9a-z]+.mailgun.org
|
||||
|
||||
# /message-id/
|
||||
/message-id/[-\w@./%]+
|
||||
|
||||
# Reddit
|
||||
\breddit\.com/r/[/\w_]*
|
||||
|
||||
# requestb.in
|
||||
\brequestb\.in/[0-9a-z]+
|
||||
|
||||
# sched
|
||||
\b[a-z0-9]+\.sched\.com\b
|
||||
|
||||
# Slack url
|
||||
slack://[a-zA-Z0-9?&=]+
|
||||
# Slack
|
||||
\bslack\.com/[-0-9a-zA-Z/_~?&=.]*
|
||||
# Slack edge
|
||||
\bslack-edge\.com/[-a-zA-Z0-9?&=%./]+
|
||||
# Slack images
|
||||
\bslack-imgs\.com/[-a-zA-Z0-9?&=%.]+
|
||||
|
||||
# shields.io
|
||||
\bshields\.io/[-\w/%?=&.:+;,]*
|
||||
|
||||
# stackexchange -- https://stackexchange.com/feeds/sites
|
||||
\b(?:askubuntu|serverfault|stack(?:exchange|overflow)|superuser).com/(?:questions/\w+/[-\w]+|a/)
|
||||
|
||||
# Sentry
|
||||
[0-9a-f]{32}\@o\d+\.ingest\.sentry\.io\b
|
||||
|
||||
# Twitter markdown
|
||||
\[@[^[/\]:]*?\]\(https://twitter.com/[^/\s"')]*(?:/status/\d+(?:\?[-_0-9a-zA-Z&=]*|)|)\)
|
||||
# Twitter hashtag
|
||||
\btwitter\.com/hashtag/[\w?_=&]*
|
||||
# Twitter status
|
||||
\btwitter\.com/[^/\s"')]*(?:/status/\d+(?:\?[-_0-9a-zA-Z&=]*|)|)
|
||||
# Twitter profile images
|
||||
\btwimg\.com/profile_images/[_\w./]*
|
||||
# Twitter media
|
||||
\btwimg\.com/media/[-_\w./?=]*
|
||||
# Twitter link shortened
|
||||
\bt\.co/\w+
|
||||
|
||||
# facebook
|
||||
\bfburl\.com/[0-9a-z_]+
|
||||
# facebook CDN
|
||||
\bfbcdn\.net/[\w/.,]*
|
||||
# facebook watch
|
||||
\bfb\.watch/[0-9A-Za-z]+
|
||||
|
||||
# dropbox
|
||||
\bdropbox\.com/sh?/[^/\s"]+/[-0-9A-Za-z_.%?=&;]+
|
||||
|
||||
# ipfs protocol
|
||||
ipfs://[0-9a-zA-Z]{3,}
|
||||
# ipfs url
|
||||
/ipfs/[0-9a-zA-Z]{3,}
|
||||
|
||||
# w3
|
||||
\bw3\.org/[-0-9a-zA-Z/#.]+
|
||||
|
||||
# loom
|
||||
\bloom\.com/embed/[0-9a-f]+
|
||||
|
||||
# regex101
|
||||
\bregex101\.com/r/[^/\s"]+/\d+
|
||||
|
||||
# figma
|
||||
\bfigma\.com/file(?:/[0-9a-zA-Z]+/)+
|
||||
|
||||
# freecodecamp.org
|
||||
\bfreecodecamp\.org/[-\w/.]+
|
||||
|
||||
# image.tmdb.org
|
||||
\bimage\.tmdb\.org/[/\w.]+
|
||||
|
||||
# mermaid
|
||||
\bmermaid\.ink/img/[-\w]+|\bmermaid-js\.github\.io/mermaid-live-editor/#/edit/[-\w]+
|
||||
|
||||
# Wikipedia
|
||||
\ben\.wikipedia\.org/wiki/[-\w%.#]+
|
||||
|
||||
# gitweb
|
||||
[^"\s]+/gitweb/\S+;h=[0-9a-f]+
|
||||
|
||||
# HyperKitty lists
|
||||
/archives/list/[^@/]+@[^/\s"]*/message/[^/\s"]*/
|
||||
|
||||
# lists
|
||||
/thread\.html/[^"\s]+
|
||||
|
||||
# list-management
|
||||
\blist-manage\.com/subscribe(?:[?&](?:u|id)=[0-9a-f]+)+
|
||||
|
||||
# kubectl.kubernetes.io/last-applied-configuration
|
||||
"kubectl.kubernetes.io/last-applied-configuration": ".*"
|
||||
|
||||
# pgp
|
||||
\bgnupg\.net/pks/lookup[?&=0-9a-zA-Z]*
|
||||
|
||||
# Spotify
|
||||
\bopen\.spotify\.com/embed/playlist/\w+
|
||||
|
||||
# Mastodon
|
||||
\bmastodon\.[-a-z.]*/(?:media/|@)[?&=0-9a-zA-Z_]*
|
||||
|
||||
# scastie
|
||||
\bscastie\.scala-lang\.org/[^/]+/\w+
|
||||
|
||||
# images.unsplash.com
|
||||
\bimages\.unsplash\.com/(?:(?:flagged|reserve)/|)[-\w./%?=%&.;]+
|
||||
|
||||
# pastebin
|
||||
\bpastebin\.com/[\w/]+
|
||||
|
||||
# heroku
|
||||
\b\w+\.heroku\.com/source/archive/\w+
|
||||
|
||||
# quip
|
||||
\b\w+\.quip\.com/\w+(?:(?:#|/issues/)\w+)?
|
||||
|
||||
# badgen.net
|
||||
\bbadgen\.net/badge/[^")\]'\s]+
|
||||
|
||||
# statuspage.io
|
||||
\w+\.statuspage\.io\b
|
||||
|
||||
# media.giphy.com
|
||||
\bmedia\.giphy\.com/media/[^/]+/[\w.?&=]+
|
||||
|
||||
# tinyurl
|
||||
\btinyurl\.com/\w+
|
||||
|
||||
# codepen
|
||||
\bcodepen\.io/[\w/]+
|
||||
|
||||
# registry.npmjs.org
|
||||
\bregistry\.npmjs\.org/(?:@[^/"']+/|)[^/"']+/-/[-\w@.]+
|
||||
|
||||
# getopts
|
||||
\bgetopts\s+(?:"[^"]+"|'[^']+')
|
||||
|
||||
# ANSI color codes
|
||||
(?:\\(?:u00|x)1[Bb]|\\03[1-7]|\x1b|\\u\{1[Bb]\})\[\d+(?:;\d+)*m
|
||||
|
||||
# URL escaped characters
|
||||
%[0-9A-F][A-F](?=[A-Za-z])
|
||||
# lower URL escaped characters
|
||||
%[0-9a-f][a-f](?=[a-z]{2,})
|
||||
# IPv6
|
||||
\b(?:[0-9a-fA-F]{0,4}:){3,7}[0-9a-fA-F]{0,4}\b
|
||||
# c99 hex digits (not the full format, just one I've seen)
|
||||
0x[0-9a-fA-F](?:\.[0-9a-fA-F]*|)[pP]
|
||||
# Punycode
|
||||
\bxn--[-0-9a-z]+
|
||||
# sha
|
||||
sha\d+:[0-9a-f]*?[a-f]{3,}[0-9a-f]*
|
||||
# sha-... -- uses a fancy capture
|
||||
(\\?['"]|")[0-9a-f]{40,}\g{-1}
|
||||
# hex runs
|
||||
\b[0-9a-fA-F]{16,}\b
|
||||
# hex in url queries
|
||||
=[0-9a-fA-F]*?(?:[A-F]{3,}|[a-f]{3,})[0-9a-fA-F]*?&
|
||||
# ssh
|
||||
(?:ssh-\S+|-nistp256) [-a-zA-Z=;:/0-9+]{12,}
|
||||
|
||||
# PGP
|
||||
\b(?:[0-9A-F]{4} ){9}[0-9A-F]{4}\b
|
||||
# GPG keys
|
||||
\b(?:[0-9A-F]{4} ){5}(?: [0-9A-F]{4}){5}\b
|
||||
# Well known gpg keys
|
||||
.well-known/openpgpkey/[\w./]+
|
||||
|
||||
# pki
|
||||
-----BEGIN.*-----END
|
||||
|
||||
# pki (base64)
|
||||
LS0tLS1CRUdJT.*
|
||||
|
||||
# C# includes
|
||||
^\s*using [^;]+;
|
||||
|
||||
# uuid:
|
||||
\b[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}\b
|
||||
# hex digits including css/html color classes:
|
||||
(?:[\\0][xX]|\\u|[uU]\+|#x?|%23|&H)[0-9_a-fA-FgGrR]*?[a-fA-FgGrR]{2,}[0-9_a-fA-FgGrR]*(?:[uUlL]{0,3}|[iu]\d+)\b
|
||||
|
||||
# integrity
|
||||
integrity=(['"])(?:\s*sha\d+-[-a-zA-Z=;:/0-9+]{40,})+\g{-1}
|
||||
|
||||
# https://www.gnu.org/software/groff/manual/groff.html
|
||||
# man troff content
|
||||
\\f[BCIPR]
|
||||
# '/"
|
||||
\\\([ad]q
|
||||
|
||||
# .desktop mime types
|
||||
^MimeTypes?=.*$
|
||||
# .desktop localized entries
|
||||
^[A-Z][a-z]+\[[a-z]+\]=.*$
|
||||
# Localized .desktop content
|
||||
Name\[[^\]]+\]=.*
|
||||
|
||||
# IServiceProvider / isAThing
|
||||
(?:(?:\b|_|(?<=[a-z]))I|(?:\b|_)(?:nsI|isA))(?=(?:[A-Z][a-z]{2,})+(?:[A-Z\d]|\b))
|
||||
|
||||
# crypt
|
||||
(['"])\$2[ayb]\$.{56}\g{-1}
|
||||
|
||||
# apache/old crypt
|
||||
(['"]|)\$+(?:apr|)1\$+.{8}\$+.{22}\g{-1}
|
||||
|
||||
# sha1 hash
|
||||
\{SHA\}[-a-zA-Z=;:/0-9+]{3,}
|
||||
|
||||
# machine learning (?)
|
||||
\b(?i)ml(?=[a-z]{2,})
|
||||
|
||||
# python
|
||||
#\b(?i)py(?!gments|gmy|lon|ramid|ro|th)(?=[a-z]{2,})
|
||||
|
||||
# scrypt / argon
|
||||
\$(?:scrypt|argon\d+[di]*)\$\S+
|
||||
|
||||
# go.sum
|
||||
\bh1:\S+
|
||||
|
||||
# imports
|
||||
^import\s+(?:(?:static|type)\s+|)(?:[\w.]|\{\s*\w*?(?:,\s*(?:\w*|\*))+\s*\})+
|
||||
|
||||
# scala modules
|
||||
("[^"]+"\s*%%?\s*){2,3}"[^"]+"
|
||||
|
||||
# container images
|
||||
image: [-\w./:@]+
|
||||
|
||||
# Docker images
|
||||
^\s*(?i)FROM\s+\S+:\S+(?:\s+AS\s+\S+|)
|
||||
|
||||
# `docker images` REPOSITORY TAG IMAGE ID CREATED SIZE
|
||||
\s*\S+/\S+\s+\S+\s+[0-9a-f]{8,}\s+\d+\s+(?:hour|day|week)s ago\s+[\d.]+[KMGT]B
|
||||
|
||||
# Intel intrinsics
|
||||
_mm_(?!dd)\w+
|
||||
|
||||
# Input to GitHub JSON
|
||||
content: (['"])[-a-zA-Z=;:/0-9+]*=\g{-1}
|
||||
|
||||
# This does not cover multiline strings, if your repository has them,
|
||||
# you'll want to remove the `(?=.*?")` suffix.
|
||||
# The `(?=.*?")` suffix should limit the false positives rate
|
||||
# printf
|
||||
%(?:(?:(?:hh?|ll?|[jzt])?[diuoxn]|l?[cs]|L?[fega]|p)(?=[a-z]{2,})|(?:X|L?[FEGA])(?=[a-zA-Z]{2,}))(?!%)(?=[_a-zA-Z]+(?!%)\b)(?=.*?['"])
|
||||
|
||||
# Alternative printf
|
||||
# %s
|
||||
%(?:s(?=[a-z]{2,}))(?!%)(?=[_a-zA-Z]+(?!%[^s])\b)(?=.*?['"])
|
||||
|
||||
# Python string prefix / binary prefix
|
||||
# Note that there's a high false positive rate, remove the `?=` and search for the regex to see if the matches seem like reasonable strings
|
||||
(?<!['"])\b(?:B|BR|Br|F|FR|Fr|R|RB|RF|Rb|Rf|U|UR|Ur|b|bR|br|f|fR|fr|r|rB|rF|rb|rf|u|uR|ur)['"](?=[A-Z]{3,}|[A-Z][a-z]{2,}|[a-z]{3,})
|
||||
|
||||
# Regular expressions for (P|p)assword
|
||||
\([A-Z]\|[a-z]\)[a-z]+
|
||||
|
||||
# JavaScript regular expressions
|
||||
# javascript test regex
|
||||
/.{3,}/[gim]*\.test\(
|
||||
# javascript match regex
|
||||
\.match\(/[^/\s"]{3,}/[gim]*\s*
|
||||
# javascript match regex
|
||||
\.match\(/\\[b].{3,}?/[gim]*\s*\)(?:;|$)
|
||||
# javascript regex
|
||||
^\s*/\\[b].{3,}?/[gim]*\s*(?:\)(?:;|$)|,$)
|
||||
# javascript replace regex
|
||||
\.replace\(/[^/\s"]{3,}/[gim]*\s*,
|
||||
# assign regex
|
||||
= /[^*].*?(?:[a-z]{3,}|[A-Z]{3,}|[A-Z][a-z]{2,}).*/[gim]*(?=\W|$)
|
||||
# perl regex test
|
||||
[!=]~ (?:/.*/|m\{.*?\}|m<.*?>|m([|!/@#,;']).*?\g{-1})
|
||||
|
||||
# perl qr regex
|
||||
(?<!\$)\bqr(?:\{.*?\}|<.*?>|\(.*?\)|([|!/@#,;']).*?\g{-1})
|
||||
|
||||
# perl run
|
||||
perl(?:\s+-[a-zA-Z]\w*)+
|
||||
|
||||
# C network byte conversions
|
||||
(?:\d|\bh)to(?!ken)(?=[a-z])|to(?=[adhiklpun]\()
|
||||
|
||||
# Go regular expressions
|
||||
regexp?\.MustCompile\((?:`[^`]*`|".*"|'.*')\)
|
||||
|
||||
# regex choice
|
||||
\(\?:[^)]+\|[^)]+\)
|
||||
|
||||
# proto
|
||||
^\s*(\w+)\s\g{-1} =
|
||||
|
||||
# sed regular expressions
|
||||
sed 's/(?:[^/]*?[a-zA-Z]{3,}[^/]*?/){2}
|
||||
|
||||
# node packages
|
||||
(["'])@[^/'" ]+/[^/'" ]+\g{-1}
|
||||
|
||||
# go install
|
||||
go install(?:\s+[a-z]+\.[-@\w/.]+)+
|
||||
|
||||
# pom.xml
|
||||
<(?:group|artifact)Id>.*?<
|
||||
|
||||
# jetbrains schema https://youtrack.jetbrains.com/issue/RSRP-489571
|
||||
urn:shemas-jetbrains-com
|
||||
|
||||
# Debian changelog severity
|
||||
[-\w]+ \(.*\) (?:\w+|baseline|unstable|experimental); urgency=(?:low|medium|high|emergency|critical)\b
|
||||
|
||||
# kubernetes pod status lists
|
||||
# https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase
|
||||
\w+(?:-\w+)+\s+\d+/\d+\s+(?:Running|Pending|Succeeded|Failed|Unknown)\s+
|
||||
|
||||
# kubectl - pods in CrashLoopBackOff
|
||||
\w+-[0-9a-f]+-\w+\s+\d+/\d+\s+CrashLoopBackOff\s+
|
||||
|
||||
# kubernetes applications
|
||||
\.apps/[-\w]+
|
||||
|
||||
# kubernetes object suffix
|
||||
-[0-9a-f]{10}-\w{5}\s
|
||||
|
||||
# kubernetes crd patterns
|
||||
^\s*pattern: .*$
|
||||
|
||||
# posthog secrets
|
||||
([`'"])phc_[^"',]+\g{-1}
|
||||
|
||||
# xcode
|
||||
|
||||
# xcodeproject scenes
|
||||
(?:Controller|destination|(?:first|second)Item|ID|id)="\w{3}-\w{2}-\w{3}"
|
||||
|
||||
# xcode api botches
|
||||
customObjectInstantitationMethod
|
||||
|
||||
# msvc api botches
|
||||
PrependWithABINamepsace
|
||||
|
||||
# configure flags
|
||||
.* \| --\w{2,}.*?(?=\w+\s\w+)
|
||||
|
||||
# font awesome classes
|
||||
\.fa-[-a-z0-9]+
|
||||
|
||||
# bearer auth
|
||||
(['"])[Bb]ear[e][r] .{3,}?\g{-1}
|
||||
|
||||
# bearer auth
|
||||
\b[Bb]ear[e][r]:? [-a-zA-Z=;:/0-9+.]{3,}
|
||||
|
||||
# basic auth
|
||||
(['"])[Bb]asic [-a-zA-Z=;:/0-9+]{3,}\g{-1}
|
||||
|
||||
# basic auth
|
||||
: [Bb]asic [-a-zA-Z=;:/0-9+.]{3,}
|
||||
|
||||
# base64 encoded content
|
||||
([`'"])[-a-zA-Z=;:/0-9+]{3,}=\g{-1}
|
||||
# base64 encoded content in xml/sgml
|
||||
>[-a-zA-Z=;:/0-9+]{3,}=</
|
||||
# base64 encoded content, possibly wrapped in mime
|
||||
#(?:^|[\s=;:?])[-a-zA-Z=;:/0-9+]{50,}(?:[\s=;:?]|$)
|
||||
# base64 encoded json
|
||||
\beyJ[-a-zA-Z=;:/0-9+]+
|
||||
# base64 encoded pkcs
|
||||
\bMII[-a-zA-Z=;:/0-9+]+
|
||||
|
||||
# uuencoded
|
||||
#[!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_]{40,}
|
||||
|
||||
# DNS rr data
|
||||
(?:\d+\s+){3}(?:[-+/=.\w]{2,}\s*){1,2}
|
||||
|
||||
# encoded-word
|
||||
=\?[-a-zA-Z0-9"*%]+\?[BQ]\?[^?]{0,75}\?=
|
||||
|
||||
# numerator
|
||||
\bnumer\b(?=.*denom)
|
||||
|
||||
# Time Zones
|
||||
\b(?:Africa|Atlantic|America|Antarctica|Arctic|Asia|Australia|Europe|Indian|Pacific)(?:/[-\w]+)+
|
||||
|
||||
# linux kernel info
|
||||
^(?:bugs|flags|Features)\s+:.*
|
||||
|
||||
# systemd mode
|
||||
systemd.*?running in system mode \([-+].*\)$
|
||||
|
||||
# Lorem
|
||||
# Update Lorem based on your content (requires `ge` and `w` from https://github.com/jsoref/spelling; and `review` from https://github.com/check-spelling/check-spelling/wiki/Looking-for-items-locally )
|
||||
# grep '^[^#].*lorem' .github/actions/spelling/patterns.txt|perl -pne 's/.*i..\?://;s/\).*//' |tr '|' "\n"|sort -f |xargs -n1 ge|perl -pne 's/^[^:]*://'|sort -u|w|sed -e 's/ .*//'|w|review -
|
||||
# Warning, while `(?i)` is very neat and fancy, if you have some binary files that aren't proper unicode, you might run into:
|
||||
# ... Operation "substitution (s///)" returns its argument for non-Unicode code point 0x1C19AE (the code point will vary).
|
||||
# ... You could manually change `(?i)X...` to use `[Xx]...`
|
||||
# ... or you could add the files to your `excludes` file (a version after 0.0.19 should identify the file path)
|
||||
(?:(?:\w|\s|[,.])*\b(?i)(?:amet|consectetur|cursus|dolor|eros|ipsum|lacus|libero|ligula|lorem|magna|neque|nulla|suscipit|tempus)\b(?:\w|\s|[,.])*)
|
||||
|
||||
# Non-English
|
||||
# Even repositories expecting pure English content can unintentionally have Non-English content... People will occasionally mistakenly enter [homoglyphs](https://en.wikipedia.org/wiki/Homoglyph) which are essentially typos, and using this pattern will mean check-spelling will not complain about them.
|
||||
#
|
||||
# If the content to be checked should be written in English and the only Non-English items will be people's names, then you can consider adding this.
|
||||
#
|
||||
# Alternatively, if you're using check-spelling v0.0.25+, and you would like to _check_ the Non-English content for spelling errors, you can. For information on how to do so, see:
|
||||
# https://docs.check-spelling.dev/Feature:-Configurable-word-characters.html#unicode
|
||||
[a-zA-Z]*[ÀÁÂÃÄÅÆČÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæčçèéêëìíîïðñòóôõöøùúûüýÿĀāŁłŃńŅņŒœŚśŠšŜŝŸŽžź][a-zA-Z]{3}[a-zA-ZÀÁÂÃÄÅÆČÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæčçèéêëìíîïðñòóôõöøùúûüýÿĀāŁłŃńŅņŒœŚśŠšŜŝŸŽžź]*|[a-zA-Z]{3,}[ÀÁÂÃÄÅÆČÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæčçèéêëìíîïðñòóôõöøùúûüýÿĀāŁłŃńŅņŒœŚśŠšŜŝŸŽžź]|[ÀÁÂÃÄÅÆČÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæčçèéêëìíîïðñòóôõöøùúûüýÿĀāŁłŃńŅņŒœŚśŠšŜŝŸŽžź][a-zA-Z]{3,}
|
||||
|
||||
# highlighted letters
|
||||
\[[A-Z]\][a-z]+
|
||||
|
||||
# French
|
||||
# This corpus only had capital letters, but you probably want lowercase ones as well.
|
||||
\b[LN]'+[a-z]{2,}\b
|
||||
|
||||
# latex (check-spelling >= 0.0.22)
|
||||
\\\w{2,}\{
|
||||
|
||||
# American Mathematical Society (AMS) / Doxygen
|
||||
TeX/AMS
|
||||
|
||||
# File extensions
|
||||
\*\.[+\w]+,
|
||||
|
||||
# eslint
|
||||
"varsIgnorePattern": ".+"
|
||||
|
||||
# nolint
|
||||
nolint:\s*[\w,]+
|
||||
|
||||
# Windows short paths
|
||||
[/\\][^/\\]{5,6}~\d{1,2}(?=[/\\])
|
||||
|
||||
# Windows Resources with accelerators
|
||||
\b[A-Z]&[a-z]+\b(?!;)
|
||||
|
||||
# signed off by
|
||||
(?i)Signed-off-by: .*
|
||||
|
||||
# cygwin paths
|
||||
/cygdrive/[a-zA-Z]/(?:Program Files(?: \(.*?\)| ?)(?:/[-+.~\\/()\w ]+)*|[-+.~\\/()\w])+
|
||||
|
||||
# in check-spelling@v0.0.22+, printf markers aren't automatically consumed
|
||||
# printf markers
|
||||
(?<!\\)\\[nrt](?=[a-z]{2,})
|
||||
# alternate printf markers if you run into latex and friends
|
||||
(?<!\\)\\[nrt](?=[a-z]{2,})(?=.*['"`])
|
||||
|
||||
# Markdown anchor links
|
||||
\(#\S*?[a-zA-Z]\S*?\)
|
||||
|
||||
# apache
|
||||
a2(?:en|dis)
|
||||
|
||||
# weak e-tag
|
||||
W/"[^"]+"
|
||||
|
||||
# authors/credits
|
||||
^\*(?: [A-Z](?:\w+|\.)){2,} (?=\[|$)
|
||||
|
||||
# the negative lookahead here is to allow catching 'templatesz' as a misspelling
|
||||
# but to otherwise recognize a Windows path with \templates\foo.template or similar:
|
||||
\\(?:necessary|r(?:elease|eport|esolve[dr]?|esult)|t(?:arget|emplates?))(?![a-z])
|
||||
# ignore long runs of a single character:
|
||||
\b([A-Za-z])\g{-1}{3,}\b
|
||||
|
||||
# version suffix <word>v#
|
||||
(?:(?<=[A-Z]{2})V|(?<=[a-z]{2}|[A-Z]{2})v)\d+(?:\b|(?=[a-zA-Z_]))
|
||||
|
||||
# Compiler flags (Unix, Java/Scala)
|
||||
# Use if you have things like `-Pdocker` and want to treat them as `docker`
|
||||
#(?:^|[\t ,>"'`=(#])-(?:(?:J-|)[DPWXY]|[Llf])(?=[A-Z]{2,}|[A-Z][a-z]|[a-z]{2,})
|
||||
|
||||
# Compiler flags (Windows / PowerShell)
|
||||
# This is a subset of the more general compiler flags pattern.
|
||||
# It avoids matching `-Path` to prevent it from being treated as `ath`
|
||||
#(?:^|[\t ,"'`=(#])-(?:[DPL](?=[A-Z]{2,})|[WXYlf](?=[A-Z]{2,}|[A-Z][a-z]|[a-z]{2,}))
|
||||
|
||||
# Compiler flags (linker)
|
||||
,-B
|
||||
|
||||
# libraries
|
||||
(?:\b|_)[Ll]ib(?:re(?=office)|)(?!era[lt]|ero|erty|rar(?:i(?:an|es)|y))(?=[a-z])
|
||||
|
||||
# WWNN/WWPN (NAA identifiers)
|
||||
\b(?:0x)?10[0-9a-f]{14}\b|\b(?:0x|3)?[25][0-9a-f]{15}\b|\b(?:0x|3)?6[0-9a-f]{31}\b
|
||||
|
||||
# iSCSI iqn (approximate regex)
|
||||
\biqn\.[0-9]{4}-[0-9]{2}(?:[\.-][a-z][a-z0-9]*)*\b
|
||||
|
||||
# curl arguments
|
||||
\b(?:\\n|)curl(?:\.exe|)(?:\s+-[a-zA-Z]{1,2}\b)*(?:\s+-[a-zA-Z]{3,})(?:\s+-[a-zA-Z]+)*
|
||||
# set arguments
|
||||
\b(?:bash|sh|set)(?:\s+[-+][abefimouxE]{1,2})*\s+[-+][abefimouxE]{3,}(?:\s+[-+][abefimouxE]+)*
|
||||
# tar arguments
|
||||
\b(?:\\n|)g?tar(?:\.exe|)(?:(?:\s+--[-a-zA-Z]+|\s+-[a-zA-Z]+|\s[ABGJMOPRSUWZacdfh-pr-xz]+\b)(?:=[^ ]*|))+
|
||||
# tput arguments -- https://man7.org/linux/man-pages/man5/terminfo.5.html -- technically they can be more than 5 chars long...
|
||||
\btput\s+(?:(?:-[SV]|-T\s*\w+)\s+)*\w{3,5}\b
|
||||
# macOS temp folders
|
||||
/var/folders/\w\w/[+\w]+/(?:T|-Caches-)/
|
||||
# github runner temp folders
|
||||
/home/runner/work/_temp/[-_/a-z0-9]+
|
||||
93
.github/actions/spelling/excludes.txt
vendored
Normal file
93
.github/actions/spelling/excludes.txt
vendored
Normal file
@@ -0,0 +1,93 @@
|
||||
# See https://github.com/check-spelling/check-spelling/wiki/Configuration-Examples:-excludes
|
||||
(?:^|/)(?i)COPYRIGHT
|
||||
(?:^|/)(?i)LICEN[CS]E
|
||||
(?:^|/)(?i)third[-_]?party/
|
||||
(?:^|/)3rdparty/
|
||||
(?:^|/)generated/
|
||||
(?:^|/)go\.sum$
|
||||
(?:^|/)package(?:-lock|)\.json$
|
||||
(?:^|/)Pipfile$
|
||||
(?:^|/)pyproject.toml
|
||||
(?:^|/)vendor/
|
||||
(?:^|/|\b)requirements(?:-dev|-doc|-test|)\.txt$
|
||||
\.a$
|
||||
\.ai$
|
||||
\.all-contributorsrc$
|
||||
\.avi$
|
||||
\.bmp$
|
||||
\.bz2$
|
||||
\.cert?$|\.crt$
|
||||
\.class$
|
||||
\.coveragerc$
|
||||
\.crl$
|
||||
\.csr$
|
||||
\.dll$
|
||||
\.docx?$
|
||||
\.drawio$
|
||||
\.DS_Store$
|
||||
\.eot$
|
||||
\.eps$
|
||||
\.exe$
|
||||
\.gif$
|
||||
\.git-blame-ignore-revs$
|
||||
\.gitattributes$
|
||||
\.gitkeep$
|
||||
\.graffle$
|
||||
\.gz$
|
||||
\.icns$
|
||||
\.ico$
|
||||
\.ipynb$
|
||||
\.jar$
|
||||
\.jks$
|
||||
\.jpe?g$
|
||||
\.key$
|
||||
\.lib$
|
||||
\.lock$
|
||||
\.map$
|
||||
\.min\..
|
||||
\.mo$
|
||||
\.mod$
|
||||
\.mp[34]$
|
||||
\.o$
|
||||
\.ocf$
|
||||
\.otf$
|
||||
\.p12$
|
||||
\.parquet$
|
||||
\.pdf$
|
||||
\.pem$
|
||||
\.pfx$
|
||||
\.png$
|
||||
\.psd$
|
||||
\.pyc$
|
||||
\.pylintrc$
|
||||
\.qm$
|
||||
\.s$
|
||||
\.sig$
|
||||
\.so$
|
||||
\.svgz?$
|
||||
\.sys$
|
||||
\.tar$
|
||||
\.tgz$
|
||||
\.tiff?$
|
||||
\.ttf$
|
||||
\.wav$
|
||||
\.webm$
|
||||
\.webp$
|
||||
\.woff2?$
|
||||
\.xcf$
|
||||
\.xlsx?$
|
||||
\.xpm$
|
||||
\.xz$
|
||||
\.zip$
|
||||
^\.github/actions/spelling/
|
||||
^\Q.github/FUNDING.yml\E$
|
||||
^\Q.github/workflows/spelling.yml\E$
|
||||
^data/crawlers/
|
||||
^docs/blog/tags\.yml$
|
||||
^docs/manifest/.*$
|
||||
^docs/static/\.nojekyll$
|
||||
^lib/policy/config/testdata/bad/unparseable\.json$
|
||||
ignore$
|
||||
robots.txt
|
||||
^lib/localization/locales/.*\.json$
|
||||
^lib/localization/.*_test.go$
|
||||
333
.github/actions/spelling/expect.txt
vendored
Normal file
333
.github/actions/spelling/expect.txt
vendored
Normal file
@@ -0,0 +1,333 @@
|
||||
acs
|
||||
aeacus
|
||||
Aibrew
|
||||
alrest
|
||||
amazonbot
|
||||
anthro
|
||||
anubis
|
||||
anubistest
|
||||
apk
|
||||
Applebot
|
||||
archlinux
|
||||
asnc
|
||||
asnchecker
|
||||
asns
|
||||
aspirational
|
||||
atuin
|
||||
azuretools
|
||||
badregexes
|
||||
bdba
|
||||
berr
|
||||
bingbot
|
||||
bitcoin
|
||||
blogging
|
||||
Bluesky
|
||||
blueskybot
|
||||
boi
|
||||
botnet
|
||||
BPort
|
||||
Brightbot
|
||||
broked
|
||||
Bytespider
|
||||
cachebuster
|
||||
cachediptoasn
|
||||
Caddyfile
|
||||
caninetools
|
||||
Cardyb
|
||||
celchecker
|
||||
celphase
|
||||
cerr
|
||||
certresolver
|
||||
cespare
|
||||
CGNAT
|
||||
cgr
|
||||
chainguard
|
||||
chall
|
||||
challengemozilla
|
||||
checkpath
|
||||
checkresult
|
||||
chibi
|
||||
cidranger
|
||||
ckie
|
||||
cloudflare
|
||||
Codespaces
|
||||
confd
|
||||
containerbuild
|
||||
coreutils
|
||||
Cotoyogi
|
||||
CRDs
|
||||
Cromite
|
||||
crt
|
||||
Cscript
|
||||
daemonizing
|
||||
DDOS
|
||||
Debian
|
||||
debrpm
|
||||
decaymap
|
||||
Diffbot
|
||||
discordapp
|
||||
discordbot
|
||||
distros
|
||||
dnf
|
||||
dnsbl
|
||||
dnserr
|
||||
domainhere
|
||||
dracula
|
||||
dronebl
|
||||
droneblresponse
|
||||
dropin
|
||||
duckduckbot
|
||||
eerror
|
||||
ellenjoe
|
||||
emacs
|
||||
enbyware
|
||||
etld
|
||||
everyones
|
||||
evilbot
|
||||
evilsite
|
||||
expressionorlist
|
||||
externalagent
|
||||
externalfetcher
|
||||
extldflags
|
||||
facebookgo
|
||||
Factset
|
||||
fastcgi
|
||||
fediverse
|
||||
finfos
|
||||
Firecrawl
|
||||
flagenv
|
||||
Fordola
|
||||
forgejo
|
||||
fsys
|
||||
fullchain
|
||||
gaissmai
|
||||
Galvus
|
||||
geoip
|
||||
geoipchecker
|
||||
gha
|
||||
gipc
|
||||
gitea
|
||||
godotenv
|
||||
goland
|
||||
gomod
|
||||
goodbot
|
||||
googlebot
|
||||
govulncheck
|
||||
goyaml
|
||||
GPG
|
||||
GPT
|
||||
gptbot
|
||||
grpcprom
|
||||
grw
|
||||
Hashcash
|
||||
hashrate
|
||||
headermap
|
||||
healthcheck
|
||||
hebis
|
||||
hec
|
||||
hmc
|
||||
hostable
|
||||
htmlc
|
||||
htmx
|
||||
httpdebug
|
||||
Huawei
|
||||
hypertext
|
||||
iaskspider
|
||||
iat
|
||||
ifm
|
||||
Imagesift
|
||||
imgproxy
|
||||
impressum
|
||||
inp
|
||||
IPTo
|
||||
iptoasn
|
||||
iss
|
||||
isset
|
||||
ivh
|
||||
Jenomis
|
||||
JGit
|
||||
joho
|
||||
journalctl
|
||||
jshelter
|
||||
JWTs
|
||||
kagi
|
||||
kagibot
|
||||
keikaku
|
||||
Keyfunc
|
||||
keypair
|
||||
KHTML
|
||||
kinda
|
||||
KUBECONFIG
|
||||
lcj
|
||||
ldflags
|
||||
letsencrypt
|
||||
Lexentale
|
||||
lgbt
|
||||
licend
|
||||
licstart
|
||||
lightpanda
|
||||
LIMSA
|
||||
Linting
|
||||
linuxbrew
|
||||
LLU
|
||||
loadbalancer
|
||||
lol
|
||||
LOMINSA
|
||||
maintainership
|
||||
malware
|
||||
mcr
|
||||
memes
|
||||
metarefresh
|
||||
metrix
|
||||
mimi
|
||||
minica
|
||||
mistralai
|
||||
Mojeek
|
||||
mojeekbot
|
||||
mozilla
|
||||
nbf
|
||||
netsurf
|
||||
nginx
|
||||
nicksnyder
|
||||
nobots
|
||||
NONINFRINGEMENT
|
||||
nosleep
|
||||
OCOB
|
||||
ogtags
|
||||
omgili
|
||||
omgilibot
|
||||
openai
|
||||
opengraph
|
||||
openrc
|
||||
pag
|
||||
palemoon
|
||||
Pangu
|
||||
parseable
|
||||
passthrough
|
||||
Patreon
|
||||
pgrep
|
||||
phrik
|
||||
pidfile
|
||||
pids
|
||||
pipefail
|
||||
pki
|
||||
podkova
|
||||
podman
|
||||
prebaked
|
||||
privkey
|
||||
promauto
|
||||
promhttp
|
||||
proofofwork
|
||||
publicsuffix
|
||||
pwcmd
|
||||
pwuser
|
||||
qualys
|
||||
qwant
|
||||
qwantbot
|
||||
rac
|
||||
rawler
|
||||
rcvar
|
||||
redhat
|
||||
redir
|
||||
redirectscheme
|
||||
refactors
|
||||
relayd
|
||||
reputational
|
||||
reqmeta
|
||||
risc
|
||||
ruleset
|
||||
runlevels
|
||||
RUnlock
|
||||
runtimedir
|
||||
sas
|
||||
sasl
|
||||
Scumm
|
||||
searchbot
|
||||
searx
|
||||
sebest
|
||||
secretplans
|
||||
selfsigned
|
||||
Semrush
|
||||
Seo
|
||||
setsebool
|
||||
shellcheck
|
||||
Sidetrade
|
||||
simprint
|
||||
sitemap
|
||||
skopeo
|
||||
sls
|
||||
sni
|
||||
Sourceware
|
||||
Spambot
|
||||
sparkline
|
||||
spyderbot
|
||||
srv
|
||||
stackoverflow
|
||||
startprecmd
|
||||
stoppostcmd
|
||||
subgrid
|
||||
subr
|
||||
subrequest
|
||||
SVCNAME
|
||||
tagline
|
||||
tarballs
|
||||
tarrif
|
||||
techaro
|
||||
techarohq
|
||||
templ
|
||||
templruntime
|
||||
testarea
|
||||
Thancred
|
||||
thoth
|
||||
thothmock
|
||||
Tik
|
||||
Timpibot
|
||||
traefik
|
||||
uberspace
|
||||
unifiedjs
|
||||
unixhttpd
|
||||
unmarshal
|
||||
unparseable
|
||||
uuidgen
|
||||
uvx
|
||||
UXP
|
||||
Valkey
|
||||
Varis
|
||||
Velen
|
||||
vendored
|
||||
vhosts
|
||||
videotest
|
||||
waitloop
|
||||
weblate
|
||||
webmaster
|
||||
webpage
|
||||
websecure
|
||||
websites
|
||||
Webzio
|
||||
wildbase
|
||||
withthothmock
|
||||
wordpress
|
||||
Workaround
|
||||
workdir
|
||||
wpbot
|
||||
xcaddy
|
||||
Xeact
|
||||
xeiaso
|
||||
xeserv
|
||||
xesite
|
||||
xess
|
||||
xff
|
||||
XForwarded
|
||||
XNG
|
||||
XOB
|
||||
XReal
|
||||
yae
|
||||
YAMLTo
|
||||
yeet
|
||||
yeetfile
|
||||
yourdomain
|
||||
yoursite
|
||||
Zenos
|
||||
zizmor
|
||||
zombocom
|
||||
zos
|
||||
463
.github/actions/spelling/line_forbidden.patterns
vendored
Normal file
463
.github/actions/spelling/line_forbidden.patterns
vendored
Normal file
@@ -0,0 +1,463 @@
|
||||
# reject `m_data` as VxWorks defined it and that breaks things if it's used elsewhere
|
||||
# see [fprime](https://github.com/nasa/fprime/commit/d589f0a25c59ea9a800d851ea84c2f5df02fb529)
|
||||
# and [Qt](https://github.com/qtproject/qt-solutions/blame/fb7bc42bfcc578ff3fa3b9ca21a41e96eb37c1c7/qtscriptclassic/src/qscriptbuffer_p.h#L46)
|
||||
#\bm_data\b
|
||||
|
||||
# Were you debugging using a framework with `fit()`?
|
||||
# If you have a framework that uses `it()` for testing and `fit()` for debugging a specific test,
|
||||
# you might not want to check in code where you skip all the other tests.
|
||||
#\bfit\(
|
||||
|
||||
# English does not use a hyphen between adverbs and nouns
|
||||
# https://twitter.com/nyttypos/status/1894815686192685239
|
||||
(?:^|\s)[A-Z]?[a-z]+ly-(?=[a-z]{3,})(?:[.,?!]?\s|$)
|
||||
|
||||
# Don't use `requires that` + `to be`
|
||||
# https://twitter.com/nyttypos/status/1894816551435641027
|
||||
\brequires that \w+\b[^.]+to be\b
|
||||
|
||||
# A fully parenthetical sentence’s period goes inside the parentheses, not outside.
|
||||
# https://twitter.com/nyttypos/status/1898844061873639490
|
||||
#\([A-Z][a-z]{2,}(?: [a-z]+){3,}\)\.\s
|
||||
|
||||
# Complete sentences in parentheticals should not have a space before the period.
|
||||
\s\.\)(?!.*\}\})
|
||||
|
||||
# Should be `HH:MM:SS`
|
||||
\bHH:SS:MM\b
|
||||
|
||||
# Should be `86400` (seconds in a standard day)
|
||||
\b84600\b(?:.*\bday\b)
|
||||
|
||||
# Should probably be `2006-01-02` (yyyy-mm-dd)
|
||||
# Assuming that the time is being passed to https://go.dev/src/time/format.go
|
||||
\b2006-02-01\b
|
||||
|
||||
# Should probably be `YYYYMMDD`
|
||||
\b[Yy]{4}[Dd]{2}[Mm]{2}(?!.*[Yy]{4}[Dd]{2}[Mm]{2}).*$
|
||||
|
||||
# Should be `a priori` or `and prior`
|
||||
(?i)(?<!posteriori)\sand priori\s
|
||||
|
||||
# Should be `a`
|
||||
\san (?=(?:[b-df-gj-np-rtv-xz]|h(?!our|tml|ttp)|s(?!sh|vg))[a-z])
|
||||
|
||||
# Should only be one of `a`, `an`, or `the`
|
||||
\b(?:(?:an?|the)\s+){2,}\b
|
||||
|
||||
# Should only be `are` or `can`, not both
|
||||
\b(?:(?:are|can)\s+){2,}\b
|
||||
|
||||
# Should probably be `ABCDEFGHIJKLMNOPQRSTUVWXYZ`
|
||||
(?i)(?!ABCDEFGHIJKLMNOPQRSTUVWXYZ)ABC[A-Z]{21}YZ
|
||||
|
||||
# Should be `anymore`
|
||||
\bany more[,.]
|
||||
|
||||
# Should be `Ask`
|
||||
(?:^|[.?]\s+)As\s+[A-Z][a-z]{2,}\s[^.?]*?(?:how|if|wh\w+)\b
|
||||
|
||||
# Should be `at one fell swoop`
|
||||
# and only when talking about killing, not some other completion
|
||||
# Act 4 Scene 3, Macbeth
|
||||
# https://www.opensourceshakespeare.org/views/plays/play_view.php?WorkID=macbeth&Act=4&Scene=3&Scope=scene
|
||||
\bin one fell s[lw]?oop\b
|
||||
|
||||
# Should be `'`
|
||||
(?i)\b(?:(?:i|s?he|they|what|who|you)[`"]ll|(?:are|ca|did|do|does|ha[ds]|have|is|should|were|wo|would)n[`"]t|(?:s?he|let|that|there|what|where|who)[`"]s|(?:i|they|we|what|who|you)[`"]ve)\b
|
||||
|
||||
# Should be `background` / `intro text` / `introduction` / `prologue` unless it's a brand or relates to _subterfuge_
|
||||
(?i)\bpretext\b
|
||||
|
||||
# Should be `branches`
|
||||
# ... unless it's really about the meal that replaces breakfast and lunch.
|
||||
\b[Bb]runches\b
|
||||
|
||||
# Should be `briefcase`
|
||||
\bbrief-case\b
|
||||
|
||||
# Should be `by far` or `far and away`
|
||||
\bby far and away\b
|
||||
|
||||
# Should be `can, not only ..., ... also...`
|
||||
\bcan not only.*can also\b
|
||||
|
||||
# Should be `cannot` (or `can't`)
|
||||
# See https://www.grammarly.com/blog/cannot-or-can-not/
|
||||
# > Don't use `can not` when you mean `cannot`. The only time you're likely to see `can not` written as separate words is when the word `can` happens to precede some other phrase that happens to start with `not`.
|
||||
# > `Can't` is a contraction of `cannot`, and it's best suited for informal writing.
|
||||
# > In formal writing and where contractions are frowned upon, use `cannot`.
|
||||
# > It is possible to write `can not`, but you generally find it only as part of some other construction, such as `not only . . . but also.`
|
||||
# - if you encounter such a case, add a pattern for that case to patterns.txt.
|
||||
\b[Cc]an not\b(?! only\b)
|
||||
|
||||
# Should be `chart`
|
||||
(?i)\bhelm\b.*\bchard\b
|
||||
|
||||
# Do not use `(click) here` links
|
||||
# For more information, see:
|
||||
# * https://www.w3.org/QA/Tips/noClickHere
|
||||
# * https://webaim.org/techniques/hypertext/link_text
|
||||
# * https://granicus.com/blog/why-click-here-links-are-bad/
|
||||
# * https://heyoka.medium.com/dont-use-click-here-f32f445d1021
|
||||
(?i)(?:>|\[)(?:(?:click |)here|link|(?:read |)more)(?:</|\]\()
|
||||
|
||||
# Including "image of" or "picture of" in alt text is unnecessary.
|
||||
\balt=['"](?:an? |)(?:image|picture) of
|
||||
|
||||
# Alt text should be short
|
||||
\balt=(?:'[^']{126,}'|"[^"]{126,}")
|
||||
|
||||
# Should be `equals` to `is equal to`
|
||||
\bequals to\b
|
||||
|
||||
# Should be `ECMA` 262 (JavaScript)
|
||||
(?i)\bTS\/EMCA\b|\bEMCA(?: \d|\s*Script)|\bEMCA\b(?=.*\bTS\b)
|
||||
|
||||
# Should be `ECMA` 340 (Near Field Communications)
|
||||
(?i)EMCA[- ]340
|
||||
|
||||
# Should be `fall back`
|
||||
\bfallback(?= to)\b
|
||||
|
||||
# Should be `GitHub`
|
||||
(?<![&*.]|// |\b(?:from|import|type) )\bGithub\b(?![{()])
|
||||
|
||||
# Should be `GitLab`
|
||||
(?<![&*.]|// |\b(?:from|import|type) )\bGitlab\b(?![{()])
|
||||
|
||||
# Should probably be `https://`...
|
||||
# Markdown generally doesn't assume that links are to urls
|
||||
\]\(www\.\w
|
||||
|
||||
# Should be `JavaScript`
|
||||
\bJavascript\b
|
||||
|
||||
# Should be `macOS` or `Mac OS X` or ...
|
||||
\bMacOS\b
|
||||
|
||||
# Should be `Microsoft`
|
||||
\bMicroSoft\b
|
||||
|
||||
# Should be `OAuth`
|
||||
(?:^|[^-/*$])[ '"]oAuth(?: [a-z]|\d+ |[^ a-zA-Z0-9:;_.()])
|
||||
|
||||
# Should be `RabbitMQ`
|
||||
\bRabbitmq\b
|
||||
|
||||
# Should be `TensorFlow`
|
||||
\bTensorflow\b
|
||||
|
||||
# Should be `TypeScript`
|
||||
\bTypescript\b
|
||||
|
||||
# Should be `another`
|
||||
\ban[- ]other(?!-)\b
|
||||
|
||||
# Should be `case-(in)sensitive`
|
||||
\bcase (?:in|)sensitive\b
|
||||
|
||||
# Should be `coinciding`
|
||||
\bco-inciding\b
|
||||
|
||||
# Should be `deprecation warning(s)`
|
||||
\b[Dd]epreciation [Ww]arnings?\b
|
||||
|
||||
# Should be `greater than`
|
||||
\bgreater then\b
|
||||
|
||||
# Should be `has`
|
||||
\b[Ii]t only have\b
|
||||
|
||||
# Should be `here-in`, `the`, `them`, `this`, `these` or reworded in some other way
|
||||
\bthe here(?:\.|,| (?!and|defined))
|
||||
|
||||
# Should be `greater than`
|
||||
\bhigher than\b
|
||||
|
||||
# Should be `ID` (unless it's a flag/property)
|
||||
(?<![-\.])\bId\b(?![(])
|
||||
|
||||
# Should be `in front of`
|
||||
\bin from of\b
|
||||
|
||||
# Should be `into`
|
||||
# when not phrasal and when `in order to` would be wrong:
|
||||
# https://thewritepractice.com/into-vs-in-to/
|
||||
\sin to\s(?!if\b)
|
||||
|
||||
# Should be `use`
|
||||
\sin used by\b
|
||||
|
||||
# Should be `in-depth` if used as an adjective (but `in depth` when used as an adverb)
|
||||
\bin depth\s(?!rather\b)\w{6,}
|
||||
|
||||
# Should be `in-flight` or `on the fly` (unless actually talking about airline flights)
|
||||
\bon[- ]flight\b(?!=\s+(?:(?:\w{2}|)\d+|availability|booking|computer|data|delay|departure|management|performance|radar|reservation|scheduling|software|status|ticket|time|type|.*(?:hotel|taxi)))
|
||||
|
||||
# Should be `is obsolete`
|
||||
\bis obsolescent\b
|
||||
|
||||
# Should be `it's` or `its`
|
||||
\bits['’]
|
||||
|
||||
# Should be `its`
|
||||
\bit's(?= own\b)
|
||||
|
||||
# Should be `its`
|
||||
\bit's(?= only purpose\b)
|
||||
|
||||
# Should be `for its` (possessive) or `because it is`
|
||||
\bfor it(?:'s| is)\b
|
||||
|
||||
# Should be `log in`
|
||||
\blogin to the
|
||||
|
||||
# Should be `long-standing`
|
||||
\blong standing\b
|
||||
|
||||
# `apt-key` is deprecated
|
||||
# ... instead you should be writing a pair of files:
|
||||
# ... * the gpg key added to a distinct key ring file based on your project/distro/key...
|
||||
# ... * the sources.list in a district file -- not simply appended to `/etc/apt/sources.list` -- (there is a newer format [DEB822](https://manpages.debian.org/bookworm/dpkg-dev/deb822.5.en.html)) that references the gpg key.
|
||||
# Consider:
|
||||
# ````sh
|
||||
# curl http://download.something.example.com/$DISTRO/Release.key | \
|
||||
# gpg --dearmor --yes --output /usr/share/keyrings/something-distro.gpg
|
||||
# echo "deb [signed-by=/usr/share/keyrings/something-distro.gpg] http://download.something.example.com/repositories/home:/$DISTRO ./" \
|
||||
# >> /etc/apt/sources.list.d/something-distro.list
|
||||
# ````
|
||||
\bapt-key add\b
|
||||
|
||||
# Should be `nearby`
|
||||
\bnear by\b
|
||||
|
||||
# Should probably be a person named `Nick` or the abbreviation `NIC`
|
||||
\bNic\b
|
||||
|
||||
# Should be `not supposed`
|
||||
\bsupposed not\b
|
||||
|
||||
# Should probably be `much more`
|
||||
\bmore much\b
|
||||
|
||||
# Should be `perform its`
|
||||
\bperform it's\b
|
||||
|
||||
# Should be `opt-in`
|
||||
(?<!\scan|for)(?<!\smust)(?<!\sif)\sopt in\s
|
||||
|
||||
# Should be `less than`
|
||||
\bless then\b
|
||||
|
||||
# Should be `load balancer`
|
||||
\b[Ll]oud balancer
|
||||
|
||||
# Should be `moot`
|
||||
\bmute point\b
|
||||
|
||||
# Should be `one of`
|
||||
(?<!-)\bon of\b
|
||||
|
||||
# Should be `on the other hand`
|
||||
\b(?i)on another hand\b
|
||||
|
||||
# Reword to `on at runtime` or `enabled at launch`
|
||||
# The former if you mean it can be changed dynamically.
|
||||
# The latter if you mean that it can be changed without recompiling but not after the program starts.
|
||||
\bswitched on runtime\b
|
||||
|
||||
# Should be `Of course,`
|
||||
[?.!]\s+Of course\s(?=[-\w\s]+[.?;!,])
|
||||
|
||||
# Most people only have two hands. Reword.
|
||||
\b(?i)on the third hand\b
|
||||
|
||||
# Should be `OpenShift`
|
||||
\bOpenshift\b
|
||||
|
||||
# Should be `otherwise`
|
||||
\bother[- ]wise\b
|
||||
|
||||
# Should be `; otherwise` or `. Otherwise`
|
||||
# https://study.com/learn/lesson/otherwise-in-a-sentence.html
|
||||
, [Oo]therwise\b
|
||||
|
||||
# Should probably be `Otherwise,`
|
||||
(?<=\. )Otherwise\s
|
||||
|
||||
# Should be `or (more|less)`
|
||||
\bore (?:more|less)\b
|
||||
|
||||
# Should be `rather than`
|
||||
\brather then\b
|
||||
|
||||
# Should be `Red Hat`
|
||||
\bRed[Hh]at\b
|
||||
|
||||
# Should be `regardless, ...` or `regardless of (whether)`
|
||||
\b[Rr]egardless if you\b
|
||||
|
||||
# Should be `self-signed`
|
||||
\bself signed\b
|
||||
|
||||
# Should be `SendGrid`
|
||||
\bSendgrid\b
|
||||
|
||||
# Should be `set up` (`setup` is a noun / `set up` is a verb)
|
||||
\b[Ss]etup(?= (?:an?|the)\b)
|
||||
|
||||
# Should be `state`
|
||||
\bsate(?=\b|[A-Z])|(?<=[a-z])Sate(?=\b|[A-Z])|(?<=[A-Z]{2})Sate(?=\b|[A-Z])
|
||||
|
||||
# Should be `no longer needed`
|
||||
\bno more needed\b(?! than\b)
|
||||
|
||||
# Should be `<see|look> below for the`
|
||||
(?i)\bfind below the\b
|
||||
|
||||
# Should be `then any` unless there's a comparison before the `,`
|
||||
, than any\b
|
||||
|
||||
# Should be `did not exist`
|
||||
\bwere not existent\b
|
||||
|
||||
# Should be `nonexistent`
|
||||
\bnon existing\b
|
||||
|
||||
# Should be `nonexistent`
|
||||
\b[Nn]o[nt][- ]existent\b
|
||||
|
||||
# Should be `our`
|
||||
\bspending out time\b
|
||||
|
||||
# Should be `@brief` / `@details` / `@param` / `@return` / `@retval`
|
||||
(?:^\s*|(?:\*|//|/*)\s+`)[\\@](?:breif|(?:detail|detials)|(?:params(?!\.)|prama?)|ret(?:uns?)|retvl)\b
|
||||
|
||||
# Should be `more than` or `more, then`
|
||||
\bmore then\b
|
||||
|
||||
# Should be `Pipeline`/`pipeline`
|
||||
(?:(?<=\b|[A-Z])p|P)ipeLine(?:\b|(?=[A-Z]))
|
||||
|
||||
# Should be `preexisting`
|
||||
[Pp]re[- ]existing
|
||||
|
||||
# Should be `preempt`
|
||||
[Pp]re[- ]empt\b
|
||||
|
||||
# Should be `preemptively`
|
||||
[Pp]re[- ]emptively
|
||||
|
||||
# Should be `prepopulate`
|
||||
[Pp]re[- ]populate
|
||||
|
||||
# Should be `prerequisite`
|
||||
[Pp]re[- ]requisite
|
||||
|
||||
# Should be `recently changed` or `recent changes`
|
||||
[Rr]ecent changed
|
||||
|
||||
# Should be `reentrancy`
|
||||
[Rr]e[- ]entrancy
|
||||
|
||||
# Should be `reentrant`
|
||||
[Rr]e[- ]entrant
|
||||
|
||||
# Should be `room for`
|
||||
\brooms for (?!lease|rent|sale)
|
||||
|
||||
# Should be `socioeconomic`
|
||||
# https://dictionary.cambridge.org/us/dictionary/english/socioeconomic
|
||||
socio-economic
|
||||
|
||||
# Should be `strong suit`
|
||||
\b(?:my|his|her|their) strong suite\b
|
||||
|
||||
# Should probably be `temperatures` unless actually talking about thermal drafts (things birds may fly on)
|
||||
\bthermals\b
|
||||
|
||||
# Should be `there are` or `they are` (or `they're`)
|
||||
(?i)\btheir are\b
|
||||
|
||||
# Should be `understand`
|
||||
\bunder stand\b
|
||||
|
||||
# Should be `URI` or `uri` unless it refers to a person named `Uri` (or a flag)
|
||||
(?<![-\.])\bUri\b(?![(])
|
||||
|
||||
# Should be `it uses is`
|
||||
/\bis uses is\b/
|
||||
|
||||
# Should be `uses it as`
|
||||
(?:^|\. |and )uses is as (?!an?\b|follows|livestock|[^.]+\s+as\b)
|
||||
|
||||
# Should be `was`
|
||||
\bhas been(?= removed in v?\d)
|
||||
|
||||
# Should be `where`
|
||||
\bwere they are\b
|
||||
|
||||
# Should be `why`
|
||||
, way(?= is [^.]*\?)
|
||||
|
||||
# should be `vCenter`
|
||||
\bV[Cc]enter\b
|
||||
|
||||
# Should be `VM`
|
||||
\bVm\b
|
||||
|
||||
# Should be `walkthrough(s)`
|
||||
\bwalk-throughs?\b
|
||||
|
||||
# Should be `we'll`
|
||||
\bwe 'll\b
|
||||
|
||||
# Should be `whereas`
|
||||
\bwhere as\b
|
||||
|
||||
# Should be `WinGet`
|
||||
\bWinget\b
|
||||
|
||||
# Should be `without` (unless `out` is a modifier of the next word)
|
||||
\bwith out\b(?!-)
|
||||
|
||||
# Should be `work around`
|
||||
\b[Ww]orkaround(?= an?\b)
|
||||
|
||||
# Should be `workarounds`
|
||||
\bwork[- ]arounds\b
|
||||
|
||||
# Should be `workaround`
|
||||
(?:(?:[Aa]|[Tt]he|ugly)\swork[- ]around\b|\swork[- ]around\s+for)
|
||||
|
||||
# Should be `worst`
|
||||
(?i)worse-case
|
||||
|
||||
# Should be `you are not` or reworded
|
||||
\byour not\b
|
||||
|
||||
# Should be `(coarse|fine)-grained`
|
||||
\b(?:coarse|fine) grained\b
|
||||
|
||||
# Homoglyph (Cyrillic) should be `A`/`B`/`C`/`E`/`H`/`I`/`I`/`J`/`K`/`M`/`O`/`P`/`S`/`T`/`Y`
|
||||
# It's possible that your content is intentionally mixing Cyrillic and Latin scripts, but if it isn't, you definitely want to correct this.
|
||||
(?<=[A-Z]{2})[АВСЕНІӀЈКМОРЅТУ]|[АВСЕНІӀЈКМОРЅТУ](?=[A-Z]+(?:\b|[a-z]+)|[a-z]+(?:[^a-z]|$))
|
||||
|
||||
# Homoglyph (Cyrillic) should be `a`/`b`/`c`/`e`/`o`/`p`/`x`/`y`
|
||||
# It's possible that your content is intentionally mixing Cyrillic and Latin scripts, but if it isn't, you definitely want to correct this.
|
||||
[авсеорху](?=[A-Za-z]{2,})|(?<=[A-Za-z]{2})[авсеорху]|(?<=[A-Za-z])[авсеорху](?=[A-Za-z])
|
||||
|
||||
# Should be `neither/nor` -- or reword
|
||||
(?!<do )\bnot\b([^.?!"/(](?!neither|,.*?,))+\bnor\b
|
||||
|
||||
# Should be `neither/nor` (plus rewording the beginning)
|
||||
# This is probably a double negative...
|
||||
\bnot\b[^.?!"/(]*\bneither\b[^.?!"/(]*\bnor\b
|
||||
|
||||
# In English, duplicated words are generally mistakes
|
||||
# There are a few exceptions (e.g. "that that").
|
||||
# If the highlighted doubled word pair is in:
|
||||
# * code, write a pattern to mask it.
|
||||
# * prose, have someone read the English before you dismiss this error.
|
||||
\s([A-Z]{3,}|[A-Z][a-z]{2,}|[a-z]{3,})\s\g{-1}\s
|
||||
134
.github/actions/spelling/patterns.txt
vendored
Normal file
134
.github/actions/spelling/patterns.txt
vendored
Normal file
@@ -0,0 +1,134 @@
|
||||
# See https://github.com/check-spelling/check-spelling/wiki/Configuration-Examples:-patterns
|
||||
|
||||
# Automatically suggested patterns
|
||||
|
||||
# hit-count: 198 file-count: 52
|
||||
# https/http/file urls
|
||||
(?:\b(?:https?|ftp|file)://)[-A-Za-z0-9+&@#/*%?=~_|!:,.;]+[-A-Za-z0-9+&@#/*%=~_|]
|
||||
|
||||
# hit-count: 22 file-count: 8
|
||||
# GitHub actions
|
||||
\buses:\s+[-\w.]+/[-\w./]+@[-\w.]+
|
||||
|
||||
# hit-count: 19 file-count: 5
|
||||
# libraries
|
||||
(?:\b|_)[Ll]ib(?:re(?=office)|era(?![lt])|)(?!ero|erty|rar(?:i(?:an|es)|y))(?=[a-z])
|
||||
|
||||
# hit-count: 17 file-count: 8
|
||||
# version suffix <word>v#
|
||||
(?:(?<=[A-Z]{2})V|(?<=[a-z]{2}|[A-Z]{2})v)\d+(?:\b|(?=[a-zA-Z_]))
|
||||
|
||||
# hit-count: 15 file-count: 7
|
||||
# container images
|
||||
image: [-\w./:@]+
|
||||
|
||||
# hit-count: 14 file-count: 9
|
||||
# imports
|
||||
^import\s+(?:(?:static|type)\s+|)(?:[\w.]|\{\s*\w*?(?:,\s*(?:\w*|\*))+\s*\})+
|
||||
|
||||
# hit-count: 11 file-count: 2
|
||||
# hex digits including css/html color classes:
|
||||
(?:[\\0][xX]|\\u|[uU]\+|#x?|%23|&H)[0-9_a-fA-FgGrR]*?[a-fA-FgGrR]{2,}[0-9_a-fA-FgGrR]*(?:[uUlL]{0,3}|[iu]\d+)\b
|
||||
|
||||
# hit-count: 8 file-count: 5
|
||||
# node packages
|
||||
(["'])@[^/'" ]+/[^/'" ]+\g{-1}
|
||||
|
||||
# hit-count: 5 file-count: 2
|
||||
# css fonts
|
||||
\bfont(?:-family|):[^;}]+
|
||||
|
||||
# hit-count: 4 file-count: 4
|
||||
# set arguments
|
||||
\b(?:bash|sh|set)(?:\s+[-+][abefimouxE]{1,2})*\s+[-+][abefimouxE]{3,}(?:\s+[-+][abefimouxE]+)*
|
||||
|
||||
# hit-count: 4 file-count: 2
|
||||
# css url wrappings
|
||||
\burl\([^)]+\)
|
||||
|
||||
# hit-count: 2 file-count: 2
|
||||
# C network byte conversions
|
||||
(?:\d|\bh)to(?!ken)(?=[a-z])|to(?=[adhiklpun]\()
|
||||
|
||||
# hit-count: 2 file-count: 1
|
||||
# GitHub SHA refs
|
||||
\[([0-9a-f]+)\]\(https://(?:www\.|)github.com/[-\w]+/[-\w]+/commit/\g{-1}[0-9a-f]*
|
||||
|
||||
# hit-count: 1 file-count: 1
|
||||
# copyright
|
||||
Copyright (?:\([Cc]\)|)(?:[-\d, ]|and)+(?: [A-Z][a-z]+ [A-Z][a-z]+,?)+
|
||||
|
||||
# hit-count: 1 file-count: 1
|
||||
# IPv6
|
||||
\b(?:[0-9a-fA-F]{0,4}:){3,7}[0-9a-fA-F]{0,4}\b
|
||||
|
||||
# hit-count: 1 file-count: 1
|
||||
# Docker images
|
||||
^\s*(?i)FROM\s+\S+:\S+(?:\s+AS\s+\S+|)
|
||||
|
||||
# hit-count: 1 file-count: 1
|
||||
# perl run
|
||||
perl(?:\s+-[a-zA-Z]\w*)+
|
||||
|
||||
# hit-count: 1 file-count: 1
|
||||
# go install
|
||||
go install(?:\s+[a-z]+\.[-@\w/.]+)+
|
||||
|
||||
# hit-count: 1 file-count: 1
|
||||
# in check-spelling@v0.0.22+, printf markers aren't automatically consumed
|
||||
# printf markers
|
||||
(?<!\\)\\[nrt](?=[a-z]{2,})
|
||||
|
||||
# hit-count: 1 file-count: 1
|
||||
# tar arguments
|
||||
\b(?:\\n|)g?tar(?:\.exe|)(?:(?:\s+--[-a-zA-Z]+|\s+-[a-zA-Z]+|\s[ABGJMOPRSUWZacdfh-pr-xz]+\b)(?:=[^ ]*|))+
|
||||
|
||||
# Questionably acceptable forms of `in to`
|
||||
# Personally, I prefer `log into`, but people object
|
||||
# https://www.tprteaching.com/log-into-log-in-to-login/
|
||||
\b(?:(?:[Ll]og(?:g(?=[a-z])|)|[Ss]ign)(?:ed|ing)?) in to\b
|
||||
|
||||
# to opt in
|
||||
\bto opt in\b
|
||||
|
||||
# pass(ed|ing) in
|
||||
\bpass(?:ed|ing) in\b
|
||||
|
||||
# acceptable duplicates
|
||||
# ls directory listings
|
||||
[-bcdlpsw](?:[-r][-w][-SsTtx]){3}[\.+*]?\s+\d+\s+\S+\s+\S+\s+[.\d]+(?:[KMGT]|)\s+
|
||||
# mount
|
||||
\bmount\s+-t\s+(\w+)\s+\g{-1}\b
|
||||
# C types and repeated CSS values
|
||||
\s(auto|buffalo|center|div|inherit|long|LONG|none|normal|solid|thin|transparent|very)(?: \g{-1})+\s
|
||||
# C enum and struct
|
||||
\b(?:enum|struct)\s+(\w+)\s+\g{-1}\b
|
||||
# go templates
|
||||
\s(\w+)\s+\g{-1}\s+\`(?:graphql|inject|json|yaml):
|
||||
# doxygen / javadoc / .net
|
||||
(?:[\\@](?:brief|defgroup|groupname|link|t?param|return|retval)|(?:public|private|\[Parameter(?:\(.+\)|)\])(?:\s+(?:static|override|readonly|required|virtual))*)(?:\s+\{\w+\}|)\s+(\w+)\s+\g{-1}\s
|
||||
|
||||
# macOS file path
|
||||
(?:Contents\W+|(?!iOS)/)MacOS\b
|
||||
|
||||
# Python package registry has incorrect spelling for macOS / Mac OS X
|
||||
"Operating System :: MacOS :: MacOS X"
|
||||
|
||||
# "company" in Germany
|
||||
\bGmbH\b
|
||||
|
||||
# IntelliJ
|
||||
\bIntelliJ\b
|
||||
|
||||
# Commit message -- Signed-off-by and friends
|
||||
^\s*(?:(?:Based-on-patch|Co-authored|Helped|Mentored|Reported|Reviewed|Signed-off)-by|Thanks-to): (?:[^<]*<[^>]*>|[^<]*)\s*$
|
||||
|
||||
# Autogenerated revert commit message
|
||||
^This reverts commit [0-9a-f]{40}\.$
|
||||
|
||||
# ignore long runs of a single character:
|
||||
\b([A-Za-z])\g{-1}{3,}\b
|
||||
|
||||
# hit-count: 1 file-count: 1
|
||||
# microsoft
|
||||
\b(?:https?://|)(?:(?:(?:blogs|download\.visualstudio|docs|msdn2?|research)\.|)microsoft|blogs\.msdn)\.co(?:m|\.\w\w)/[-_a-zA-Z0-9()=./%]*
|
||||
23
.github/actions/spelling/reject.txt
vendored
Normal file
23
.github/actions/spelling/reject.txt
vendored
Normal file
@@ -0,0 +1,23 @@
|
||||
^attache$
|
||||
^bellows?$
|
||||
benefitting
|
||||
occurences?
|
||||
^dependan.*
|
||||
^develope$
|
||||
^developement$
|
||||
^developpe
|
||||
^Devers?$
|
||||
^devex
|
||||
^devide
|
||||
^Devinn?[ae]
|
||||
^devisal
|
||||
^devisor
|
||||
^diables?$
|
||||
^oer$
|
||||
Sorce
|
||||
^[Ss]pae.*
|
||||
^Teh$
|
||||
^untill$
|
||||
^untilling$
|
||||
^venders?$
|
||||
^wether.*
|
||||
47
.github/workflows/devcontainer.yml
vendored
Normal file
47
.github/workflows/devcontainer.yml
vendored
Normal file
@@ -0,0 +1,47 @@
|
||||
name: Dev container prebuild
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: ["main"]
|
||||
tags: ["v*.*.*"]
|
||||
|
||||
jobs:
|
||||
devcontainer:
|
||||
runs-on: ubuntu-24.04
|
||||
steps:
|
||||
- name: Checkout code
|
||||
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
|
||||
with:
|
||||
fetch-tags: true
|
||||
fetch-depth: 0
|
||||
persist-credentials: false
|
||||
|
||||
- name: Set up QEMU
|
||||
uses: docker/setup-qemu-action@29109295f81e9208d7d86ff1c6c12d2833863392 # v3.6.0
|
||||
|
||||
- name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@e468171a9de216ec08956ac3ada2f0791b6bd435 # v3.11.1
|
||||
|
||||
- uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0
|
||||
with:
|
||||
node-version: latest
|
||||
|
||||
- run: |
|
||||
sudo apt-get update
|
||||
sudo apt-get -y install skopeo
|
||||
|
||||
- name: Log into registry
|
||||
if: github.event_name != 'pull_request'
|
||||
uses: docker/login-action@343f7c4344506bcbf9b4de18042ae17996df046d # v3.0.0
|
||||
with:
|
||||
registry: ghcr.io
|
||||
username: techarohq
|
||||
password: ${{ secrets.GITHUB_TOKEN }}
|
||||
|
||||
- name: Pre-build dev container image
|
||||
uses: devcontainers/ci@8bf61b26e9c3a98f69cb6ce2f88d24ff59b785c6 # v0.3.1900000417
|
||||
with:
|
||||
imageName: ghcr.io/techarohq/anubis/devcontainer
|
||||
cacheFrom: ghcr.io/techarohq/anubis/devcontainer
|
||||
push: always
|
||||
platform: linux/amd64,linux/arm64
|
||||
6
.github/workflows/docker-pr.yml
vendored
6
.github/workflows/docker-pr.yml
vendored
@@ -22,7 +22,7 @@ jobs:
|
||||
persist-credentials: false
|
||||
|
||||
- name: Set up Homebrew
|
||||
uses: Homebrew/actions/setup-homebrew@master
|
||||
uses: Homebrew/actions/setup-homebrew@main
|
||||
|
||||
- name: Setup Homebrew cellar cache
|
||||
uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
|
||||
@@ -49,7 +49,7 @@ jobs:
|
||||
id: meta
|
||||
uses: docker/metadata-action@902fa8ec7d6ecbf8d84d538b9b233a880e428804 # v5.7.0
|
||||
with:
|
||||
images: ghcr.io/techarohq/anubis
|
||||
images: ghcr.io/${{ github.repository }}
|
||||
|
||||
- name: Build and push
|
||||
id: build
|
||||
@@ -58,7 +58,7 @@ jobs:
|
||||
npm run container
|
||||
env:
|
||||
PULL_REQUEST_ID: ${{ github.event.number }}
|
||||
DOCKER_REPO: ghcr.io/techarohq/anubis
|
||||
DOCKER_REPO: ghcr.io/${{ github.repository }}
|
||||
SLOG_LEVEL: debug
|
||||
|
||||
- run: |
|
||||
|
||||
24
.github/workflows/docker.yml
vendored
24
.github/workflows/docker.yml
vendored
@@ -3,8 +3,8 @@ name: Docker image builds
|
||||
on:
|
||||
workflow_dispatch:
|
||||
push:
|
||||
branches: [ "main" ]
|
||||
tags: [ "v*" ]
|
||||
branches: ["main"]
|
||||
tags: ["v*"]
|
||||
|
||||
env:
|
||||
DOCKER_METADATA_SET_OUTPUT_ENV: "true"
|
||||
@@ -27,8 +27,12 @@ jobs:
|
||||
fetch-depth: 0
|
||||
persist-credentials: false
|
||||
|
||||
- name: Set lowercase image name
|
||||
run: |
|
||||
echo "IMAGE=ghcr.io/${GITHUB_REPOSITORY,,}" >> $GITHUB_ENV
|
||||
|
||||
- name: Set up Homebrew
|
||||
uses: Homebrew/actions/setup-homebrew@master
|
||||
uses: Homebrew/actions/setup-homebrew@main
|
||||
|
||||
- name: Setup Homebrew cellar cache
|
||||
uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
|
||||
@@ -51,18 +55,18 @@ jobs:
|
||||
run: |
|
||||
brew bundle
|
||||
|
||||
- name: Log into registry
|
||||
- name: Log into registry
|
||||
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
|
||||
with:
|
||||
registry: ghcr.io
|
||||
username: techarohq
|
||||
username: ${{ github.repository_owner }}
|
||||
password: ${{ secrets.GITHUB_TOKEN }}
|
||||
|
||||
- name: Docker meta
|
||||
id: meta
|
||||
uses: docker/metadata-action@902fa8ec7d6ecbf8d84d538b9b233a880e428804 # v5.7.0
|
||||
with:
|
||||
images: ghcr.io/techarohq/anubis
|
||||
images: ${{ env.IMAGE }}
|
||||
|
||||
- name: Build and push
|
||||
id: build
|
||||
@@ -70,12 +74,12 @@ jobs:
|
||||
npm ci
|
||||
npm run container
|
||||
env:
|
||||
DOCKER_REPO: ghcr.io/techarohq/anubis
|
||||
DOCKER_REPO: ${{ env.IMAGE }}
|
||||
SLOG_LEVEL: debug
|
||||
|
||||
|
||||
- name: Generate artifact attestation
|
||||
uses: actions/attest-build-provenance@db473fddc028af60658334401dc6fa3ffd8669fd # v2.3.0
|
||||
uses: actions/attest-build-provenance@e8998f949152b193b063cb0ec769d69d929409be # v2.4.0
|
||||
with:
|
||||
subject-name: ghcr.io/techarohq/anubis
|
||||
subject-name: ${{ env.IMAGE }}
|
||||
subject-digest: ${{ steps.build.outputs.digest }}
|
||||
push-to-registry: true
|
||||
|
||||
17
.github/workflows/docs-deploy.yml
vendored
17
.github/workflows/docs-deploy.yml
vendored
@@ -3,7 +3,7 @@ name: Docs deploy
|
||||
on:
|
||||
workflow_dispatch:
|
||||
push:
|
||||
branches: [ "main" ]
|
||||
branches: ["main"]
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
@@ -13,6 +13,7 @@ permissions:
|
||||
|
||||
jobs:
|
||||
build:
|
||||
if: github.repository == 'TecharoHQ/anubis'
|
||||
runs-on: ubuntu-24.04
|
||||
|
||||
steps:
|
||||
@@ -21,9 +22,9 @@ jobs:
|
||||
persist-credentials: false
|
||||
|
||||
- name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2 # v3.10.0
|
||||
uses: docker/setup-buildx-action@e468171a9de216ec08956ac3ada2f0791b6bd435 # v3.11.1
|
||||
|
||||
- name: Log into registry
|
||||
- name: Log into registry
|
||||
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
|
||||
with:
|
||||
registry: ghcr.io
|
||||
@@ -38,7 +39,7 @@ jobs:
|
||||
|
||||
- name: Build and push
|
||||
id: build
|
||||
uses: docker/build-push-action@14487ce63c7a62a4a324b0bfb37086795e31c6c1 # v6.16.0
|
||||
uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
|
||||
with:
|
||||
context: ./docs
|
||||
cache-to: type=gha
|
||||
@@ -49,15 +50,15 @@ jobs:
|
||||
push: true
|
||||
|
||||
- name: Apply k8s manifests to aeacus
|
||||
uses: actions-hub/kubectl@e81783053d902f50d752d21a6d99cf9689a652e1 # v1.33.0
|
||||
uses: actions-hub/kubectl@d50394b7d704525f93faefce1e65a6329ff67271 # v1.33.2
|
||||
env:
|
||||
KUBE_CONFIG: ${{ secrets.AEACUS_KUBECONFIG }}
|
||||
KUBE_CONFIG: ${{ secrets.LIMSA_LOMINSA_KUBECONFIG }}
|
||||
with:
|
||||
args: apply -k docs/manifest
|
||||
|
||||
- name: Apply k8s manifests to aeacus
|
||||
uses: actions-hub/kubectl@e81783053d902f50d752d21a6d99cf9689a652e1 # v1.33.0
|
||||
uses: actions-hub/kubectl@d50394b7d704525f93faefce1e65a6329ff67271 # v1.33.2
|
||||
env:
|
||||
KUBE_CONFIG: ${{ secrets.AEACUS_KUBECONFIG }}
|
||||
KUBE_CONFIG: ${{ secrets.LIMSA_LOMINSA_KUBECONFIG }}
|
||||
with:
|
||||
args: rollout restart -n default deploy/anubis-docs
|
||||
|
||||
6
.github/workflows/docs-test.yml
vendored
6
.github/workflows/docs-test.yml
vendored
@@ -18,17 +18,17 @@ jobs:
|
||||
persist-credentials: false
|
||||
|
||||
- name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2 # v3.10.0
|
||||
uses: docker/setup-buildx-action@e468171a9de216ec08956ac3ada2f0791b6bd435 # v3.11.1
|
||||
|
||||
- name: Docker meta
|
||||
id: meta
|
||||
uses: docker/metadata-action@902fa8ec7d6ecbf8d84d538b9b233a880e428804 # v5.7.0
|
||||
with:
|
||||
images: ghcr.io/techarohq/anubis/docs
|
||||
images: ghcr.io/${{ github.repository }}/docs
|
||||
|
||||
- name: Build and push
|
||||
id: build
|
||||
uses: docker/build-push-action@14487ce63c7a62a4a324b0bfb37086795e31c6c1 # v6.16.0
|
||||
uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
|
||||
with:
|
||||
context: ./docs
|
||||
cache-to: type=gha
|
||||
|
||||
2
.github/workflows/go.yml
vendored
2
.github/workflows/go.yml
vendored
@@ -25,7 +25,7 @@ jobs:
|
||||
sudo apt-get install -y build-essential
|
||||
|
||||
- name: Set up Homebrew
|
||||
uses: Homebrew/actions/setup-homebrew@master
|
||||
uses: Homebrew/actions/setup-homebrew@main
|
||||
|
||||
- name: Setup Homebrew cellar cache
|
||||
uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
|
||||
|
||||
7
.github/workflows/package-builds-stable.yml
vendored
7
.github/workflows/package-builds-stable.yml
vendored
@@ -25,7 +25,7 @@ jobs:
|
||||
sudo apt-get install -y build-essential
|
||||
|
||||
- name: Set up Homebrew
|
||||
uses: Homebrew/actions/setup-homebrew@master
|
||||
uses: Homebrew/actions/setup-homebrew@main
|
||||
|
||||
- name: Setup Homebrew cellar cache
|
||||
uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
|
||||
@@ -64,10 +64,7 @@ jobs:
|
||||
|
||||
- name: Build Packages
|
||||
run: |
|
||||
wget https://github.com/TecharoHQ/yeet/releases/download/v0.2.2/yeet_0.2.2_amd64.deb -O var/yeet.deb
|
||||
sudo apt -y install -f ./var/yeet.deb
|
||||
rm ./var/yeet.deb
|
||||
yeet
|
||||
go tool yeet
|
||||
|
||||
- name: Upload released artifacts
|
||||
env:
|
||||
|
||||
@@ -27,7 +27,7 @@ jobs:
|
||||
sudo apt-get install -y build-essential
|
||||
|
||||
- name: Set up Homebrew
|
||||
uses: Homebrew/actions/setup-homebrew@master
|
||||
uses: Homebrew/actions/setup-homebrew@main
|
||||
|
||||
- name: Setup Homebrew cellar cache
|
||||
uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
|
||||
@@ -66,10 +66,7 @@ jobs:
|
||||
|
||||
- name: Build Packages
|
||||
run: |
|
||||
wget https://github.com/TecharoHQ/yeet/releases/download/v0.2.2/yeet_0.2.2_amd64.deb -O var/yeet.deb
|
||||
sudo apt -y install -f ./var/yeet.deb
|
||||
rm ./var/yeet.deb
|
||||
yeet
|
||||
go tool yeet
|
||||
|
||||
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
|
||||
with:
|
||||
|
||||
118
.github/workflows/spelling.yml
vendored
Normal file
118
.github/workflows/spelling.yml
vendored
Normal file
@@ -0,0 +1,118 @@
|
||||
name: Check Spelling
|
||||
|
||||
# Comment management is handled through a secondary job, for details see:
|
||||
# https://github.com/check-spelling/check-spelling/wiki/Feature%3A-Restricted-Permissions
|
||||
#
|
||||
# `jobs.comment-push` runs when a push is made to a repository and the `jobs.spelling` job needs to make a comment
|
||||
# (in odd cases, it might actually run just to collapse a comment, but that's fairly rare)
|
||||
# it needs `contents: write` in order to add a comment.
|
||||
#
|
||||
# `jobs.comment-pr` runs when a pull_request is made to a repository and the `jobs.spelling` job needs to make a comment
|
||||
# or collapse a comment (in the case where it had previously made a comment and now no longer needs to show a comment)
|
||||
# it needs `pull-requests: write` in order to manipulate those comments.
|
||||
|
||||
# Updating pull request branches is managed via comment handling.
|
||||
# For details, see: https://github.com/check-spelling/check-spelling/wiki/Feature:-Update-expect-list
|
||||
#
|
||||
# These elements work together to make it happen:
|
||||
#
|
||||
# `on.issue_comment`
|
||||
# This event listens to comments by users asking to update the metadata.
|
||||
#
|
||||
# `jobs.update`
|
||||
# This job runs in response to an issue_comment and will push a new commit
|
||||
# to update the spelling metadata.
|
||||
#
|
||||
# `with.experimental_apply_changes_via_bot`
|
||||
# Tells the action to support and generate messages that enable it
|
||||
# to make a commit to update the spelling metadata.
|
||||
#
|
||||
# `with.ssh_key`
|
||||
# In order to trigger workflows when the commit is made, you can provide a
|
||||
# secret (typically, a write-enabled github deploy key).
|
||||
#
|
||||
# For background, see: https://github.com/check-spelling/check-spelling/wiki/Feature:-Update-with-deploy-key
|
||||
|
||||
# SARIF reporting
|
||||
#
|
||||
# Access to SARIF reports is generally restricted (by GitHub) to members of the repository.
|
||||
#
|
||||
# Requires enabling `security-events: write`
|
||||
# and configuring the action with `use_sarif: 1`
|
||||
#
|
||||
# For information on the feature, see: https://github.com/check-spelling/check-spelling/wiki/Feature:-SARIF-output
|
||||
|
||||
# Minimal workflow structure:
|
||||
#
|
||||
# on:
|
||||
# push:
|
||||
# ...
|
||||
# pull_request_target:
|
||||
# ...
|
||||
# jobs:
|
||||
# # you only want the spelling job, all others should be omitted
|
||||
# spelling:
|
||||
# # remove `security-events: write` and `use_sarif: 1`
|
||||
# # remove `experimental_apply_changes_via_bot: 1`
|
||||
# ... otherwise adjust the `with:` as you wish
|
||||
|
||||
on:
|
||||
push:
|
||||
branches:
|
||||
- '**'
|
||||
tags-ignore:
|
||||
- '**'
|
||||
pull_request:
|
||||
branches:
|
||||
- '**'
|
||||
types:
|
||||
- 'opened'
|
||||
- 'reopened'
|
||||
- 'synchronize'
|
||||
|
||||
jobs:
|
||||
spelling:
|
||||
name: Check Spelling
|
||||
permissions:
|
||||
contents: read
|
||||
pull-requests: read
|
||||
actions: read
|
||||
security-events: write
|
||||
outputs:
|
||||
followup: ${{ steps.spelling.outputs.followup }}
|
||||
runs-on: ubuntu-latest
|
||||
if: ${{ contains(github.event_name, 'pull_request') || github.event_name == 'push' }}
|
||||
concurrency:
|
||||
group: spelling-${{ github.event.pull_request.number || github.ref }}
|
||||
# note: If you use only_check_changed_files, you do not want cancel-in-progress
|
||||
cancel-in-progress: true
|
||||
steps:
|
||||
- name: check-spelling
|
||||
id: spelling
|
||||
uses: check-spelling/check-spelling@c635c2f3f714eec2fcf27b643a1919b9a811ef2e # v0.0.25
|
||||
with:
|
||||
suppress_push_for_open_pull_request: ${{ github.actor != 'dependabot[bot]' && 1 }}
|
||||
checkout: true
|
||||
check_file_names: 1
|
||||
post_comment: 0
|
||||
use_magic_file: 1
|
||||
warnings: bad-regex,binary-file,deprecated-feature,ignored-expect-variant,large-file,limited-references,no-newline-at-eof,noisy-file,non-alpha-in-dictionary,token-is-substring,unexpected-line-ending,whitespace-in-dictionary,minified-file,unsupported-configuration,no-files-to-check,unclosed-block-ignore-begin,unclosed-block-ignore-end
|
||||
use_sarif: ${{ (!github.event.pull_request || (github.event.pull_request.head.repo.full_name == github.repository)) && 1 }}
|
||||
check_extra_dictionaries: ""
|
||||
dictionary_source_prefixes: >
|
||||
{
|
||||
"cspell": "https://raw.githubusercontent.com/check-spelling/cspell-dicts/v20241114/dictionaries/"
|
||||
}
|
||||
extra_dictionaries: |
|
||||
cspell:software-terms/softwareTerms.txt
|
||||
cspell:golang/go.txt
|
||||
cspell:npm/npm.txt
|
||||
cspell:k8s/k8s.txt
|
||||
cspell:python/python/python-lib.txt
|
||||
cspell:aws/aws.txt
|
||||
cspell:node/node.txt
|
||||
cspell:html/html.txt
|
||||
cspell:filetypes/filetypes.txt
|
||||
cspell:python/common/extra.txt
|
||||
cspell:docker/docker-words.txt
|
||||
cspell:fullstack/fullstack.txt
|
||||
37
.github/workflows/ssh-ci-runner-cron.yml
vendored
Normal file
37
.github/workflows/ssh-ci-runner-cron.yml
vendored
Normal file
@@ -0,0 +1,37 @@
|
||||
name: Regenerate ssh ci runner image
|
||||
|
||||
on:
|
||||
# pull_request:
|
||||
# branches: ["main"]
|
||||
schedule:
|
||||
- cron: "0 0 1,8,15,22 * *"
|
||||
workflow_dispatch:
|
||||
|
||||
permissions:
|
||||
pull-requests: write
|
||||
contents: write
|
||||
packages: write
|
||||
|
||||
jobs:
|
||||
ssh-ci-rebuild:
|
||||
if: github.repository == 'TecharoHQ/anubis'
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout code
|
||||
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
|
||||
with:
|
||||
fetch-tags: true
|
||||
fetch-depth: 0
|
||||
persist-credentials: false
|
||||
- name: Log into registry
|
||||
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
|
||||
with:
|
||||
registry: ghcr.io
|
||||
username: ${{ github.repository_owner }}
|
||||
password: ${{ secrets.GITHUB_TOKEN }}
|
||||
- name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@e468171a9de216ec08956ac3ada2f0791b6bd435 # v3.11.1
|
||||
- name: Build and push
|
||||
run: |
|
||||
cd ./test/ssh-ci
|
||||
docker buildx bake --push
|
||||
37
.github/workflows/ssh-ci.yml
vendored
Normal file
37
.github/workflows/ssh-ci.yml
vendored
Normal file
@@ -0,0 +1,37 @@
|
||||
name: SSH CI
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: ["main"]
|
||||
# pull_request:
|
||||
# branches: ["main"]
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
jobs:
|
||||
ssh:
|
||||
if: github.repository == 'TecharoHQ/anubis'
|
||||
runs-on: ubuntu-24.04
|
||||
strategy:
|
||||
matrix:
|
||||
host:
|
||||
- ubuntu@riscv64.techaro.lol
|
||||
- ci@ppc64le.techaro.lol
|
||||
steps:
|
||||
- name: Checkout code
|
||||
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
|
||||
with:
|
||||
fetch-tags: true
|
||||
fetch-depth: 0
|
||||
persist-credentials: false
|
||||
- name: Install CI target SSH key
|
||||
uses: shimataro/ssh-key-action@d4fffb50872869abe2d9a9098a6d9c5aa7d16be4 # v2.7.0
|
||||
with:
|
||||
key: ${{ secrets.CI_SSH_KEY }}
|
||||
name: id_rsa
|
||||
known_hosts: ${{ secrets.CI_SSH_KNOWN_HOSTS }}
|
||||
- name: Run CI
|
||||
run: bash test/ssh-ci/rigging.sh ${{ matrix.host }}
|
||||
env:
|
||||
GITHUB_RUN_ID: ${{ github.run_id }}
|
||||
4
.github/workflows/zizmor.yml
vendored
4
.github/workflows/zizmor.yml
vendored
@@ -21,7 +21,7 @@ jobs:
|
||||
persist-credentials: false
|
||||
|
||||
- name: Install the latest version of uv
|
||||
uses: astral-sh/setup-uv@6b9c6063abd6010835644d4c2e1bef4cf5cd0fca # v6.0.1
|
||||
uses: astral-sh/setup-uv@bd01e18f51369d5a26f1651c3cb451d3417e3bba # v6.3.1
|
||||
|
||||
- name: Run zizmor 🌈
|
||||
run: uvx zizmor --format sarif . > results.sarif
|
||||
@@ -29,7 +29,7 @@ jobs:
|
||||
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
|
||||
- name: Upload SARIF file
|
||||
uses: github/codeql-action/upload-sarif@60168efe1c415ce0f5521ea06d5c2062adbeed1b # v3.28.17
|
||||
uses: github/codeql-action/upload-sarif@39edc492dbe16b1465b0cafca41432d857bdb31a # v3.29.1
|
||||
with:
|
||||
sarif_file: results.sarif
|
||||
category: zizmor
|
||||
|
||||
2
.gitignore
vendored
2
.gitignore
vendored
@@ -20,3 +20,5 @@ node_modules
|
||||
|
||||
# how does this get here
|
||||
doc/VERSION
|
||||
|
||||
web/static/locales/*.json
|
||||
10
.vscode/extensions.json
vendored
Normal file
10
.vscode/extensions.json
vendored
Normal file
@@ -0,0 +1,10 @@
|
||||
{
|
||||
"recommendations": [
|
||||
"esbenp.prettier-vscode",
|
||||
"ms-azuretools.vscode-containers",
|
||||
"golang.go",
|
||||
"unifiedjs.vscode-mdx",
|
||||
"a-h.templ",
|
||||
"redhat.vscode-yaml"
|
||||
]
|
||||
}
|
||||
27
.vscode/launch.json
vendored
Normal file
27
.vscode/launch.json
vendored
Normal file
@@ -0,0 +1,27 @@
|
||||
{
|
||||
// Use IntelliSense to learn about possible attributes.
|
||||
// Hover to view descriptions of existing attributes.
|
||||
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
|
||||
"version": "0.2.0",
|
||||
"configurations": [
|
||||
{
|
||||
"name": "Launch Package",
|
||||
"type": "go",
|
||||
"request": "launch",
|
||||
"mode": "auto",
|
||||
"program": "${fileDirname}"
|
||||
},
|
||||
{
|
||||
"name": "Anubis [dev]",
|
||||
"command": "npm run dev",
|
||||
"request": "launch",
|
||||
"type": "node-terminal"
|
||||
},
|
||||
{
|
||||
"name": "Start Docs",
|
||||
"command": "cd docs && npm ci && npm run start",
|
||||
"request": "launch",
|
||||
"type": "node-terminal"
|
||||
}
|
||||
]
|
||||
}
|
||||
19
.vscode/settings.json
vendored
19
.vscode/settings.json
vendored
@@ -11,5 +11,24 @@
|
||||
"zig": false,
|
||||
"javascript": false,
|
||||
"properties": false
|
||||
},
|
||||
"[markdown]": {
|
||||
"editor.wordWrap": "wordWrapColumn",
|
||||
"editor.wordWrapColumn": 80,
|
||||
"editor.wordBasedSuggestions": "off"
|
||||
},
|
||||
"[mdx]": {
|
||||
"editor.wordWrap": "wordWrapColumn",
|
||||
"editor.wordWrapColumn": 80,
|
||||
"editor.wordBasedSuggestions": "off"
|
||||
},
|
||||
"[nunjucks]": {
|
||||
"editor.wordWrap": "wordWrapColumn",
|
||||
"editor.wordWrapColumn": 80,
|
||||
"editor.wordBasedSuggestions": "off"
|
||||
},
|
||||
"cSpell.enabledFileTypes": {
|
||||
"mdx": true,
|
||||
"md": true
|
||||
}
|
||||
}
|
||||
|
||||
2
Makefile
2
Makefile
@@ -18,6 +18,7 @@ assets: deps
|
||||
|
||||
build: assets
|
||||
$(GO) build -o ./var/anubis ./cmd/anubis
|
||||
$(GO) build -o ./var/robots2policy ./cmd/robots2policy
|
||||
@echo "Anubis is now built to ./var/anubis"
|
||||
|
||||
lint: assets
|
||||
@@ -27,6 +28,7 @@ lint: assets
|
||||
|
||||
prebaked-build:
|
||||
$(GO) build -o ./var/anubis -ldflags "-X 'github.com/TecharoHQ/anubis.Version=$(VERSION)'" ./cmd/anubis
|
||||
$(GO) build -o ./var/robots2policy -ldflags "-X 'github.com/TecharoHQ/anubis.Version=$(VERSION)'" ./cmd/robots2policy
|
||||
|
||||
test: assets
|
||||
$(GO) test ./...
|
||||
|
||||
32
README.md
32
README.md
@@ -9,18 +9,42 @@
|
||||

|
||||

|
||||

|
||||
[](https://github.com/sponsors/Xe)
|
||||
|
||||
## Sponsors
|
||||
|
||||
Anubis is brought to you by sponsors and donors like:
|
||||
|
||||
[](https://distrust.co)
|
||||
[](https://terminaltrove.com/?utm_campaign=github&utm_medium=referral&utm_content=anubis&utm_source=abgh)
|
||||
[](https://canine.tools)
|
||||
### Diamond Tier
|
||||
|
||||
<a href="https://www.raptorcs.com/content/base/products.html">
|
||||
<img src="./docs/static/img/sponsors/raptor-computing-logo.webp" alt="Raptor Computing Systems" height=64 />
|
||||
</a>
|
||||
|
||||
### Gold Tier
|
||||
|
||||
<a href="https://distrust.co?utm_campaign=github&utm_medium=referral&utm_content=anubis">
|
||||
<img src="./docs/static/img/sponsors/distrust-logo.webp" alt="Distrust" height="64">
|
||||
</a>
|
||||
<a href="https://terminaltrove.com/?utm_campaign=github&utm_medium=referral&utm_content=anubis&utm_source=abgh">
|
||||
<img src="./docs/static/img/sponsors/terminal-trove.webp" alt="Terminal Trove" height="64">
|
||||
</a>
|
||||
<a href="https://canine.tools?utm_campaign=github&utm_medium=referral&utm_content=anubis">
|
||||
<img src="./docs/static/img/sponsors/caninetools-logo.webp" alt="canine.tools" height="64">
|
||||
</a>
|
||||
<a href="https://weblate.org/">
|
||||
<img src="./docs/static/img/sponsors/weblate-logo.webp" alt="Weblate" height="64">
|
||||
</a>
|
||||
<a href="https://uberspace.de/">
|
||||
<img src="./docs/static/img/sponsors/uberspace-logo.webp" alt="Uberspace" height="64">
|
||||
</a>
|
||||
<a href="https://wildbase.xyz/">
|
||||
<img src="./docs/static/img/sponsors/wildbase-logo.webp" alt="Wildbase" height="64">
|
||||
</a>
|
||||
|
||||
## Overview
|
||||
|
||||
Anubis [weighs the soul of your connection](https://en.wikipedia.org/wiki/Weighing_of_souls) using a proof-of-work challenge in order to protect upstream resources from scraper bots.
|
||||
Anubis is a Web AI Firewall Utility that [weighs the soul of your connection](https://en.wikipedia.org/wiki/Weighing_of_souls) using one or more challenges in order to protect upstream resources from scraper bots.
|
||||
|
||||
This program is designed to help protect the small internet from the endless storm of requests that flood in from AI companies. Anubis is as lightweight as possible to ensure that everyone can afford to protect the communities closest to them.
|
||||
|
||||
|
||||
10
anubis.go
10
anubis.go
@@ -11,7 +11,11 @@ var Version = "devel"
|
||||
|
||||
// CookieName is the name of the cookie that Anubis uses in order to validate
|
||||
// access.
|
||||
const CookieName = "within.website-x-cmd-anubis-auth"
|
||||
var CookieName = "techaro.lol-anubis-auth"
|
||||
|
||||
// TestCookieName is the name of the cookie that Anubis uses in order to check
|
||||
// if cookies are enabled on the client's browser.
|
||||
var TestCookieName = "techaro.lol-anubis-cookie-verification"
|
||||
|
||||
// CookieDefaultExpirationTime is the amount of time before the cookie/JWT expires.
|
||||
const CookieDefaultExpirationTime = 7 * 24 * time.Hour
|
||||
@@ -28,3 +32,7 @@ const APIPrefix = "/.within.website/x/cmd/anubis/api/"
|
||||
// DefaultDifficulty is the default "difficulty" (number of leading zeroes)
|
||||
// that must be met by the client in order to pass the challenge.
|
||||
const DefaultDifficulty = 4
|
||||
|
||||
// ForcedLanguage is the language being used instead of the one of the request's Accept-Language header
|
||||
// if being set.
|
||||
var ForcedLanguage = ""
|
||||
|
||||
@@ -5,6 +5,7 @@ import (
|
||||
"context"
|
||||
"crypto/ed25519"
|
||||
"crypto/rand"
|
||||
"crypto/tls"
|
||||
"embed"
|
||||
"encoding/hex"
|
||||
"errors"
|
||||
@@ -29,11 +30,13 @@ import (
|
||||
"github.com/TecharoHQ/anubis"
|
||||
"github.com/TecharoHQ/anubis/data"
|
||||
"github.com/TecharoHQ/anubis/internal"
|
||||
"github.com/TecharoHQ/anubis/internal/thoth"
|
||||
libanubis "github.com/TecharoHQ/anubis/lib"
|
||||
botPolicy "github.com/TecharoHQ/anubis/lib/policy"
|
||||
"github.com/TecharoHQ/anubis/lib/policy/config"
|
||||
"github.com/TecharoHQ/anubis/web"
|
||||
"github.com/facebookgo/flagenv"
|
||||
_ "github.com/joho/godotenv/autoload"
|
||||
"github.com/prometheus/client_golang/prometheus/promhttp"
|
||||
)
|
||||
|
||||
@@ -43,8 +46,13 @@ var (
|
||||
bindNetwork = flag.String("bind-network", "tcp", "network family to bind HTTP to, e.g. unix, tcp")
|
||||
challengeDifficulty = flag.Int("difficulty", anubis.DefaultDifficulty, "difficulty of the challenge")
|
||||
cookieDomain = flag.String("cookie-domain", "", "if set, the top-level domain that the Anubis cookie will be valid for")
|
||||
cookieDynamicDomain = flag.Bool("cookie-dynamic-domain", false, "if set, automatically set the cookie Domain value based on the request domain")
|
||||
cookieExpiration = flag.Duration("cookie-expiration-time", anubis.CookieDefaultExpirationTime, "The amount of time the authorization cookie is valid for")
|
||||
cookiePrefix = flag.String("cookie-prefix", "techaro.lol-anubis", "prefix for browser cookies created by Anubis")
|
||||
cookiePartitioned = flag.Bool("cookie-partitioned", false, "if true, sets the partitioned flag on Anubis cookies, enabling CHIPS support")
|
||||
forcedLanguage = flag.String("forced-language", "", "if set, this language is being used instead of the one from the request's Accept-Language header")
|
||||
hs512Secret = flag.String("hs512-secret", "", "secret used to sign JWTs, uses ed25519 if not set")
|
||||
cookieSecure = flag.Bool("cookie-secure", true, "if true, sets the secure flag on Anubis cookies")
|
||||
ed25519PrivateKeyHex = flag.String("ed25519-private-key-hex", "", "private key used to sign JWTs, if not set a random one will be assigned")
|
||||
ed25519PrivateKeyHexFile = flag.String("ed25519-private-key-hex-file", "", "file name containing value for ed25519-private-key-hex")
|
||||
metricsBind = flag.String("metrics-bind", ":9090", "network address to bind metrics to")
|
||||
@@ -54,7 +62,11 @@ var (
|
||||
policyFname = flag.String("policy-fname", "", "full path to anubis policy document (defaults to a sensible built-in policy)")
|
||||
redirectDomains = flag.String("redirect-domains", "", "list of domains separated by commas which anubis is allowed to redirect to. Leaving this unset allows any domain.")
|
||||
slogLevel = flag.String("slog-level", "INFO", "logging level (see https://pkg.go.dev/log/slog#hdr-Levels)")
|
||||
stripBasePrefix = flag.Bool("strip-base-prefix", false, "if true, strips the base prefix from requests forwarded to the target server")
|
||||
target = flag.String("target", "http://localhost:3923", "target to reverse proxy to, set to an empty string to disable proxying when only using auth request")
|
||||
targetSNI = flag.String("target-sni", "", "if set, the value of the TLS handshake hostname when forwarding requests to the target")
|
||||
targetHost = flag.String("target-host", "", "if set, the value of the Host header when forwarding requests to the target")
|
||||
targetInsecureSkipVerify = flag.Bool("target-insecure-skip-verify", false, "if true, skips TLS validation for the backend")
|
||||
healthcheck = flag.Bool("healthcheck", false, "run a health check against Anubis")
|
||||
useRemoteAddress = flag.Bool("use-remote-address", false, "read the client's IP address from the network request, useful for debugging and running Anubis on bare metal")
|
||||
debugBenchmarkJS = flag.Bool("debug-benchmark-js", false, "respond to every request with a challenge for benchmarking hashrate")
|
||||
@@ -63,6 +75,12 @@ var (
|
||||
ogCacheConsiderHost = flag.Bool("og-cache-consider-host", false, "enable or disable the use of the host in the Open Graph tag cache")
|
||||
extractResources = flag.String("extract-resources", "", "if set, extract the static resources to the specified folder")
|
||||
webmasterEmail = flag.String("webmaster-email", "", "if set, displays webmaster's email on the reject page for appeals")
|
||||
versionFlag = flag.Bool("version", false, "print Anubis version")
|
||||
xffStripPrivate = flag.Bool("xff-strip-private", true, "if set, strip private addresses from X-Forwarded-For")
|
||||
|
||||
thothInsecure = flag.Bool("thoth-insecure", false, "if set, connect to Thoth over plain HTTP/2, don't enable this unless support told you to")
|
||||
thothURL = flag.String("thoth-url", "", "if set, URL for Thoth, the IP reputation database for Anubis")
|
||||
thothToken = flag.String("thoth-token", "", "if set, API token for Thoth, the IP reputation database for Anubis")
|
||||
)
|
||||
|
||||
func keyFromHex(value string) (ed25519.PrivateKey, error) {
|
||||
@@ -92,8 +110,41 @@ func doHealthCheck() error {
|
||||
return nil
|
||||
}
|
||||
|
||||
// parseBindNetFromAddr determine bind network and address based on the given network and address.
|
||||
func parseBindNetFromAddr(address string) (string, string) {
|
||||
defaultScheme := "http://"
|
||||
if !strings.Contains(address, "://") {
|
||||
if strings.HasPrefix(address, ":") {
|
||||
address = defaultScheme + "localhost" + address
|
||||
} else {
|
||||
address = defaultScheme + address
|
||||
}
|
||||
}
|
||||
|
||||
bindUri, err := url.Parse(address)
|
||||
if err != nil {
|
||||
log.Fatal(fmt.Errorf("failed to parse bind URL: %w", err))
|
||||
}
|
||||
|
||||
switch bindUri.Scheme {
|
||||
case "unix":
|
||||
return "unix", bindUri.Path
|
||||
case "tcp", "http", "https":
|
||||
return "tcp", bindUri.Host
|
||||
default:
|
||||
log.Fatal(fmt.Errorf("unsupported network scheme %s in address %s", bindUri.Scheme, address))
|
||||
}
|
||||
return "", address
|
||||
}
|
||||
|
||||
func setupListener(network string, address string) (net.Listener, string) {
|
||||
formattedAddress := ""
|
||||
|
||||
if network == "" {
|
||||
// keep compatibility
|
||||
network, address = parseBindNetFromAddr(address)
|
||||
}
|
||||
|
||||
switch network {
|
||||
case "unix":
|
||||
formattedAddress = "unix:" + address
|
||||
@@ -133,7 +184,7 @@ func setupListener(network string, address string) (net.Listener, string) {
|
||||
return listener, formattedAddress
|
||||
}
|
||||
|
||||
func makeReverseProxy(target string) (http.Handler, error) {
|
||||
func makeReverseProxy(target string, targetSNI string, targetHost string, insecureSkipVerify bool) (http.Handler, error) {
|
||||
targetUri, err := url.Parse(target)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("failed to parse target URL: %w", err)
|
||||
@@ -155,9 +206,28 @@ func makeReverseProxy(target string) (http.Handler, error) {
|
||||
transport.RegisterProtocol("unix", libanubis.UnixRoundTripper{Transport: transport})
|
||||
}
|
||||
|
||||
if insecureSkipVerify || targetSNI != "" {
|
||||
transport.TLSClientConfig = &tls.Config{}
|
||||
if insecureSkipVerify {
|
||||
slog.Warn("TARGET_INSECURE_SKIP_VERIFY is set to true, TLS certificate validation will not be performed", "target", target)
|
||||
transport.TLSClientConfig.InsecureSkipVerify = true
|
||||
}
|
||||
if targetSNI != "" {
|
||||
transport.TLSClientConfig.ServerName = targetSNI
|
||||
}
|
||||
}
|
||||
|
||||
rp := httputil.NewSingleHostReverseProxy(targetUri)
|
||||
rp.Transport = transport
|
||||
|
||||
if targetHost != "" {
|
||||
originalDirector := rp.Director
|
||||
rp.Director = func(req *http.Request) {
|
||||
originalDirector(req)
|
||||
req.Host = targetHost
|
||||
}
|
||||
}
|
||||
|
||||
return rp, nil
|
||||
}
|
||||
|
||||
@@ -179,6 +249,11 @@ func main() {
|
||||
flagenv.Parse()
|
||||
flag.Parse()
|
||||
|
||||
if *versionFlag {
|
||||
fmt.Println("Anubis", anubis.Version)
|
||||
return
|
||||
}
|
||||
|
||||
internal.InitSlog(*slogLevel)
|
||||
|
||||
if *extractResources != "" {
|
||||
@@ -196,13 +271,35 @@ func main() {
|
||||
// when using anubis via Systemd and environment variables, then it is not possible to set targe to an empty string but only to space
|
||||
if strings.TrimSpace(*target) != "" {
|
||||
var err error
|
||||
rp, err = makeReverseProxy(*target)
|
||||
rp, err = makeReverseProxy(*target, *targetSNI, *targetHost, *targetInsecureSkipVerify)
|
||||
if err != nil {
|
||||
log.Fatalf("can't make reverse proxy: %v", err)
|
||||
}
|
||||
}
|
||||
|
||||
policy, err := libanubis.LoadPoliciesOrDefault(*policyFname, *challengeDifficulty)
|
||||
if *cookieDomain != "" && *cookieDynamicDomain {
|
||||
log.Fatalf("you can't set COOKIE_DOMAIN and COOKIE_DYNAMIC_DOMAIN at the same time")
|
||||
}
|
||||
|
||||
ctx := context.Background()
|
||||
|
||||
// Thoth configuration
|
||||
switch {
|
||||
case *thothURL != "" && *thothToken == "":
|
||||
slog.Warn("THOTH_URL is set but no THOTH_TOKEN is set")
|
||||
case *thothURL == "" && *thothToken != "":
|
||||
slog.Warn("THOTH_TOKEN is set but no THOTH_URL is set")
|
||||
case *thothURL != "" && *thothToken != "":
|
||||
slog.Debug("connecting to Thoth")
|
||||
thothClient, err := thoth.New(ctx, *thothURL, *thothToken, *thothInsecure)
|
||||
if err != nil {
|
||||
log.Fatalf("can't dial thoth at %s: %v", *thothURL, err)
|
||||
}
|
||||
|
||||
ctx = thoth.With(ctx, thothClient)
|
||||
}
|
||||
|
||||
policy, err := libanubis.LoadPoliciesOrDefault(ctx, *policyFname, *challengeDifficulty)
|
||||
if err != nil {
|
||||
log.Fatalf("can't parse policy file: %v", err)
|
||||
}
|
||||
@@ -230,12 +327,20 @@ func main() {
|
||||
} else if strings.HasSuffix(*basePrefix, "/") {
|
||||
log.Fatalf("[misconfiguration] base-prefix must not end with a slash")
|
||||
}
|
||||
if *stripBasePrefix && *basePrefix == "" {
|
||||
log.Fatalf("[misconfiguration] strip-base-prefix is set to true, but base-prefix is not set, " +
|
||||
"this may result in unexpected behavior")
|
||||
}
|
||||
|
||||
var priv ed25519.PrivateKey
|
||||
if *ed25519PrivateKeyHex != "" && *ed25519PrivateKeyHexFile != "" {
|
||||
var ed25519Priv ed25519.PrivateKey
|
||||
if *hs512Secret != "" && (*ed25519PrivateKeyHex != "" || *ed25519PrivateKeyHexFile != "") {
|
||||
log.Fatal("do not specify both HS512 and ED25519 secrets")
|
||||
} else if *hs512Secret != "" {
|
||||
ed25519Priv = ed25519.PrivateKey(*hs512Secret)
|
||||
} else if *ed25519PrivateKeyHex != "" && *ed25519PrivateKeyHexFile != "" {
|
||||
log.Fatal("do not specify both ED25519_PRIVATE_KEY_HEX and ED25519_PRIVATE_KEY_HEX_FILE")
|
||||
} else if *ed25519PrivateKeyHex != "" {
|
||||
priv, err = keyFromHex(*ed25519PrivateKeyHex)
|
||||
ed25519Priv, err = keyFromHex(*ed25519PrivateKeyHex)
|
||||
if err != nil {
|
||||
log.Fatalf("failed to parse and validate ED25519_PRIVATE_KEY_HEX: %v", err)
|
||||
}
|
||||
@@ -245,12 +350,12 @@ func main() {
|
||||
log.Fatalf("failed to read ED25519_PRIVATE_KEY_HEX_FILE %s: %v", *ed25519PrivateKeyHexFile, err)
|
||||
}
|
||||
|
||||
priv, err = keyFromHex(string(bytes.TrimSpace(hexFile)))
|
||||
ed25519Priv, err = keyFromHex(string(bytes.TrimSpace(hexFile)))
|
||||
if err != nil {
|
||||
log.Fatalf("failed to parse and validate content of ED25519_PRIVATE_KEY_HEX_FILE: %v", err)
|
||||
}
|
||||
} else {
|
||||
_, priv, err = ed25519.GenerateKey(rand.Reader)
|
||||
_, ed25519Priv, err = ed25519.GenerateKey(rand.Reader)
|
||||
if err != nil {
|
||||
log.Fatalf("failed to generate ed25519 key: %v", err)
|
||||
}
|
||||
@@ -272,21 +377,36 @@ func main() {
|
||||
slog.Warn("REDIRECT_DOMAINS is not set, Anubis will only redirect to the same domain a request is coming from, see https://anubis.techaro.lol/docs/admin/configuration/redirect-domains")
|
||||
}
|
||||
|
||||
anubis.CookieName = *cookiePrefix + "-auth"
|
||||
anubis.TestCookieName = *cookiePrefix + "-cookie-verification"
|
||||
anubis.ForcedLanguage = *forcedLanguage
|
||||
|
||||
// If OpenGraph configuration values are not set in the config file, use the
|
||||
// values from flags / envvars.
|
||||
if !policy.OpenGraph.Enabled {
|
||||
policy.OpenGraph.Enabled = *ogPassthrough
|
||||
policy.OpenGraph.ConsiderHost = *ogCacheConsiderHost
|
||||
policy.OpenGraph.TimeToLive = *ogTimeToLive
|
||||
policy.OpenGraph.Override = map[string]string{}
|
||||
}
|
||||
|
||||
s, err := libanubis.New(libanubis.Options{
|
||||
BasePrefix: *basePrefix,
|
||||
Next: rp,
|
||||
Policy: policy,
|
||||
ServeRobotsTXT: *robotsTxt,
|
||||
PrivateKey: priv,
|
||||
CookieDomain: *cookieDomain,
|
||||
CookieExpiration: *cookieExpiration,
|
||||
CookiePartitioned: *cookiePartitioned,
|
||||
OGPassthrough: *ogPassthrough,
|
||||
OGTimeToLive: *ogTimeToLive,
|
||||
RedirectDomains: redirectDomainsList,
|
||||
Target: *target,
|
||||
WebmasterEmail: *webmasterEmail,
|
||||
OGCacheConsidersHost: *ogCacheConsiderHost,
|
||||
BasePrefix: *basePrefix,
|
||||
StripBasePrefix: *stripBasePrefix,
|
||||
Next: rp,
|
||||
Policy: policy,
|
||||
ServeRobotsTXT: *robotsTxt,
|
||||
ED25519PrivateKey: ed25519Priv,
|
||||
HS512Secret: []byte(*hs512Secret),
|
||||
CookieDomain: *cookieDomain,
|
||||
CookieDynamicDomain: *cookieDynamicDomain,
|
||||
CookieExpiration: *cookieExpiration,
|
||||
CookiePartitioned: *cookiePartitioned,
|
||||
RedirectDomains: redirectDomainsList,
|
||||
Target: *target,
|
||||
WebmasterEmail: *webmasterEmail,
|
||||
OpenGraph: policy.OpenGraph,
|
||||
CookieSecure: *cookieSecure,
|
||||
})
|
||||
if err != nil {
|
||||
log.Fatalf("can't construct libanubis.Server: %v", err)
|
||||
@@ -307,7 +427,7 @@ func main() {
|
||||
h = s
|
||||
h = internal.RemoteXRealIP(*useRemoteAddress, *bindNetwork, h)
|
||||
h = internal.XForwardedForToXRealIP(h)
|
||||
h = internal.XForwardedForUpdate(h)
|
||||
h = internal.XForwardedForUpdate(*xffStripPrivate, h)
|
||||
|
||||
srv := http.Server{Handler: h, ErrorLog: internal.GetFilteredHTTPLogger()}
|
||||
listener, listenerUrl := setupListener(*bindNetwork, *bind)
|
||||
@@ -391,11 +511,11 @@ func extractEmbedFS(fsys embed.FS, root string, destDir string) error {
|
||||
return os.MkdirAll(destPath, 0o700)
|
||||
}
|
||||
|
||||
data, err := fs.ReadFile(fsys, path)
|
||||
embeddedData, err := fs.ReadFile(fsys, path)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
return os.WriteFile(destPath, data, 0o644)
|
||||
return os.WriteFile(destPath, embeddedData, 0o644)
|
||||
})
|
||||
}
|
||||
|
||||
78
cmd/robots2policy/batch/batch_process.go
Normal file
78
cmd/robots2policy/batch/batch_process.go
Normal file
@@ -0,0 +1,78 @@
|
||||
/*
|
||||
Batch process robots.txt files from archives like https://github.com/nrjones8/robots-dot-txt-archive-bot/tree/master/data/cleaned
|
||||
into Anubis CEL policies. Usage: go run batch_process.go <directory with robots.txt files>
|
||||
*/
|
||||
package main
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"io/fs"
|
||||
"log"
|
||||
"os"
|
||||
"os/exec"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
)
|
||||
|
||||
func main() {
|
||||
if len(os.Args) < 2 {
|
||||
fmt.Println("Usage: go run batch_process.go <cleaned_directory>")
|
||||
fmt.Println("Example: go run batch_process.go ./cleaned")
|
||||
os.Exit(1)
|
||||
}
|
||||
|
||||
cleanedDir := os.Args[1]
|
||||
outputDir := "generated_policies"
|
||||
|
||||
// Create output directory
|
||||
if err := os.MkdirAll(outputDir, 0755); err != nil {
|
||||
log.Fatalf("Failed to create output directory: %v", err)
|
||||
}
|
||||
|
||||
count := 0
|
||||
err := filepath.WalkDir(cleanedDir, func(path string, d fs.DirEntry, err error) error {
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
// Skip directories
|
||||
if d.IsDir() {
|
||||
return nil
|
||||
}
|
||||
|
||||
// Generate policy name from file path
|
||||
relPath, _ := filepath.Rel(cleanedDir, path)
|
||||
policyName := strings.ReplaceAll(relPath, "/", "-")
|
||||
policyName = strings.TrimSuffix(policyName, "-robots.txt")
|
||||
policyName = strings.ReplaceAll(policyName, ".", "-")
|
||||
|
||||
outputFile := filepath.Join(outputDir, policyName+".yaml")
|
||||
|
||||
cmd := exec.Command("go", "run", "main.go",
|
||||
"-input", path,
|
||||
"-output", outputFile,
|
||||
"-name", policyName,
|
||||
"-format", "yaml")
|
||||
|
||||
if err := cmd.Run(); err != nil {
|
||||
fmt.Printf("Warning: Failed to process %s: %v\n", path, err)
|
||||
return nil // Continue processing other files
|
||||
}
|
||||
|
||||
count++
|
||||
if count%100 == 0 {
|
||||
fmt.Printf("Processed %d files...\n", count)
|
||||
} else if count%10 == 0 {
|
||||
fmt.Print(".")
|
||||
}
|
||||
|
||||
return nil
|
||||
})
|
||||
|
||||
if err != nil {
|
||||
log.Fatalf("Error walking directory: %v", err)
|
||||
}
|
||||
|
||||
fmt.Printf("Successfully processed %d robots.txt files\n", count)
|
||||
fmt.Printf("Generated policies saved to: %s/\n", outputDir)
|
||||
}
|
||||
313
cmd/robots2policy/main.go
Normal file
313
cmd/robots2policy/main.go
Normal file
@@ -0,0 +1,313 @@
|
||||
package main
|
||||
|
||||
import (
|
||||
"bufio"
|
||||
"encoding/json"
|
||||
"flag"
|
||||
"fmt"
|
||||
"io"
|
||||
"log"
|
||||
"net/http"
|
||||
"os"
|
||||
"regexp"
|
||||
"strings"
|
||||
|
||||
"github.com/TecharoHQ/anubis/lib/policy/config"
|
||||
|
||||
"sigs.k8s.io/yaml"
|
||||
)
|
||||
|
||||
var (
|
||||
inputFile = flag.String("input", "", "path to robots.txt file (use - for stdin)")
|
||||
outputFile = flag.String("output", "", "output file path (use - for stdout, defaults to stdout)")
|
||||
outputFormat = flag.String("format", "yaml", "output format: yaml or json")
|
||||
baseAction = flag.String("action", "CHALLENGE", "default action for disallowed paths: ALLOW, DENY, CHALLENGE, WEIGH")
|
||||
crawlDelay = flag.Int("crawl-delay-weight", 0, "if > 0, add weight adjustment for crawl-delay (difficulty adjustment)")
|
||||
policyName = flag.String("name", "robots-txt-policy", "name for the generated policy")
|
||||
userAgentDeny = flag.String("deny-user-agents", "DENY", "action for specifically blocked user agents: DENY, CHALLENGE")
|
||||
helpFlag = flag.Bool("help", false, "show help")
|
||||
)
|
||||
|
||||
type RobotsRule struct {
|
||||
UserAgent string
|
||||
Disallows []string
|
||||
Allows []string
|
||||
CrawlDelay int
|
||||
IsBlacklist bool // true if this is a specifically denied user agent
|
||||
}
|
||||
|
||||
type AnubisRule struct {
|
||||
Expression *config.ExpressionOrList `yaml:"expression,omitempty" json:"expression,omitempty"`
|
||||
Challenge *config.ChallengeRules `yaml:"challenge,omitempty" json:"challenge,omitempty"`
|
||||
Weight *config.Weight `yaml:"weight,omitempty" json:"weight,omitempty"`
|
||||
Name string `yaml:"name" json:"name"`
|
||||
Action string `yaml:"action" json:"action"`
|
||||
}
|
||||
|
||||
func init() {
|
||||
flag.Usage = func() {
|
||||
fmt.Fprintf(os.Stderr, "Usage of %s:\n", os.Args[0])
|
||||
fmt.Fprintf(os.Stderr, "%s [options] -input <robots.txt>\n\n", os.Args[0])
|
||||
flag.PrintDefaults()
|
||||
fmt.Fprintln(os.Stderr, "\nExamples:")
|
||||
fmt.Fprintln(os.Stderr, " # Convert local robots.txt file")
|
||||
fmt.Fprintln(os.Stderr, " robots2policy -input robots.txt -output policy.yaml")
|
||||
fmt.Fprintln(os.Stderr, "")
|
||||
fmt.Fprintln(os.Stderr, " # Convert from URL")
|
||||
fmt.Fprintln(os.Stderr, " robots2policy -input https://example.com/robots.txt -format json")
|
||||
fmt.Fprintln(os.Stderr, "")
|
||||
fmt.Fprintln(os.Stderr, " # Read from stdin, write to stdout")
|
||||
fmt.Fprintln(os.Stderr, " curl https://example.com/robots.txt | robots2policy -input -")
|
||||
os.Exit(2)
|
||||
}
|
||||
}
|
||||
|
||||
func main() {
|
||||
flag.Parse()
|
||||
|
||||
if len(flag.Args()) > 0 || *helpFlag || *inputFile == "" {
|
||||
flag.Usage()
|
||||
}
|
||||
|
||||
// Read robots.txt
|
||||
var input io.Reader
|
||||
if *inputFile == "-" {
|
||||
input = os.Stdin
|
||||
} else if strings.HasPrefix(*inputFile, "http://") || strings.HasPrefix(*inputFile, "https://") {
|
||||
resp, err := http.Get(*inputFile)
|
||||
if err != nil {
|
||||
log.Fatalf("failed to fetch robots.txt from URL: %v", err)
|
||||
}
|
||||
defer resp.Body.Close()
|
||||
input = resp.Body
|
||||
} else {
|
||||
file, err := os.Open(*inputFile)
|
||||
if err != nil {
|
||||
log.Fatalf("failed to open input file: %v", err)
|
||||
}
|
||||
defer file.Close()
|
||||
input = file
|
||||
}
|
||||
|
||||
// Parse robots.txt
|
||||
rules, err := parseRobotsTxt(input)
|
||||
if err != nil {
|
||||
log.Fatalf("failed to parse robots.txt: %v", err)
|
||||
}
|
||||
|
||||
// Convert to Anubis rules
|
||||
anubisRules := convertToAnubisRules(rules)
|
||||
|
||||
// Check if any rules were generated
|
||||
if len(anubisRules) == 0 {
|
||||
log.Fatal("no valid rules generated from robots.txt - file may be empty or contain no disallow directives")
|
||||
}
|
||||
|
||||
// Generate output
|
||||
var output []byte
|
||||
switch strings.ToLower(*outputFormat) {
|
||||
case "yaml":
|
||||
output, err = yaml.Marshal(anubisRules)
|
||||
case "json":
|
||||
output, err = json.MarshalIndent(anubisRules, "", " ")
|
||||
default:
|
||||
log.Fatalf("unsupported output format: %s (use yaml or json)", *outputFormat)
|
||||
}
|
||||
|
||||
if err != nil {
|
||||
log.Fatalf("failed to marshal output: %v", err)
|
||||
}
|
||||
|
||||
// Write output
|
||||
if *outputFile == "" || *outputFile == "-" {
|
||||
fmt.Print(string(output))
|
||||
} else {
|
||||
err = os.WriteFile(*outputFile, output, 0644)
|
||||
if err != nil {
|
||||
log.Fatalf("failed to write output file: %v", err)
|
||||
}
|
||||
fmt.Printf("Generated Anubis policy written to %s\n", *outputFile)
|
||||
}
|
||||
}
|
||||
|
||||
func parseRobotsTxt(input io.Reader) ([]RobotsRule, error) {
|
||||
scanner := bufio.NewScanner(input)
|
||||
var rules []RobotsRule
|
||||
var currentRule *RobotsRule
|
||||
|
||||
for scanner.Scan() {
|
||||
line := strings.TrimSpace(scanner.Text())
|
||||
|
||||
// Skip empty lines and comments
|
||||
if line == "" || strings.HasPrefix(line, "#") {
|
||||
continue
|
||||
}
|
||||
|
||||
// Split on first colon
|
||||
parts := strings.SplitN(line, ":", 2)
|
||||
if len(parts) != 2 {
|
||||
continue
|
||||
}
|
||||
|
||||
directive := strings.TrimSpace(strings.ToLower(parts[0]))
|
||||
value := strings.TrimSpace(parts[1])
|
||||
|
||||
switch directive {
|
||||
case "user-agent":
|
||||
// Start a new rule section
|
||||
if currentRule != nil {
|
||||
rules = append(rules, *currentRule)
|
||||
}
|
||||
currentRule = &RobotsRule{
|
||||
UserAgent: value,
|
||||
Disallows: make([]string, 0),
|
||||
Allows: make([]string, 0),
|
||||
}
|
||||
|
||||
case "disallow":
|
||||
if currentRule != nil && value != "" {
|
||||
currentRule.Disallows = append(currentRule.Disallows, value)
|
||||
}
|
||||
|
||||
case "allow":
|
||||
if currentRule != nil && value != "" {
|
||||
currentRule.Allows = append(currentRule.Allows, value)
|
||||
}
|
||||
|
||||
case "crawl-delay":
|
||||
if currentRule != nil {
|
||||
if delay, err := parseIntSafe(value); err == nil {
|
||||
currentRule.CrawlDelay = delay
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Don't forget the last rule
|
||||
if currentRule != nil {
|
||||
rules = append(rules, *currentRule)
|
||||
}
|
||||
|
||||
// Mark blacklisted user agents (those with "Disallow: /")
|
||||
for i := range rules {
|
||||
for _, disallow := range rules[i].Disallows {
|
||||
if disallow == "/" {
|
||||
rules[i].IsBlacklist = true
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return rules, scanner.Err()
|
||||
}
|
||||
|
||||
func parseIntSafe(s string) (int, error) {
|
||||
var result int
|
||||
_, err := fmt.Sscanf(s, "%d", &result)
|
||||
return result, err
|
||||
}
|
||||
|
||||
func convertToAnubisRules(robotsRules []RobotsRule) []AnubisRule {
|
||||
var anubisRules []AnubisRule
|
||||
ruleCounter := 0
|
||||
|
||||
for _, robotsRule := range robotsRules {
|
||||
userAgent := robotsRule.UserAgent
|
||||
|
||||
// Handle crawl delay as weight adjustment (do this first before any continues)
|
||||
if robotsRule.CrawlDelay > 0 && *crawlDelay > 0 {
|
||||
ruleCounter++
|
||||
rule := AnubisRule{
|
||||
Name: fmt.Sprintf("%s-crawl-delay-%d", *policyName, ruleCounter),
|
||||
Action: "WEIGH",
|
||||
Weight: &config.Weight{Adjust: *crawlDelay},
|
||||
}
|
||||
|
||||
if userAgent == "*" {
|
||||
rule.Expression = &config.ExpressionOrList{
|
||||
All: []string{"true"}, // Always applies
|
||||
}
|
||||
} else {
|
||||
rule.Expression = &config.ExpressionOrList{
|
||||
All: []string{fmt.Sprintf("userAgent.contains(%q)", userAgent)},
|
||||
}
|
||||
}
|
||||
|
||||
anubisRules = append(anubisRules, rule)
|
||||
}
|
||||
|
||||
// Handle blacklisted user agents (complete deny/challenge)
|
||||
if robotsRule.IsBlacklist {
|
||||
ruleCounter++
|
||||
rule := AnubisRule{
|
||||
Name: fmt.Sprintf("%s-blacklist-%d", *policyName, ruleCounter),
|
||||
Action: *userAgentDeny,
|
||||
}
|
||||
|
||||
if userAgent == "*" {
|
||||
// This would block everything - convert to a weight adjustment instead
|
||||
rule.Name = fmt.Sprintf("%s-global-restriction-%d", *policyName, ruleCounter)
|
||||
rule.Action = "WEIGH"
|
||||
rule.Weight = &config.Weight{Adjust: 20} // Increase difficulty significantly
|
||||
rule.Expression = &config.ExpressionOrList{
|
||||
All: []string{"true"}, // Always applies
|
||||
}
|
||||
} else {
|
||||
rule.Expression = &config.ExpressionOrList{
|
||||
All: []string{fmt.Sprintf("userAgent.contains(%q)", userAgent)},
|
||||
}
|
||||
}
|
||||
anubisRules = append(anubisRules, rule)
|
||||
continue
|
||||
}
|
||||
|
||||
// Handle specific disallow rules
|
||||
for _, disallow := range robotsRule.Disallows {
|
||||
if disallow == "/" {
|
||||
continue // Already handled as blacklist above
|
||||
}
|
||||
|
||||
ruleCounter++
|
||||
rule := AnubisRule{
|
||||
Name: fmt.Sprintf("%s-disallow-%d", *policyName, ruleCounter),
|
||||
Action: *baseAction,
|
||||
}
|
||||
|
||||
// Build CEL expression
|
||||
var conditions []string
|
||||
|
||||
// Add user agent condition if not wildcard
|
||||
if userAgent != "*" {
|
||||
conditions = append(conditions, fmt.Sprintf("userAgent.contains(%q)", userAgent))
|
||||
}
|
||||
|
||||
// Add path condition
|
||||
pathCondition := buildPathCondition(disallow)
|
||||
conditions = append(conditions, pathCondition)
|
||||
|
||||
rule.Expression = &config.ExpressionOrList{
|
||||
All: conditions,
|
||||
}
|
||||
|
||||
anubisRules = append(anubisRules, rule)
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
return anubisRules
|
||||
}
|
||||
|
||||
func buildPathCondition(robotsPath string) string {
|
||||
// Handle wildcards in robots.txt paths
|
||||
if strings.Contains(robotsPath, "*") || strings.Contains(robotsPath, "?") {
|
||||
// Convert robots.txt wildcards to regex
|
||||
regex := regexp.QuoteMeta(robotsPath)
|
||||
regex = strings.ReplaceAll(regex, `\*`, `.*`) // * becomes .*
|
||||
regex = strings.ReplaceAll(regex, `\?`, `.`) // ? becomes .
|
||||
regex = "^" + regex
|
||||
return fmt.Sprintf("path.matches(%q)", regex)
|
||||
}
|
||||
|
||||
// Simple prefix match for most cases
|
||||
return fmt.Sprintf("path.startsWith(%q)", robotsPath)
|
||||
}
|
||||
418
cmd/robots2policy/robots2policy_test.go
Normal file
418
cmd/robots2policy/robots2policy_test.go
Normal file
@@ -0,0 +1,418 @@
|
||||
package main
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"reflect"
|
||||
"strings"
|
||||
"testing"
|
||||
|
||||
"gopkg.in/yaml.v3"
|
||||
)
|
||||
|
||||
type TestCase struct {
|
||||
name string
|
||||
robotsFile string
|
||||
expectedFile string
|
||||
options TestOptions
|
||||
}
|
||||
|
||||
type TestOptions struct {
|
||||
format string
|
||||
action string
|
||||
crawlDelayWeight int
|
||||
policyName string
|
||||
deniedAction string
|
||||
}
|
||||
|
||||
func TestDataFileConversion(t *testing.T) {
|
||||
|
||||
testCases := []TestCase{
|
||||
{
|
||||
name: "simple_default",
|
||||
robotsFile: "simple.robots.txt",
|
||||
expectedFile: "simple.yaml",
|
||||
options: TestOptions{format: "yaml"},
|
||||
},
|
||||
{
|
||||
name: "simple_json",
|
||||
robotsFile: "simple.robots.txt",
|
||||
expectedFile: "simple.json",
|
||||
options: TestOptions{format: "json"},
|
||||
},
|
||||
{
|
||||
name: "simple_deny_action",
|
||||
robotsFile: "simple.robots.txt",
|
||||
expectedFile: "deny-action.yaml",
|
||||
options: TestOptions{format: "yaml", action: "DENY"},
|
||||
},
|
||||
{
|
||||
name: "simple_custom_name",
|
||||
robotsFile: "simple.robots.txt",
|
||||
expectedFile: "custom-name.yaml",
|
||||
options: TestOptions{format: "yaml", policyName: "my-custom-policy"},
|
||||
},
|
||||
{
|
||||
name: "blacklist_with_crawl_delay",
|
||||
robotsFile: "blacklist.robots.txt",
|
||||
expectedFile: "blacklist.yaml",
|
||||
options: TestOptions{format: "yaml", crawlDelayWeight: 3},
|
||||
},
|
||||
{
|
||||
name: "wildcards",
|
||||
robotsFile: "wildcards.robots.txt",
|
||||
expectedFile: "wildcards.yaml",
|
||||
options: TestOptions{format: "yaml"},
|
||||
},
|
||||
{
|
||||
name: "empty_file",
|
||||
robotsFile: "empty.robots.txt",
|
||||
expectedFile: "empty.yaml",
|
||||
options: TestOptions{format: "yaml"},
|
||||
},
|
||||
{
|
||||
name: "complex_scenario",
|
||||
robotsFile: "complex.robots.txt",
|
||||
expectedFile: "complex.yaml",
|
||||
options: TestOptions{format: "yaml", crawlDelayWeight: 5},
|
||||
},
|
||||
}
|
||||
|
||||
for _, tc := range testCases {
|
||||
t.Run(tc.name, func(t *testing.T) {
|
||||
robotsPath := filepath.Join("testdata", tc.robotsFile)
|
||||
expectedPath := filepath.Join("testdata", tc.expectedFile)
|
||||
|
||||
// Read robots.txt input
|
||||
robotsFile, err := os.Open(robotsPath)
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to open robots file %s: %v", robotsPath, err)
|
||||
}
|
||||
defer robotsFile.Close()
|
||||
|
||||
// Parse robots.txt
|
||||
rules, err := parseRobotsTxt(robotsFile)
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to parse robots.txt: %v", err)
|
||||
}
|
||||
|
||||
// Set test options
|
||||
oldFormat := *outputFormat
|
||||
oldAction := *baseAction
|
||||
oldCrawlDelay := *crawlDelay
|
||||
oldPolicyName := *policyName
|
||||
oldDeniedAction := *userAgentDeny
|
||||
|
||||
if tc.options.format != "" {
|
||||
*outputFormat = tc.options.format
|
||||
}
|
||||
if tc.options.action != "" {
|
||||
*baseAction = tc.options.action
|
||||
}
|
||||
if tc.options.crawlDelayWeight > 0 {
|
||||
*crawlDelay = tc.options.crawlDelayWeight
|
||||
}
|
||||
if tc.options.policyName != "" {
|
||||
*policyName = tc.options.policyName
|
||||
}
|
||||
if tc.options.deniedAction != "" {
|
||||
*userAgentDeny = tc.options.deniedAction
|
||||
}
|
||||
|
||||
// Restore options after test
|
||||
defer func() {
|
||||
*outputFormat = oldFormat
|
||||
*baseAction = oldAction
|
||||
*crawlDelay = oldCrawlDelay
|
||||
*policyName = oldPolicyName
|
||||
*userAgentDeny = oldDeniedAction
|
||||
}()
|
||||
|
||||
// Convert to Anubis rules
|
||||
anubisRules := convertToAnubisRules(rules)
|
||||
|
||||
// Generate output
|
||||
var actualOutput []byte
|
||||
switch strings.ToLower(*outputFormat) {
|
||||
case "yaml":
|
||||
actualOutput, err = yaml.Marshal(anubisRules)
|
||||
case "json":
|
||||
actualOutput, err = json.MarshalIndent(anubisRules, "", " ")
|
||||
}
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to marshal output: %v", err)
|
||||
}
|
||||
|
||||
// Read expected output
|
||||
expectedOutput, err := os.ReadFile(expectedPath)
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to read expected file %s: %v", expectedPath, err)
|
||||
}
|
||||
|
||||
if strings.ToLower(*outputFormat) == "yaml" {
|
||||
var actualData []interface{}
|
||||
var expectedData []interface{}
|
||||
|
||||
err = yaml.Unmarshal(actualOutput, &actualData)
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to unmarshal actual output: %v", err)
|
||||
}
|
||||
|
||||
err = yaml.Unmarshal(expectedOutput, &expectedData)
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to unmarshal expected output: %v", err)
|
||||
}
|
||||
|
||||
// Compare data structures
|
||||
if !compareData(actualData, expectedData) {
|
||||
actualStr := strings.TrimSpace(string(actualOutput))
|
||||
expectedStr := strings.TrimSpace(string(expectedOutput))
|
||||
t.Errorf("Output mismatch for %s\nExpected:\n%s\n\nActual:\n%s", tc.name, expectedStr, actualStr)
|
||||
}
|
||||
} else {
|
||||
var actualData []interface{}
|
||||
var expectedData []interface{}
|
||||
|
||||
err = json.Unmarshal(actualOutput, &actualData)
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to unmarshal actual JSON output: %v", err)
|
||||
}
|
||||
|
||||
err = json.Unmarshal(expectedOutput, &expectedData)
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to unmarshal expected JSON output: %v", err)
|
||||
}
|
||||
|
||||
// Compare data structures
|
||||
if !compareData(actualData, expectedData) {
|
||||
actualStr := strings.TrimSpace(string(actualOutput))
|
||||
expectedStr := strings.TrimSpace(string(expectedOutput))
|
||||
t.Errorf("Output mismatch for %s\nExpected:\n%s\n\nActual:\n%s", tc.name, expectedStr, actualStr)
|
||||
}
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func TestCaseInsensitiveParsing(t *testing.T) {
|
||||
robotsTxt := `User-Agent: *
|
||||
Disallow: /admin
|
||||
Crawl-Delay: 10
|
||||
|
||||
User-agent: TestBot
|
||||
disallow: /test
|
||||
crawl-delay: 5
|
||||
|
||||
USER-AGENT: UpperBot
|
||||
DISALLOW: /upper
|
||||
CRAWL-DELAY: 20`
|
||||
|
||||
reader := strings.NewReader(robotsTxt)
|
||||
rules, err := parseRobotsTxt(reader)
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to parse case-insensitive robots.txt: %v", err)
|
||||
}
|
||||
|
||||
expectedRules := 3
|
||||
if len(rules) != expectedRules {
|
||||
t.Errorf("Expected %d rules, got %d", expectedRules, len(rules))
|
||||
}
|
||||
|
||||
// Check that all crawl delays were parsed
|
||||
for i, rule := range rules {
|
||||
expectedDelays := []int{10, 5, 20}
|
||||
if rule.CrawlDelay != expectedDelays[i] {
|
||||
t.Errorf("Rule %d: expected crawl delay %d, got %d", i, expectedDelays[i], rule.CrawlDelay)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestVariousOutputFormats(t *testing.T) {
|
||||
robotsTxt := `User-agent: *
|
||||
Disallow: /admin`
|
||||
|
||||
reader := strings.NewReader(robotsTxt)
|
||||
rules, err := parseRobotsTxt(reader)
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to parse robots.txt: %v", err)
|
||||
}
|
||||
|
||||
oldPolicyName := *policyName
|
||||
*policyName = "test-policy"
|
||||
defer func() { *policyName = oldPolicyName }()
|
||||
|
||||
anubisRules := convertToAnubisRules(rules)
|
||||
|
||||
// Test YAML output
|
||||
yamlOutput, err := yaml.Marshal(anubisRules)
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to marshal YAML: %v", err)
|
||||
}
|
||||
|
||||
if !strings.Contains(string(yamlOutput), "name: test-policy-disallow-1") {
|
||||
t.Errorf("YAML output doesn't contain expected rule name")
|
||||
}
|
||||
|
||||
// Test JSON output
|
||||
jsonOutput, err := json.MarshalIndent(anubisRules, "", " ")
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to marshal JSON: %v", err)
|
||||
}
|
||||
|
||||
if !strings.Contains(string(jsonOutput), `"name": "test-policy-disallow-1"`) {
|
||||
t.Errorf("JSON output doesn't contain expected rule name")
|
||||
}
|
||||
}
|
||||
|
||||
func TestDifferentActions(t *testing.T) {
|
||||
robotsTxt := `User-agent: *
|
||||
Disallow: /admin`
|
||||
|
||||
testActions := []string{"ALLOW", "DENY", "CHALLENGE", "WEIGH"}
|
||||
|
||||
for _, action := range testActions {
|
||||
t.Run("action_"+action, func(t *testing.T) {
|
||||
reader := strings.NewReader(robotsTxt)
|
||||
rules, err := parseRobotsTxt(reader)
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to parse robots.txt: %v", err)
|
||||
}
|
||||
|
||||
oldAction := *baseAction
|
||||
*baseAction = action
|
||||
defer func() { *baseAction = oldAction }()
|
||||
|
||||
anubisRules := convertToAnubisRules(rules)
|
||||
|
||||
if len(anubisRules) != 1 {
|
||||
t.Fatalf("Expected 1 rule, got %d", len(anubisRules))
|
||||
}
|
||||
|
||||
if anubisRules[0].Action != action {
|
||||
t.Errorf("Expected action %s, got %s", action, anubisRules[0].Action)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func TestPolicyNaming(t *testing.T) {
|
||||
robotsTxt := `User-agent: *
|
||||
Disallow: /admin
|
||||
Disallow: /private
|
||||
|
||||
User-agent: BadBot
|
||||
Disallow: /`
|
||||
|
||||
testNames := []string{"custom-policy", "my-rules", "site-protection"}
|
||||
|
||||
for _, name := range testNames {
|
||||
t.Run("name_"+name, func(t *testing.T) {
|
||||
reader := strings.NewReader(robotsTxt)
|
||||
rules, err := parseRobotsTxt(reader)
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to parse robots.txt: %v", err)
|
||||
}
|
||||
|
||||
oldName := *policyName
|
||||
*policyName = name
|
||||
defer func() { *policyName = oldName }()
|
||||
|
||||
anubisRules := convertToAnubisRules(rules)
|
||||
|
||||
// Check that all rule names use the custom prefix
|
||||
for _, rule := range anubisRules {
|
||||
if !strings.HasPrefix(rule.Name, name+"-") {
|
||||
t.Errorf("Rule name %s doesn't start with expected prefix %s-", rule.Name, name)
|
||||
}
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func TestCrawlDelayWeights(t *testing.T) {
|
||||
robotsTxt := `User-agent: *
|
||||
Disallow: /admin
|
||||
Crawl-delay: 10
|
||||
|
||||
User-agent: SlowBot
|
||||
Disallow: /slow
|
||||
Crawl-delay: 60`
|
||||
|
||||
testWeights := []int{1, 5, 10, 25}
|
||||
|
||||
for _, weight := range testWeights {
|
||||
t.Run(fmt.Sprintf("weight_%d", weight), func(t *testing.T) {
|
||||
reader := strings.NewReader(robotsTxt)
|
||||
rules, err := parseRobotsTxt(reader)
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to parse robots.txt: %v", err)
|
||||
}
|
||||
|
||||
oldWeight := *crawlDelay
|
||||
*crawlDelay = weight
|
||||
defer func() { *crawlDelay = oldWeight }()
|
||||
|
||||
anubisRules := convertToAnubisRules(rules)
|
||||
|
||||
// Count weight rules and verify they have correct weight
|
||||
weightRules := 0
|
||||
for _, rule := range anubisRules {
|
||||
if rule.Action == "WEIGH" && rule.Weight != nil {
|
||||
weightRules++
|
||||
if rule.Weight.Adjust != weight {
|
||||
t.Errorf("Expected weight %d, got %d", weight, rule.Weight.Adjust)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
expectedWeightRules := 2 // One for *, one for SlowBot
|
||||
if weightRules != expectedWeightRules {
|
||||
t.Errorf("Expected %d weight rules, got %d", expectedWeightRules, weightRules)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func TestBlacklistActions(t *testing.T) {
|
||||
robotsTxt := `User-agent: BadBot
|
||||
Disallow: /
|
||||
|
||||
User-agent: SpamBot
|
||||
Disallow: /`
|
||||
|
||||
testActions := []string{"DENY", "CHALLENGE"}
|
||||
|
||||
for _, action := range testActions {
|
||||
t.Run("blacklist_"+action, func(t *testing.T) {
|
||||
reader := strings.NewReader(robotsTxt)
|
||||
rules, err := parseRobotsTxt(reader)
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to parse robots.txt: %v", err)
|
||||
}
|
||||
|
||||
oldAction := *userAgentDeny
|
||||
*userAgentDeny = action
|
||||
defer func() { *userAgentDeny = oldAction }()
|
||||
|
||||
anubisRules := convertToAnubisRules(rules)
|
||||
|
||||
// All rules should be blacklist rules with the specified action
|
||||
for _, rule := range anubisRules {
|
||||
if !strings.Contains(rule.Name, "blacklist") {
|
||||
t.Errorf("Expected blacklist rule, got %s", rule.Name)
|
||||
}
|
||||
if rule.Action != action {
|
||||
t.Errorf("Expected action %s, got %s", action, rule.Action)
|
||||
}
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
// compareData performs a deep comparison of two data structures,
|
||||
// ignoring differences that are semantically equivalent in YAML/JSON
|
||||
func compareData(actual, expected interface{}) bool {
|
||||
return reflect.DeepEqual(actual, expected)
|
||||
}
|
||||
15
cmd/robots2policy/testdata/blacklist.robots.txt
vendored
Normal file
15
cmd/robots2policy/testdata/blacklist.robots.txt
vendored
Normal file
@@ -0,0 +1,15 @@
|
||||
# Test with blacklisted user agents
|
||||
User-agent: *
|
||||
Disallow: /admin
|
||||
Crawl-delay: 10
|
||||
|
||||
User-agent: BadBot
|
||||
Disallow: /
|
||||
|
||||
User-agent: SpamBot
|
||||
Disallow: /
|
||||
Crawl-delay: 60
|
||||
|
||||
User-agent: Googlebot
|
||||
Disallow: /search
|
||||
Crawl-delay: 5
|
||||
30
cmd/robots2policy/testdata/blacklist.yaml
vendored
Normal file
30
cmd/robots2policy/testdata/blacklist.yaml
vendored
Normal file
@@ -0,0 +1,30 @@
|
||||
- action: WEIGH
|
||||
expression: "true"
|
||||
name: robots-txt-policy-crawl-delay-1
|
||||
weight:
|
||||
adjust: 3
|
||||
- action: CHALLENGE
|
||||
expression: path.startsWith("/admin")
|
||||
name: robots-txt-policy-disallow-2
|
||||
- action: DENY
|
||||
expression: userAgent.contains("BadBot")
|
||||
name: robots-txt-policy-blacklist-3
|
||||
- action: WEIGH
|
||||
expression: userAgent.contains("SpamBot")
|
||||
name: robots-txt-policy-crawl-delay-4
|
||||
weight:
|
||||
adjust: 3
|
||||
- action: DENY
|
||||
expression: userAgent.contains("SpamBot")
|
||||
name: robots-txt-policy-blacklist-5
|
||||
- action: WEIGH
|
||||
expression: userAgent.contains("Googlebot")
|
||||
name: robots-txt-policy-crawl-delay-6
|
||||
weight:
|
||||
adjust: 3
|
||||
- action: CHALLENGE
|
||||
expression:
|
||||
all:
|
||||
- userAgent.contains("Googlebot")
|
||||
- path.startsWith("/search")
|
||||
name: robots-txt-policy-disallow-7
|
||||
30
cmd/robots2policy/testdata/complex.robots.txt
vendored
Normal file
30
cmd/robots2policy/testdata/complex.robots.txt
vendored
Normal file
@@ -0,0 +1,30 @@
|
||||
# Complex real-world example
|
||||
User-agent: *
|
||||
Disallow: /admin/
|
||||
Disallow: /private/
|
||||
Disallow: /api/internal/
|
||||
Allow: /api/public/
|
||||
Crawl-delay: 5
|
||||
|
||||
User-agent: Googlebot
|
||||
Disallow: /search/
|
||||
Allow: /api/
|
||||
Crawl-delay: 2
|
||||
|
||||
User-agent: Bingbot
|
||||
Disallow: /search/
|
||||
Disallow: /admin/
|
||||
Crawl-delay: 10
|
||||
|
||||
User-agent: BadBot
|
||||
Disallow: /
|
||||
|
||||
User-agent: SeoBot
|
||||
Disallow: /
|
||||
Crawl-delay: 300
|
||||
|
||||
# Test with various patterns
|
||||
User-agent: TestBot
|
||||
Disallow: /*/admin
|
||||
Disallow: /temp*.html
|
||||
Disallow: /file?.log
|
||||
71
cmd/robots2policy/testdata/complex.yaml
vendored
Normal file
71
cmd/robots2policy/testdata/complex.yaml
vendored
Normal file
@@ -0,0 +1,71 @@
|
||||
- action: WEIGH
|
||||
expression: "true"
|
||||
name: robots-txt-policy-crawl-delay-1
|
||||
weight:
|
||||
adjust: 5
|
||||
- action: CHALLENGE
|
||||
expression: path.startsWith("/admin/")
|
||||
name: robots-txt-policy-disallow-2
|
||||
- action: CHALLENGE
|
||||
expression: path.startsWith("/private/")
|
||||
name: robots-txt-policy-disallow-3
|
||||
- action: CHALLENGE
|
||||
expression: path.startsWith("/api/internal/")
|
||||
name: robots-txt-policy-disallow-4
|
||||
- action: WEIGH
|
||||
expression: userAgent.contains("Googlebot")
|
||||
name: robots-txt-policy-crawl-delay-5
|
||||
weight:
|
||||
adjust: 5
|
||||
- action: CHALLENGE
|
||||
expression:
|
||||
all:
|
||||
- userAgent.contains("Googlebot")
|
||||
- path.startsWith("/search/")
|
||||
name: robots-txt-policy-disallow-6
|
||||
- action: WEIGH
|
||||
expression: userAgent.contains("Bingbot")
|
||||
name: robots-txt-policy-crawl-delay-7
|
||||
weight:
|
||||
adjust: 5
|
||||
- action: CHALLENGE
|
||||
expression:
|
||||
all:
|
||||
- userAgent.contains("Bingbot")
|
||||
- path.startsWith("/search/")
|
||||
name: robots-txt-policy-disallow-8
|
||||
- action: CHALLENGE
|
||||
expression:
|
||||
all:
|
||||
- userAgent.contains("Bingbot")
|
||||
- path.startsWith("/admin/")
|
||||
name: robots-txt-policy-disallow-9
|
||||
- action: DENY
|
||||
expression: userAgent.contains("BadBot")
|
||||
name: robots-txt-policy-blacklist-10
|
||||
- action: WEIGH
|
||||
expression: userAgent.contains("SeoBot")
|
||||
name: robots-txt-policy-crawl-delay-11
|
||||
weight:
|
||||
adjust: 5
|
||||
- action: DENY
|
||||
expression: userAgent.contains("SeoBot")
|
||||
name: robots-txt-policy-blacklist-12
|
||||
- action: CHALLENGE
|
||||
expression:
|
||||
all:
|
||||
- userAgent.contains("TestBot")
|
||||
- path.matches("^/.*/admin")
|
||||
name: robots-txt-policy-disallow-13
|
||||
- action: CHALLENGE
|
||||
expression:
|
||||
all:
|
||||
- userAgent.contains("TestBot")
|
||||
- path.matches("^/temp.*\\.html")
|
||||
name: robots-txt-policy-disallow-14
|
||||
- action: CHALLENGE
|
||||
expression:
|
||||
all:
|
||||
- userAgent.contains("TestBot")
|
||||
- path.matches("^/file.\\.log")
|
||||
name: robots-txt-policy-disallow-15
|
||||
6
cmd/robots2policy/testdata/custom-name.yaml
vendored
Normal file
6
cmd/robots2policy/testdata/custom-name.yaml
vendored
Normal file
@@ -0,0 +1,6 @@
|
||||
- action: CHALLENGE
|
||||
expression: path.startsWith("/admin/")
|
||||
name: my-custom-policy-disallow-1
|
||||
- action: CHALLENGE
|
||||
expression: path.startsWith("/private")
|
||||
name: my-custom-policy-disallow-2
|
||||
6
cmd/robots2policy/testdata/deny-action.yaml
vendored
Normal file
6
cmd/robots2policy/testdata/deny-action.yaml
vendored
Normal file
@@ -0,0 +1,6 @@
|
||||
- action: DENY
|
||||
expression: path.startsWith("/admin/")
|
||||
name: robots-txt-policy-disallow-1
|
||||
- action: DENY
|
||||
expression: path.startsWith("/private")
|
||||
name: robots-txt-policy-disallow-2
|
||||
2
cmd/robots2policy/testdata/empty.robots.txt
vendored
Normal file
2
cmd/robots2policy/testdata/empty.robots.txt
vendored
Normal file
@@ -0,0 +1,2 @@
|
||||
# Empty robots.txt (comments only)
|
||||
# No actual rules
|
||||
1
cmd/robots2policy/testdata/empty.yaml
vendored
Normal file
1
cmd/robots2policy/testdata/empty.yaml
vendored
Normal file
@@ -0,0 +1 @@
|
||||
[]
|
||||
12
cmd/robots2policy/testdata/simple.json
vendored
Normal file
12
cmd/robots2policy/testdata/simple.json
vendored
Normal file
@@ -0,0 +1,12 @@
|
||||
[
|
||||
{
|
||||
"action": "CHALLENGE",
|
||||
"expression": "path.startsWith(\"/admin/\")",
|
||||
"name": "robots-txt-policy-disallow-1"
|
||||
},
|
||||
{
|
||||
"action": "CHALLENGE",
|
||||
"expression": "path.startsWith(\"/private\")",
|
||||
"name": "robots-txt-policy-disallow-2"
|
||||
}
|
||||
]
|
||||
5
cmd/robots2policy/testdata/simple.robots.txt
vendored
Normal file
5
cmd/robots2policy/testdata/simple.robots.txt
vendored
Normal file
@@ -0,0 +1,5 @@
|
||||
# Simple robots.txt test
|
||||
User-agent: *
|
||||
Disallow: /admin/
|
||||
Disallow: /private
|
||||
Allow: /public
|
||||
6
cmd/robots2policy/testdata/simple.yaml
vendored
Normal file
6
cmd/robots2policy/testdata/simple.yaml
vendored
Normal file
@@ -0,0 +1,6 @@
|
||||
- action: CHALLENGE
|
||||
expression: path.startsWith("/admin/")
|
||||
name: robots-txt-policy-disallow-1
|
||||
- action: CHALLENGE
|
||||
expression: path.startsWith("/private")
|
||||
name: robots-txt-policy-disallow-2
|
||||
6
cmd/robots2policy/testdata/wildcards.robots.txt
vendored
Normal file
6
cmd/robots2policy/testdata/wildcards.robots.txt
vendored
Normal file
@@ -0,0 +1,6 @@
|
||||
# Test wildcard patterns
|
||||
User-agent: *
|
||||
Disallow: /search*
|
||||
Disallow: /*/private
|
||||
Disallow: /file?.txt
|
||||
Disallow: /admin/*?action=delete
|
||||
12
cmd/robots2policy/testdata/wildcards.yaml
vendored
Normal file
12
cmd/robots2policy/testdata/wildcards.yaml
vendored
Normal file
@@ -0,0 +1,12 @@
|
||||
- action: CHALLENGE
|
||||
expression: path.matches("^/search.*")
|
||||
name: robots-txt-policy-disallow-1
|
||||
- action: CHALLENGE
|
||||
expression: path.matches("^/.*/private")
|
||||
name: robots-txt-policy-disallow-2
|
||||
- action: CHALLENGE
|
||||
expression: path.matches("^/file.\\.txt")
|
||||
name: robots-txt-policy-disallow-3
|
||||
- action: CHALLENGE
|
||||
expression: path.matches("^/admin/.*.action=delete")
|
||||
name: robots-txt-policy-disallow-4
|
||||
20
data/apps/bookstack-saml.yaml
Normal file
20
data/apps/bookstack-saml.yaml
Normal file
@@ -0,0 +1,20 @@
|
||||
# Make SASL login work on bookstack with Anubis
|
||||
# https://www.bookstackapp.com/docs/admin/saml2-auth/
|
||||
- name: allow-bookstack-sasl-login-routes
|
||||
action: ALLOW
|
||||
expression:
|
||||
all:
|
||||
- 'method == "POST"'
|
||||
- path.startsWith("/saml2/acs")
|
||||
- name: allow-bookstack-sasl-metadata-routes
|
||||
action: ALLOW
|
||||
expression:
|
||||
all:
|
||||
- 'method == "GET"'
|
||||
- path.startsWith("/saml2/metadata")
|
||||
- name: allow-bookstack-sasl-logout-routes
|
||||
action: ALLOW
|
||||
expression:
|
||||
all:
|
||||
- 'method == "GET"'
|
||||
- path.startsWith("/saml2/sls")
|
||||
7
data/apps/qualys-ssl-labs.yml
Normal file
7
data/apps/qualys-ssl-labs.yml
Normal file
@@ -0,0 +1,7 @@
|
||||
# This policy allows Qualys SSL Labs to fully work. (https://www.ssllabs.com/ssltest)
|
||||
# IP ranges are taken from: https://qualys.my.site.com/discussions/s/article/000005823
|
||||
- name: qualys-ssl-labs
|
||||
action: ALLOW
|
||||
remote_addresses:
|
||||
- 64.41.200.0/24
|
||||
- 2600:C02:1020:4202::/64
|
||||
9
data/apps/searx-checker.yml
Normal file
9
data/apps/searx-checker.yml
Normal file
@@ -0,0 +1,9 @@
|
||||
# This policy allows SearXNG's instance tracker to work. (https://searx.space)
|
||||
# IPs are taken from `check.searx.space` DNS records.
|
||||
# https://toolbox.googleapps.com/apps/dig/#A/check.searx.space
|
||||
# https://toolbox.googleapps.com/apps/dig/#AAAA/check.searx.space
|
||||
- name: searx-checker
|
||||
action: ALLOW
|
||||
remote_addresses:
|
||||
- 167.235.158.251/32
|
||||
- 2a01:4f8:1c1c:8fc2::1/128
|
||||
@@ -4,7 +4,7 @@
|
||||
"import": "(data)/bots/_deny-pathological.yaml"
|
||||
},
|
||||
{
|
||||
"import": "(data)/bots/ai-robots-txt.yaml"
|
||||
"import": "(data)/meta/ai-block-aggressive.yaml"
|
||||
},
|
||||
{
|
||||
"import": "(data)/crawlers/_allow-good.yaml"
|
||||
|
||||
@@ -11,51 +11,190 @@
|
||||
## /usr/share/docs/anubis/data or in the tarball you extracted Anubis from.
|
||||
|
||||
bots:
|
||||
# Pathological bots to deny
|
||||
- # This correlates to data/bots/deny-pathological.yaml in the source tree
|
||||
# https://github.com/TecharoHQ/anubis/blob/main/data/bots/deny-pathological.yaml
|
||||
import: (data)/bots/_deny-pathological.yaml
|
||||
- import: (data)/bots/aggressive-brazilian-scrapers.yaml
|
||||
# Pathological bots to deny
|
||||
- # This correlates to data/bots/deny-pathological.yaml in the source tree
|
||||
# https://github.com/TecharoHQ/anubis/blob/main/data/bots/deny-pathological.yaml
|
||||
import: (data)/bots/_deny-pathological.yaml
|
||||
- import: (data)/bots/aggressive-brazilian-scrapers.yaml
|
||||
|
||||
# Enforce https://github.com/ai-robots-txt/ai.robots.txt
|
||||
- import: (data)/bots/ai-robots-txt.yaml
|
||||
# Aggressively block AI/LLM related bots/agents by default
|
||||
- import: (data)/meta/ai-block-aggressive.yaml
|
||||
|
||||
# Search engine crawlers to allow, defaults to:
|
||||
# - Google (so they don't try to bypass Anubis)
|
||||
# - Bing
|
||||
# - DuckDuckGo
|
||||
# - Qwant
|
||||
# - The Internet Archive
|
||||
# - Kagi
|
||||
# - Marginalia
|
||||
# - Mojeek
|
||||
- import: (data)/crawlers/_allow-good.yaml
|
||||
# Consider replacing the aggressive AI policy with more selective policies:
|
||||
# - import: (data)/meta/ai-block-moderate.yaml
|
||||
# - import: (data)/meta/ai-block-permissive.yaml
|
||||
|
||||
# Allow common "keeping the internet working" routes (well-known, favicon, robots.txt)
|
||||
- import: (data)/common/keep-internet-working.yaml
|
||||
# Search engine crawlers to allow, defaults to:
|
||||
# - Google (so they don't try to bypass Anubis)
|
||||
# - Apple
|
||||
# - Bing
|
||||
# - DuckDuckGo
|
||||
# - Qwant
|
||||
# - The Internet Archive
|
||||
# - Kagi
|
||||
# - Marginalia
|
||||
# - Mojeek
|
||||
- import: (data)/crawlers/_allow-good.yaml
|
||||
# Challenge Firefox AI previews
|
||||
- import: (data)/clients/x-firefox-ai.yaml
|
||||
|
||||
# # Punish any bot with "bot" in the user-agent string
|
||||
# # This is known to have a high false-positive rate, use at your own risk
|
||||
# - name: generic-bot-catchall
|
||||
# user_agent_regex: (?i:bot|crawler)
|
||||
# action: CHALLENGE
|
||||
# challenge:
|
||||
# difficulty: 16 # impossible
|
||||
# report_as: 4 # lie to the operator
|
||||
# algorithm: slow # intentionally waste CPU cycles and time
|
||||
# Allow common "keeping the internet working" routes (well-known, favicon, robots.txt)
|
||||
- import: (data)/common/keep-internet-working.yaml
|
||||
|
||||
# Generic catchall rule
|
||||
- name: generic-browser
|
||||
user_agent_regex: >-
|
||||
Mozilla|Opera
|
||||
action: CHALLENGE
|
||||
# # Punish any bot with "bot" in the user-agent string
|
||||
# # This is known to have a high false-positive rate, use at your own risk
|
||||
# - name: generic-bot-catchall
|
||||
# user_agent_regex: (?i:bot|crawler)
|
||||
# action: CHALLENGE
|
||||
# challenge:
|
||||
# difficulty: 16 # impossible
|
||||
# report_as: 4 # lie to the operator
|
||||
# algorithm: slow # intentionally waste CPU cycles and time
|
||||
|
||||
# Requires a subscription to Thoth to use, see
|
||||
# https://anubis.techaro.lol/docs/admin/thoth#geoip-based-filtering
|
||||
- name: countries-with-aggressive-scrapers
|
||||
action: WEIGH
|
||||
geoip:
|
||||
countries:
|
||||
- BR
|
||||
- CN
|
||||
weight:
|
||||
adjust: 10
|
||||
|
||||
# Requires a subscription to Thoth to use, see
|
||||
# https://anubis.techaro.lol/docs/admin/thoth#asn-based-filtering
|
||||
- name: aggressive-asns-without-functional-abuse-contact
|
||||
action: WEIGH
|
||||
asns:
|
||||
match:
|
||||
- 13335 # Cloudflare
|
||||
- 136907 # Huawei Cloud
|
||||
- 45102 # Alibaba Cloud
|
||||
weight:
|
||||
adjust: 10
|
||||
|
||||
# Generic catchall rule
|
||||
- name: generic-browser
|
||||
user_agent_regex: >-
|
||||
Mozilla|Opera
|
||||
action: WEIGH
|
||||
weight:
|
||||
adjust: 10
|
||||
|
||||
dnsbl: false
|
||||
|
||||
# #
|
||||
# impressum:
|
||||
# # Displayed at the bottom of every page rendered by Anubis.
|
||||
# footer: >-
|
||||
# This website is hosted by Zombocom. If you have any complaints or notes
|
||||
# about the service, please contact
|
||||
# <a href="mailto:contact@domainhere.example">contact@domainhere.example</a>
|
||||
# and we will assist you as soon as possible.
|
||||
|
||||
# # The imprint page that will be linked to at the footer of every Anubis page.
|
||||
# page:
|
||||
# # The HTML <title> of the page
|
||||
# title: Imprint and Privacy Policy
|
||||
# # The HTML contents of the page. The exact contents of this page can
|
||||
# # and will vary by locale. Please consult with a lawyer if you are not
|
||||
# # sure what to put here
|
||||
# body: >-
|
||||
# <p>Last updated: June 2025</p>
|
||||
|
||||
# <h2>Information that is gathered from visitors</h2>
|
||||
|
||||
# <p>In common with other websites, log files are stored on the web server saving details such as the visitor's IP address, browser type, referring page and time of visit.</p>
|
||||
|
||||
# <p>Cookies may be used to remember visitor preferences when interacting with the website.</p>
|
||||
|
||||
# <p>Where registration is required, the visitor's email and a username will be stored on the server.</p>
|
||||
|
||||
# <!-- ... -->
|
||||
|
||||
# Open Graph passthrough configuration, see here for more information:
|
||||
# https://anubis.techaro.lol/docs/admin/configuration/open-graph/
|
||||
openGraph:
|
||||
# Enables Open Graph passthrough
|
||||
enabled: false
|
||||
# Enables the use of the HTTP host in the cache key, this enables
|
||||
# caching metadata for multiple http hosts at once.
|
||||
considerHost: false
|
||||
# How long cached OpenGraph metadata should last in memory
|
||||
ttl: 24h
|
||||
# # If set, return these opengraph values instead of looking them up with
|
||||
# # the target service.
|
||||
# #
|
||||
# # Correlates to properties in https://ogp.me/
|
||||
# override:
|
||||
# # og:title is required, it is the title of the website
|
||||
# "og:title": "Techaro Anubis"
|
||||
# "og:description": >-
|
||||
# Anubis is a Web AI Firewall Utility that helps you fight the bots
|
||||
# away so that you can maintain uptime at work!
|
||||
# "description": >-
|
||||
# Anubis is a Web AI Firewall Utility that helps you fight the bots
|
||||
# away so that you can maintain uptime at work!
|
||||
|
||||
# By default, send HTTP 200 back to clients that either get issued a challenge
|
||||
# or a denial. This seems weird, but this is load-bearing due to the fact that
|
||||
# the most aggressive scraper bots seem to really, really, want an HTTP 200 and
|
||||
# will stop sending requests once they get it.
|
||||
status_codes:
|
||||
CHALLENGE: 200
|
||||
DENY: 200
|
||||
DENY: 200
|
||||
|
||||
# The weight thresholds for when to trigger individual challenges. Any
|
||||
# CHALLENGE will take precedence over this.
|
||||
#
|
||||
# A threshold has four configuration options:
|
||||
#
|
||||
# - name: the name that is reported down the stack and used for metrics
|
||||
# - expression: A CEL expression with the request weight in the variable
|
||||
# weight
|
||||
# - action: the Anubis action to apply, similar to in a bot policy
|
||||
# - challenge: which challenge to send to the user, similar to in a bot policy
|
||||
#
|
||||
# See https://anubis.techaro.lol/docs/admin/configuration/thresholds for more
|
||||
# information.
|
||||
thresholds:
|
||||
# By default Anubis ships with the following thresholds:
|
||||
- name: minimal-suspicion # This client is likely fine, its soul is lighter than a feather
|
||||
expression: weight <= 0 # a feather weighs zero units
|
||||
action: ALLOW # Allow the traffic through
|
||||
# For clients that had some weight reduced through custom rules, give them a
|
||||
# lightweight challenge.
|
||||
- name: mild-suspicion
|
||||
expression:
|
||||
all:
|
||||
- weight > 0
|
||||
- weight < 10
|
||||
action: CHALLENGE
|
||||
challenge:
|
||||
# https://anubis.techaro.lol/docs/admin/configuration/challenges/metarefresh
|
||||
algorithm: metarefresh
|
||||
difficulty: 1
|
||||
report_as: 1
|
||||
# For clients that are browser-like but have either gained points from custom rules or
|
||||
# report as a standard browser.
|
||||
- name: moderate-suspicion
|
||||
expression:
|
||||
all:
|
||||
- weight >= 10
|
||||
- weight < 20
|
||||
action: CHALLENGE
|
||||
challenge:
|
||||
# https://anubis.techaro.lol/docs/admin/configuration/challenges/proof-of-work
|
||||
algorithm: fast
|
||||
difficulty: 2 # two leading zeros, very fast for most clients
|
||||
report_as: 2
|
||||
# For clients that are browser like and have gained many points from custom rules
|
||||
- name: extreme-suspicion
|
||||
expression: weight >= 20
|
||||
action: CHALLENGE
|
||||
challenge:
|
||||
# https://anubis.techaro.lol/docs/admin/configuration/challenges/proof-of-work
|
||||
algorithm: fast
|
||||
difficulty: 4
|
||||
report_as: 4
|
||||
|
||||
@@ -1,28 +1,26 @@
|
||||
- name: deny-aggressive-brazilian-scrapers
|
||||
action: DENY
|
||||
action: WEIGH
|
||||
weight:
|
||||
adjust: 20
|
||||
expression:
|
||||
any:
|
||||
# Internet Explorer should be out of support
|
||||
- userAgent.contains("MSIE")
|
||||
# Trident is the Internet Explorer browser engine
|
||||
- userAgent.contains("Trident")
|
||||
# Opera is a fork of chrome now
|
||||
- userAgent.contains("Presto")
|
||||
# Windows CE is discontinued
|
||||
- userAgent.contains("Windows CE")
|
||||
# Windows 95 is discontinued
|
||||
- userAgent.contains("Windows 95")
|
||||
# Windows 98 is discontinued
|
||||
- userAgent.contains("Windows 98")
|
||||
# Windows 9.x is discontinued
|
||||
- userAgent.contains("Win 9x")
|
||||
# Amazon does not have an Alexa Toolbar.
|
||||
- userAgent.contains("Alexa Toolbar")
|
||||
- name: challenge-aggressive-brazilian-scrapers
|
||||
action: CHALLENGE
|
||||
expression:
|
||||
any:
|
||||
# This is not released, even Windows 11 calls itself Windows 10
|
||||
- userAgent.contains("Windows NT 11.0")
|
||||
# iPods are not in common use
|
||||
- userAgent.contains("iPod")
|
||||
# Internet Explorer should be out of support
|
||||
- userAgent.contains("MSIE")
|
||||
# Trident is the Internet Explorer browser engine
|
||||
- userAgent.contains("Trident")
|
||||
# Opera is a fork of chrome now
|
||||
- userAgent.contains("Presto")
|
||||
# Windows CE is discontinued
|
||||
- userAgent.contains("Windows CE")
|
||||
# Windows 95 is discontinued
|
||||
- userAgent.contains("Windows 95")
|
||||
# Windows 98 is discontinued
|
||||
- userAgent.contains("Windows 98")
|
||||
# Windows 9.x is discontinued
|
||||
- userAgent.contains("Win 9x")
|
||||
# Amazon does not have an Alexa Toolbar.
|
||||
- userAgent.contains("Alexa Toolbar")
|
||||
# This is not released, even Windows 11 calls itself Windows 10
|
||||
- userAgent.contains("Windows NT 11.0")
|
||||
# iPods are not in common use
|
||||
- userAgent.contains("iPod")
|
||||
|
||||
11
data/bots/ai-catchall.yaml
Normal file
11
data/bots/ai-catchall.yaml
Normal file
@@ -0,0 +1,11 @@
|
||||
# Extensive list of AI-affiliated agents based on https://github.com/ai-robots-txt/ai.robots.txt
|
||||
# Add new/undocumented agents here. Where documentation exists, consider moving to dedicated policy files.
|
||||
# Notes on various agents:
|
||||
# - Amazonbot: Well documented, but they refuse to state which agent collects training data.
|
||||
# - anthropic-ai/Claude-Web: Undocumented by Anthropic. Possibly deprecated or hallucinations?
|
||||
# - Perplexity*: Well documented, but they refuse to state which agent collects training data.
|
||||
# Warning: May contain user agents that _must_ be blocked in robots.txt, or the opt-out will have no effect.
|
||||
- name: "ai-catchall"
|
||||
user_agent_regex: >-
|
||||
AI2Bot|Ai2Bot-Dolma|aiHitBot|Amazonbot|anthropic-ai|Brightbot 1.0|Bytespider|CCBot|Claude-Web|cohere-ai|cohere-training-data-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google-CloudVertexBot|GoogleOther|GoogleOther-Image|GoogleOther-Video|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|imgproxy|ISSCyberRiskCrawler|Kangaroo Bot|meta-externalagent|Meta-ExternalAgent|meta-externalfetcher|Meta-ExternalFetcher|NovaAct|omgili|omgilibot|Operator|PanguBot|Perplexity-User|PerplexityBot|PetalBot|QualifiedBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|Sidetrade indexer bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio-Extended|wpbot|YouBot
|
||||
action: DENY
|
||||
@@ -1,4 +1,6 @@
|
||||
# Warning: Contains user agents that _must_ be blocked in robots.txt, or the opt-out will have no effect.
|
||||
# Note: Blocks human-directed/non-training user agents
|
||||
- name: "ai-robots-txt"
|
||||
user_agent_regex: >-
|
||||
AI2Bot|Ai2Bot-Dolma|aiHitBot|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Brightbot 1.0|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|imgproxy|ISSCyberRiskCrawler|Kangaroo Bot|meta-externalagent|Meta-ExternalAgent|meta-externalfetcher|Meta-ExternalFetcher|NovaAct|OAI-SearchBot|omgili|omgilibot|Operator|PanguBot|Perplexity-User|PerplexityBot|PetalBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|Sidetrade indexer bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot
|
||||
AI2Bot|Ai2Bot-Dolma|aiHitBot|Amazonbot|Andibot|anthropic-ai|Applebot|Applebot-Extended|bedrockbot|Brightbot 1.0|Bytespider|CCBot|ChatGPT-User|Claude-SearchBot|Claude-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|EchoboxBot|FacebookBot|facebookexternalhit|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google-CloudVertexBot|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo Bot|meta-externalagent|Meta-ExternalAgent|meta-externalfetcher|Meta-ExternalFetcher|MistralAI-User/1.0|MyCentralAIScraperBot|NovaAct|OAI-SearchBot|omgili|omgilibot|Operator|PanguBot|Panscient|panscient.com|Perplexity-User|PerplexityBot|PetalBot|PhindBot|Poseidon Research Crawler|QualifiedBot|QuillBot|quillbot.com|SBIntuitionsBot|Scrapy|SemrushBot|SemrushBot-BA|SemrushBot-CT|SemrushBot-OCOB|SemrushBot-SI|SemrushBot-SWA|Sidetrade indexer bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio-Extended|wpbot|YandexAdditional|YandexAdditionalBot|YouBot
|
||||
action: DENY
|
||||
|
||||
@@ -1,4 +1,6 @@
|
||||
- name: cloudflare-workers
|
||||
headers_regex:
|
||||
CF-Worker: .*
|
||||
action: DENY
|
||||
action: WEIGH
|
||||
weight:
|
||||
adjust: 15
|
||||
|
||||
@@ -3,7 +3,7 @@
|
||||
action: ALLOW
|
||||
expression:
|
||||
all:
|
||||
- remoteAddress == "159.69.213.214"
|
||||
- remoteAddress == "159.69.213.214" || remoteAddress == "2a01:4f8:c2c:7bf4::1"
|
||||
- userAgent == "Mozilla/5.0 (compatible; utils.web Limnoria module)"
|
||||
- '"X-Http-Version" in headers'
|
||||
- headers["X-Http-Version"] == "HTTP/1.1"
|
||||
8
data/clients/ai.yaml
Normal file
8
data/clients/ai.yaml
Normal file
@@ -0,0 +1,8 @@
|
||||
# User agents that act on behalf of humans in AI tools, e.g. searching the web.
|
||||
# Each entry should have a positive/ALLOW entry created as well, with further documentation.
|
||||
# Exceptions:
|
||||
# - Claude-User: No published IP allowlist
|
||||
- name: "ai-clients"
|
||||
user_agent_regex: >-
|
||||
ChatGPT-User|Claude-User|MistralAI-User
|
||||
action: DENY
|
||||
10
data/clients/mistral-mistralai-user.yaml
Normal file
10
data/clients/mistral-mistralai-user.yaml
Normal file
@@ -0,0 +1,10 @@
|
||||
# Acts on behalf of user requests
|
||||
# https://docs.mistral.ai/robots/
|
||||
- name: mistral-mistralai-user
|
||||
user_agent_regex: MistralAI-User/.+; \+https\://docs\.mistral\.ai/robots
|
||||
action: ALLOW
|
||||
# https://mistral.ai/mistralai-user-ips.json
|
||||
remote_addresses: [
|
||||
"20.240.160.161/32",
|
||||
"20.240.160.1/32",
|
||||
]
|
||||
93
data/clients/openai-chatgpt-user.yaml
Normal file
93
data/clients/openai-chatgpt-user.yaml
Normal file
@@ -0,0 +1,93 @@
|
||||
# Acts on behalf of user requests
|
||||
# https://platform.openai.com/docs/bots/overview-of-openai-crawlers
|
||||
- name: openai-chatgpt-user
|
||||
user_agent_regex: ChatGPT-User/.+; \+https\://openai\.com/bot
|
||||
action: ALLOW
|
||||
# https://openai.com/chatgpt-user.json
|
||||
# curl 'https://openai.com/chatgpt-user.json' | jq '.prefixes.[].ipv4Prefix' | sed 's/$/,/'
|
||||
remote_addresses: [
|
||||
"13.65.138.112/28",
|
||||
"23.98.179.16/28",
|
||||
"13.65.138.96/28",
|
||||
"172.183.222.128/28",
|
||||
"20.102.212.144/28",
|
||||
"40.116.73.208/28",
|
||||
"172.183.143.224/28",
|
||||
"52.190.190.16/28",
|
||||
"13.83.237.176/28",
|
||||
"51.8.155.64/28",
|
||||
"74.249.86.176/28",
|
||||
"51.8.155.48/28",
|
||||
"20.55.229.144/28",
|
||||
"135.237.131.208/28",
|
||||
"135.237.133.48/28",
|
||||
"51.8.155.112/28",
|
||||
"135.237.133.112/28",
|
||||
"52.159.249.96/28",
|
||||
"52.190.137.16/28",
|
||||
"52.255.111.112/28",
|
||||
"40.84.181.32/28",
|
||||
"172.178.141.112/28",
|
||||
"52.190.142.64/28",
|
||||
"172.178.140.144/28",
|
||||
"52.190.137.144/28",
|
||||
"172.178.141.128/28",
|
||||
"57.154.187.32/28",
|
||||
"4.196.118.112/28",
|
||||
"20.193.50.32/28",
|
||||
"20.215.188.192/28",
|
||||
"20.215.214.16/28",
|
||||
"4.197.22.112/28",
|
||||
"4.197.115.112/28",
|
||||
"172.213.21.16/28",
|
||||
"172.213.11.144/28",
|
||||
"172.213.12.112/28",
|
||||
"172.213.21.144/28",
|
||||
"20.90.7.144/28",
|
||||
"57.154.175.0/28",
|
||||
"57.154.174.112/28",
|
||||
"52.236.94.144/28",
|
||||
"137.135.191.176/28",
|
||||
"23.98.186.192/28",
|
||||
"23.98.186.96/28",
|
||||
"23.98.186.176/28",
|
||||
"23.98.186.64/28",
|
||||
"68.221.67.192/28",
|
||||
"68.221.67.160/28",
|
||||
"13.83.167.128/28",
|
||||
"20.228.106.176/28",
|
||||
"52.159.227.32/28",
|
||||
"68.220.57.64/28",
|
||||
"172.213.21.112/28",
|
||||
"68.221.67.224/28",
|
||||
"68.221.75.16/28",
|
||||
"20.97.189.96/28",
|
||||
"52.252.113.240/28",
|
||||
"52.230.163.32/28",
|
||||
"172.212.159.64/28",
|
||||
"52.255.111.80/28",
|
||||
"52.255.111.0/28",
|
||||
"4.151.241.240/28",
|
||||
"52.255.111.32/28",
|
||||
"52.255.111.48/28",
|
||||
"52.255.111.16/28",
|
||||
"52.230.164.176/28",
|
||||
"52.176.139.176/28",
|
||||
"52.173.234.16/28",
|
||||
"4.151.71.176/28",
|
||||
"4.151.119.48/28",
|
||||
"52.255.109.112/28",
|
||||
"52.255.109.80/28",
|
||||
"20.161.75.208/28",
|
||||
"68.154.28.96/28",
|
||||
"52.255.109.128/28",
|
||||
"52.225.75.208/28",
|
||||
"52.190.139.48/28",
|
||||
"68.221.67.240/28",
|
||||
"52.156.77.144/28",
|
||||
"52.148.129.32/28",
|
||||
"40.84.221.208/28",
|
||||
"104.210.139.224/28",
|
||||
"40.84.221.224/28",
|
||||
"104.210.139.192/28",
|
||||
]
|
||||
2
data/clients/small-internet-browsers/_permissive.yaml
Normal file
2
data/clients/small-internet-browsers/_permissive.yaml
Normal file
@@ -0,0 +1,2 @@
|
||||
- import: (data)/clients/small-internet-browsers/netsurf.yaml
|
||||
- import: (data)/clients/small-internet-browsers/palemoon.yaml
|
||||
5
data/clients/small-internet-browsers/netsurf.yaml
Normal file
5
data/clients/small-internet-browsers/netsurf.yaml
Normal file
@@ -0,0 +1,5 @@
|
||||
- name: "reduce-weight-netsurf"
|
||||
user_agent_regex: "NetSurf"
|
||||
action: WEIGH
|
||||
weight:
|
||||
adjust: -5
|
||||
5
data/clients/small-internet-browsers/palemoon.yaml
Normal file
5
data/clients/small-internet-browsers/palemoon.yaml
Normal file
@@ -0,0 +1,5 @@
|
||||
- name: "reduce-weight-palemoon"
|
||||
user_agent_regex: "PaleMoon"
|
||||
action: WEIGH
|
||||
weight:
|
||||
adjust: -5
|
||||
6
data/clients/x-firefox-ai.yaml
Normal file
6
data/clients/x-firefox-ai.yaml
Normal file
@@ -0,0 +1,6 @@
|
||||
# https://connect.mozilla.org/t5/firefox-labs/try-out-link-previews-in-firefox-labs-138-and-share-your/td-p/92012
|
||||
- name: x-firefox-ai
|
||||
action: WEIGH
|
||||
expression: '"X-Firefox-Ai" in headers'
|
||||
weight:
|
||||
adjust: 5
|
||||
@@ -1,15 +1,15 @@
|
||||
- name: ipv4-rfc-1918
|
||||
action: ALLOW
|
||||
remote_addresses:
|
||||
- 10.0.0.0/8
|
||||
- 172.16.0.0/12
|
||||
- 192.168.0.0/16
|
||||
- 100.64.0.0/10
|
||||
- 10.0.0.0/8
|
||||
- 172.16.0.0/12
|
||||
- 192.168.0.0/16
|
||||
- 100.64.0.0/10
|
||||
- name: ipv6-ula
|
||||
action: ALLOW
|
||||
remote_addresses:
|
||||
- fc00::/7
|
||||
- fc00::/7
|
||||
- name: ipv6-link-local
|
||||
action: ALLOW
|
||||
remote_addresses:
|
||||
- fe80::/10
|
||||
- fe80::/10
|
||||
|
||||
@@ -1,4 +1,5 @@
|
||||
- import: (data)/crawlers/googlebot.yaml
|
||||
- import: (data)/crawlers/applebot.yaml
|
||||
- import: (data)/crawlers/bingbot.yaml
|
||||
- import: (data)/crawlers/duckduckbot.yaml
|
||||
- import: (data)/crawlers/qwantbot.yaml
|
||||
|
||||
8
data/crawlers/ai-search.yaml
Normal file
8
data/crawlers/ai-search.yaml
Normal file
@@ -0,0 +1,8 @@
|
||||
# User agents that index exclusively for search in for AI systems.
|
||||
# Each entry should have a positive/ALLOW entry created as well, with further documentation.
|
||||
# Exceptions:
|
||||
# - Claude-SearchBot: No published IP allowlist
|
||||
- name: "ai-crawlers-search"
|
||||
user_agent_regex: >-
|
||||
OAI-SearchBot|Claude-SearchBot
|
||||
action: DENY
|
||||
8
data/crawlers/ai-training.yaml
Normal file
8
data/crawlers/ai-training.yaml
Normal file
@@ -0,0 +1,8 @@
|
||||
# User agents that crawl for training AI/LLM systems
|
||||
# Each entry should have a positive/ALLOW entry created as well, with further documentation.
|
||||
# Exceptions:
|
||||
# - ClaudeBot: No published IP allowlist
|
||||
- name: "ai-crawlers-training"
|
||||
user_agent_regex: >-
|
||||
GPTBot|ClaudeBot
|
||||
action: DENY
|
||||
20
data/crawlers/applebot.yaml
Normal file
20
data/crawlers/applebot.yaml
Normal file
@@ -0,0 +1,20 @@
|
||||
# Indexing for search and Siri
|
||||
# https://support.apple.com/en-us/119829
|
||||
- name: applebot
|
||||
user_agent_regex: Applebot
|
||||
action: ALLOW
|
||||
# https://search.developer.apple.com/applebot.json
|
||||
remote_addresses: [
|
||||
"17.241.208.160/27",
|
||||
"17.241.193.160/27",
|
||||
"17.241.200.160/27",
|
||||
"17.22.237.0/24",
|
||||
"17.22.245.0/24",
|
||||
"17.22.253.0/24",
|
||||
"17.241.75.0/24",
|
||||
"17.241.219.0/24",
|
||||
"17.241.227.0/24",
|
||||
"17.246.15.0/24",
|
||||
"17.246.19.0/24",
|
||||
"17.246.23.0/24",
|
||||
]
|
||||
16
data/crawlers/openai-gptbot.yaml
Normal file
16
data/crawlers/openai-gptbot.yaml
Normal file
@@ -0,0 +1,16 @@
|
||||
# Collects AI training data
|
||||
# https://platform.openai.com/docs/bots/overview-of-openai-crawlers
|
||||
- name: openai-gptbot
|
||||
user_agent_regex: GPTBot/1\.1; \+https\://openai\.com/gptbot
|
||||
action: ALLOW
|
||||
# https://openai.com/gptbot.json
|
||||
remote_addresses: [
|
||||
"52.230.152.0/24",
|
||||
"20.171.206.0/24",
|
||||
"20.171.207.0/24",
|
||||
"4.227.36.0/25",
|
||||
"20.125.66.80/28",
|
||||
"172.182.204.0/24",
|
||||
"172.182.214.0/24",
|
||||
"172.182.215.0/24",
|
||||
]
|
||||
13
data/crawlers/openai-searchbot.yaml
Normal file
13
data/crawlers/openai-searchbot.yaml
Normal file
@@ -0,0 +1,13 @@
|
||||
# Indexing for search, does not collect training data
|
||||
# https://platform.openai.com/docs/bots/overview-of-openai-crawlers
|
||||
- name: openai-searchbot
|
||||
user_agent_regex: OAI-SearchBot/1\.0; \+https\://openai\.com/searchbot
|
||||
action: ALLOW
|
||||
# https://openai.com/searchbot.json
|
||||
remote_addresses: [
|
||||
"20.42.10.176/28",
|
||||
"172.203.190.128/28",
|
||||
"104.210.140.128/28",
|
||||
"51.8.102.0/24",
|
||||
"135.234.64.0/24"
|
||||
]
|
||||
@@ -3,6 +3,6 @@ package data
|
||||
import "embed"
|
||||
|
||||
var (
|
||||
//go:embed botPolicies.yaml botPolicies.json all:apps all:bots all:clients all:common all:crawlers
|
||||
//go:embed botPolicies.yaml botPolicies.json all:apps all:bots all:clients all:common all:crawlers all:meta
|
||||
BotPolicies embed.FS
|
||||
)
|
||||
|
||||
5
data/meta/README.md
Normal file
5
data/meta/README.md
Normal file
@@ -0,0 +1,5 @@
|
||||
# meta policies
|
||||
|
||||
Contains policies that exclusively reference policies in _multiple_ other data folders.
|
||||
|
||||
Akin to "stances" that the administrator can take, with reference to various topics, such as AI/LLM systems.
|
||||
6
data/meta/ai-block-aggressive.yaml
Normal file
6
data/meta/ai-block-aggressive.yaml
Normal file
@@ -0,0 +1,6 @@
|
||||
# Blocks all AI/LLM associated user agents, regardless of purpose or human agency
|
||||
# Warning: To completely block some AI/LLM training, such as with Google, you _must_ place flags in robots.txt.
|
||||
- import: (data)/bots/ai-catchall.yaml
|
||||
- import: (data)/clients/ai.yaml
|
||||
- import: (data)/crawlers/ai-search.yaml
|
||||
- import: (data)/crawlers/ai-training.yaml
|
||||
7
data/meta/ai-block-moderate.yaml
Normal file
7
data/meta/ai-block-moderate.yaml
Normal file
@@ -0,0 +1,7 @@
|
||||
# Blocks all AI/LLM bots used for training or unknown/undocumented purposes.
|
||||
# Permits user agents with explicitly documented non-training use, and published IP allowlists.
|
||||
- import: (data)/bots/ai-catchall.yaml
|
||||
- import: (data)/crawlers/ai-training.yaml
|
||||
- import: (data)/crawlers/openai-searchbot.yaml
|
||||
- import: (data)/clients/openai-chatgpt-user.yaml
|
||||
- import: (data)/clients/mistral-mistralai-user.yaml
|
||||
6
data/meta/ai-block-permissive.yaml
Normal file
6
data/meta/ai-block-permissive.yaml
Normal file
@@ -0,0 +1,6 @@
|
||||
# Permits all well documented AI/LLM user agents with published IP allowlists.
|
||||
- import: (data)/bots/ai-catchall.yaml
|
||||
- import: (data)/crawlers/openai-searchbot.yaml
|
||||
- import: (data)/crawlers/openai-gptbot.yaml
|
||||
- import: (data)/clients/openai-chatgpt-user.yaml
|
||||
- import: (data)/clients/mistral-mistralai-user.yaml
|
||||
14
docs/blog/2025-06-16-welcome/index.mdx
Normal file
14
docs/blog/2025-06-16-welcome/index.mdx
Normal file
@@ -0,0 +1,14 @@
|
||||
---
|
||||
slug: welcome
|
||||
title: Welcome to the Anubis blog!
|
||||
authors: [xe]
|
||||
tags: [intro]
|
||||
---
|
||||
|
||||
Hello, world!
|
||||
|
||||
At Techaro, we've been working on making Anubis even better, and in the process we want to share what we've done, how it works, and signal boost cool things the community has done. As things happen, we'll blog about them so that you can learn from our struggles.
|
||||
|
||||
More details to come soon!
|
||||
|
||||
{/* truncate */}
|
||||
248
docs/blog/2025-06-27-release-1.20.0/index.mdx
Normal file
248
docs/blog/2025-06-27-release-1.20.0/index.mdx
Normal file
@@ -0,0 +1,248 @@
|
||||
---
|
||||
slug: release/v1.20.0
|
||||
title: Anubis v1.20.0 is now available!
|
||||
authors: [xe]
|
||||
tags: [release]
|
||||
image: sunburst.webp
|
||||
---
|
||||
|
||||

|
||||
|
||||
Hey all!
|
||||
|
||||
Today we released [Anubis v1.20.0: Thancred Waters](https://github.com/TecharoHQ/anubis/releases/tag/v1.20.0). This adds a lot of new and exciting features to Anubis, including but not limited to the `WEIGH` action, custom weight thresholds, Imprint/impressum support, and a no-JS challenge. Here's what you need to know so you can protect your websites in new and exciting ways!
|
||||
|
||||
{/* truncate */}
|
||||
|
||||
## Sponsoring the product
|
||||
|
||||
If you rely on Anubis to keep your website safe, please consider sponsoring the project on [GitHub Sponsors](https://github.com/sponsors/Xe) or [Patreon](https://patreon.com/cadey). Funding helps pay hosting bills and offset the time spent on making this project the best it can be. Every little bit helps and when enough money is raised, [I can make Anubis my full-time job](https://github.com/TecharoHQ/anubis/discussions/278).
|
||||
|
||||
I am waiting to hear back from NLNet on if Anubis was selected for funding or not. Let's hope it is!
|
||||
|
||||
## Deprecation warning: `DEFAULT_DIFFICULTY`
|
||||
|
||||
Anubis v1.20.0 is the last version to support the `DEFAULT_DIFFICULTY` flag in the exact way it currently does. In future versions, this will be ineffectual and you should use the [custom threshold system](/docs/admin/configuration/thresholds) instead.
|
||||
|
||||
If this becomes an imposition in practice, this will be reverted.
|
||||
|
||||
## Chrome won't show "invalid response" after "Success!"
|
||||
|
||||
There were a bunch of smaller fixes in Anubis this time around, but the biggest one was finally squashing the ["invalid response" after "Success!" issue](https://github.com/TecharoHQ/anubis/issues/564) that had been plaguing Chrome users. This was a really annoying issue to track down but it was discovered while we were working on better end-to-end / functional testing: [Chrome randomizes the `Accept-Language` header](https://github.com/explainers-by-googlers/reduce-accept-language) so that websites can't do fingerprinting as easily.
|
||||
|
||||
When Anubis issues a challenge, it grabs [information that the browser sends to the user](/docs/design/how-anubis-works#challenge-format) to create a challenge string. Anubis doesn't store these challenge strings anywhere, and when a solution is being checked it calculates the challenge string from the request. This means that they'd get a challenge on one end, compute the response for that challenge, and then the server would validate that against a different challenge. This server-side validation would fail, leading to the user seeing "invalid response" after the client reported success.
|
||||
|
||||
I suspect this was why Vanadium and Cromite were having sporadic issues as well.
|
||||
|
||||
## New Features
|
||||
|
||||
The biggest feature in Anubis is the "weight" subsystem. This allows administrators to make custom rules that change the suspicion level of a request without having to take immediate action. As an example, consider the self-hostable git forge [Gitea](https://about.gitea.com/). When you load a page in Gitea, it creates a session cookie that your browser sends with every request. Weight allows you to mark a request that includes a Gitea session token as _less_ suspicious:
|
||||
|
||||
```yaml
|
||||
- name: gitea-session-token
|
||||
action: WEIGH
|
||||
expression:
|
||||
all:
|
||||
# Check if the request has a Cookie header
|
||||
- '"Cookie" in headers'
|
||||
# Check if the request's Cookie header contains the Gitea session token
|
||||
- headers["Cookie"].contains("i_love_gitea=")
|
||||
# Remove 5 weight points
|
||||
weight:
|
||||
adjust: -5
|
||||
```
|
||||
|
||||
This is different from the past where you could only allow every request with a Gitea session token, meaning that the invention of lying would allow malicious clients to bypass protection.
|
||||
|
||||
Weight is added and removed whenever a `WEIGH` rule is encountered. When all rules are processed and the request doesn't match any `ALLOW`, `CHALLENGE`, or `DENY` rules, Anubis uses [weight thresholds](/docs/admin/configuration/thresholds) to figure out how to handle that request. Thresholds are defined in the [policy file](/docs/admin/policies) alongside your bot rules:
|
||||
|
||||
```yaml
|
||||
thresholds:
|
||||
- name: minimal-suspicion # This client is likely fine, its soul is lighter than a feather
|
||||
expression: weight <= 0 # a feather weighs zero units
|
||||
action: ALLOW # Allow the traffic through
|
||||
# For clients that had some weight reduced through custom rules, give them a
|
||||
# lightweight challenge.
|
||||
- name: mild-suspicion
|
||||
expression:
|
||||
all:
|
||||
- weight > 0
|
||||
- weight < 10
|
||||
action: CHALLENGE
|
||||
challenge:
|
||||
# https://anubis.techaro.lol/docs/admin/configuration/challenges/metarefresh
|
||||
algorithm: metarefresh
|
||||
difficulty: 1
|
||||
report_as: 1
|
||||
# For clients that are browser-like but have either gained points from custom rules or
|
||||
# report as a standard browser.
|
||||
- name: moderate-suspicion
|
||||
expression:
|
||||
all:
|
||||
- weight >= 10
|
||||
- weight < 20
|
||||
action: CHALLENGE
|
||||
challenge:
|
||||
# https://anubis.techaro.lol/docs/admin/configuration/challenges/proof-of-work
|
||||
algorithm: fast
|
||||
difficulty: 2 # two leading zeros, very fast for most clients
|
||||
report_as: 2
|
||||
# For clients that are browser like and have gained many points from custom rules
|
||||
- name: extreme-suspicion
|
||||
expression: weight >= 20
|
||||
action: CHALLENGE
|
||||
challenge:
|
||||
# https://anubis.techaro.lol/docs/admin/configuration/challenges/proof-of-work
|
||||
algorithm: fast
|
||||
difficulty: 4
|
||||
report_as: 4
|
||||
```
|
||||
|
||||
:::note
|
||||
|
||||
If you don't have thresholds defined in your Anubis policy file, Anubis will default to the "legacy" behaviour where browser-like clients get a challenge at the default difficulty.
|
||||
|
||||
:::
|
||||
|
||||
This lets most clients through if they pass a simple [proof of work challenge](/docs/admin/configuration/challenges/proof-of-work), but any clients that are less suspicious (like ones with a Gitea session token) are given the lightweight [Meta Refresh](/docs/admin/configuration/challenges/metarefresh) challenge instead.
|
||||
|
||||
Threshold expressions are like [Bot rule expressions](/docs/admin/configuration/expressions), but there's only one input: the request's weight. If no thresholds match, the request is allowed through.
|
||||
|
||||
### Imprint/Impressum Support
|
||||
|
||||
European countries like Germany [require an imprint/impressum](https://www.ionos.com/digitalguide/websites/digital-law/a-case-for-thinking-global-germanys-impressum-laws/) to be present in the footer of their website. This allows users to contact someone on the team behind a website in case they run into issues. This also must generally have a separate page where users can view an extended imprint with other information like a privacy policy or a copyright notice.
|
||||
|
||||
Anubis v1.20.0 and later [has support for showing imprints](/docs/admin/configuration/impressum). You can configure two kinds of imprints:
|
||||
|
||||
1. An imprint that is shown in the footer of every Anubis page.
|
||||
2. An extended imprint / privacy policy that is shown when users click on the "Imprint" link. For example, [here's the imprint for the website you're looking at right now](https://anubis.techaro.lol/.within.website/x/cmd/anubis/api/imprint).
|
||||
|
||||
Imprints are configured in [the policy file](/docs/admin/policies/):
|
||||
|
||||
```yaml
|
||||
impressum:
|
||||
# Displayed at the bottom of every page rendered by Anubis.
|
||||
footer: >-
|
||||
This website is hosted by Zombocom. If you have any complaints or notes
|
||||
about the service, please contact
|
||||
<a href="mailto:contact@zombocom.example">contact@zombocom.example</a> and
|
||||
we will assist you as soon as possible.
|
||||
|
||||
# The imprint page that will be linked to at the footer of every Anubis page.
|
||||
page:
|
||||
# The HTML <title> of the page
|
||||
title: Imprint and Privacy Policy
|
||||
# The HTML contents of the page. The exact contents of this page can
|
||||
# and will vary by locale. Please consult with a lawyer if you are not
|
||||
# sure what to put here.
|
||||
body: >-
|
||||
<p>Last updated: June 2025</p>
|
||||
|
||||
<h2>Information that is gathered from visitors</h2>
|
||||
|
||||
<p>In common with other websites, log files are stored on the web server
|
||||
saving details such as the visitor's IP address, browser type, referring
|
||||
page and time of visit.</p>
|
||||
|
||||
<p>Cookies may be used to remember visitor preferences when interacting
|
||||
with the website.</p>
|
||||
|
||||
<p>Where registration is required, the visitor's email and a username
|
||||
will be stored on the server.</p>
|
||||
|
||||
<!-- ... -->
|
||||
```
|
||||
|
||||
If this is insufficient, please [file an issue](https://github.com/TecharoHQ/anubis/issues/new) with a link to the relevant legislation for your country so that this feature can be amended and improved.
|
||||
|
||||
### No-JS Challenge
|
||||
|
||||
One of the first issues in Anubis before it was moved to the [TecharoHQ org](https://github.com/TecharoHQ) was a request [to support challenging browsers without using JavaScript](https://github.com/Xe/x/issues/651). This is a pretty challenging thing to do without rethinking how Anubis works from a fundamentally low level, and with v1.20.0, [Anubis finally has support for running without client-side JavaScript](https://github.com/TecharoHQ/anubis/issues/95) thanks to the [Meta Refresh](/docs/admin/configuration/challenges/metarefresh) challenge.
|
||||
|
||||
When Anubis decides it needs to send a challenge to your browser, it sends a challenge page. Historically, this challenge page is [an HTML template](https://github.com/TecharoHQ/anubis/blob/main/web/index.templ) that kicks off some JavaScript, reads the challenge information out of the page body, and then solves it as fast as possible in order to let users see the website they want to visit.
|
||||
|
||||
In v1.20.0, Anubis has a challenge registry to hold [different client challenge implementations](/docs/category/challenges). This allows us to implement anything we want as long as it can render a page to show a challenge and then check if the result is correct. This is going to be used to implement a WebAssembly-based proof of work option (one that will be way more efficient than the existing browser JS version), but as a proof of concept I implemented a simple challenge using [HTML `<meta refresh>`](https://en.wikipedia.org/wiki/Meta_refresh).
|
||||
|
||||
In my testing, this has worked with every browser I have thrown it at (including CLI browsers, the browser embedded in emacs, etc.). The default configuration of Anubis does use the [meta refresh challenge](/docs/admin/configuration/challenges/metarefresh) for [clients with a very low suspicion](/docs/admin/configuration/thresholds), but by default clients will be sent an [easy proof of work challenge](/docs/admin/configuration/challenges/proof-of-work).
|
||||
|
||||
If the false positive rate of this challenge turns out to not be very high in practice, the meta refresh challenge will be enabled by default for browsers in future versions of Anubis.
|
||||
|
||||
### `robots2policy`
|
||||
|
||||
Anubis was created because crawler bots don't respect [`robots.txt` files](https://www.robotstxt.org/). Administrators have been working on refining and crafting their `robots.txt` files for years, and one common comment is that people don't know where to start crafting their own rules.
|
||||
|
||||
Anubis now ships with a [`robots2policy` tool](/docs/admin/robots2policy) that lets you convert your `robots.txt` file to an Anubis policy.
|
||||
|
||||
```text
|
||||
robots2policy -input https://github.com/robots.txt
|
||||
```
|
||||
|
||||
:::note
|
||||
|
||||
If you installed Anubis from [an OS package](/docs/admin/native-install), you may need to run `anubis-robots2policy` instead of `robots2policy`.
|
||||
|
||||
:::
|
||||
|
||||
We hope that this will help you get started with Anubis faster. We are working on a version of this that will run in the documentation via WebAssembly.
|
||||
|
||||
### Open Graph configuration is being moved to the policy file
|
||||
|
||||
Anubis supports reading [Open Graph tags](/docs/admin/configuration/open-graph) from target services and returning them in challenge pages. This makes the right metadata show up when linking services protected by Anubis in chat applications or on social media.
|
||||
|
||||
In order to test the migration of all of the configuration to the policy file, Open Graph configuration has been moved to the policy file. For more information, please read [the Open Graph configuration options](/docs/admin/configuration/open-graph#configuration-options).
|
||||
|
||||
You can also set default Open Graph tags:
|
||||
|
||||
```yaml
|
||||
openGraph:
|
||||
enabled: true
|
||||
ttl: 24h
|
||||
# If set, return these opengraph values instead of looking them up with
|
||||
# the target service.
|
||||
#
|
||||
# Correlates to properties in https://ogp.me/
|
||||
override:
|
||||
# og:title is required, it is the title of the website
|
||||
"og:title": "Techaro Anubis"
|
||||
"og:description": >-
|
||||
Anubis is a Web AI Firewall Utility that helps you fight the bots
|
||||
away so that you can maintain uptime at work!
|
||||
"description": >-
|
||||
Anubis is a Web AI Firewall Utility that helps you fight the bots
|
||||
away so that you can maintain uptime at work!
|
||||
```
|
||||
|
||||
## Improvements and optimizations
|
||||
|
||||
One of the biggest improvements we've made in v1.20.0 is replacing [SHA-256 with xxhash](https://github.com/TecharoHQ/anubis/pull/676). Anubis uses hashes all over the place to help with identifying clients, matching against rules when allowing traffic through, in error messages sent to users, and more. Historically these have been done with [SHA-256](https://en.wikipedia.org/wiki/SHA-2), however this has been having a mild performance impact in real-world use. As a result, we now use [xxhash](https://xxhash.com/) when possible. This makes policy matching 3x faster in some scenarios and reduces memory usage across the board.
|
||||
|
||||
Anubis now uses [bart](https://pkg.go.dev/github.com/gaissmai/bart) for doing IP address matching when you specify addresses in a `remote_address` check configuration or when you are matching against [advanced checks](/docs/admin/thoth). This uses the same kind of IP address routing configuration that your OS kernel does, making it very fast to query information about IP addresses. This makes IP address range matches anywhere from 3-14 times faster depending on the number of addresses it needs to match against. For more information and benchmarks, check out [@JasonLovesDoggo](https://github.com/JasonLovesDoggo)'s PR: [perf: replace cidranger with bart for significant performance improvements #675](https://github.com/TecharoHQ/anubis/pull/675).
|
||||
|
||||
## What's up next?
|
||||
|
||||
v1.21.0 is already shaping up to be a massive improvement as Anubis adds [internationalization](https://en.wikipedia.org/wiki/Internationalization) support, allowing your users to see its messages in the language they're most comfortable with.
|
||||
|
||||
So far Anubis supports the following languages:
|
||||
|
||||
- English (Simplified and Traditional)
|
||||
- French
|
||||
- Portugese (Brazil)
|
||||
- Spanish
|
||||
|
||||
If you want to contribute translations, please [file an issue](https://github.com/TecharoHQ/anubis/issues/new) with your language of choice or submit a pull request to [the `lib/localization/locales` folder](https://github.com/TecharoHQ/anubis/tree/main/lib/localization/locales). We are about to introduce features to the translation stack, so you may want to hold off a hot minute, but we welcome any and all contributions to making Anubis useful to a global audience.
|
||||
|
||||
Other things we plan to do:
|
||||
|
||||
- Move configuration to the policy file
|
||||
- Support reloading the policy file at runtime without having to restart Anubis
|
||||
- Detecting if a client is "brand new"
|
||||
- A [Valkey](https://valkey.io/)-backed store for sharing information between instances of Anubis
|
||||
- Augmenting No-JS support in the paid product
|
||||
- TLS fingerprinting
|
||||
- Automated testing improvements in CI (FreeBSD CI support, better automated integration/functional testing, etc.)
|
||||
|
||||
## Conclusion
|
||||
|
||||
I hope that these features let you get the same Anubis power you've come to know and love and increases the things you can do with it! I've been really excited to ship [thresholds](/docs/admin/configuration/thresholds) and the cloud-based services for Anubis.
|
||||
|
||||
If you run into any problems, please [file an issue](https://github.com/TecharoHQ/anubis/issues/new). Otherwise, have a good day and get back to making your communities great.
|
||||
BIN
docs/blog/2025-06-27-release-1.20.0/sunburst.webp
Normal file
BIN
docs/blog/2025-06-27-release-1.20.0/sunburst.webp
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 9.2 KiB |
9
docs/blog/authors.yml
Normal file
9
docs/blog/authors.yml
Normal file
@@ -0,0 +1,9 @@
|
||||
xe:
|
||||
name: Xe Iaso
|
||||
title: CEO @ Techaro
|
||||
url: https://github.com/Xe
|
||||
image_url: https://github.com/Xe.png
|
||||
email: xe@techaro.lol
|
||||
page: true
|
||||
socials:
|
||||
github: Xe
|
||||
@@ -11,6 +11,210 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
|
||||
|
||||
## [Unreleased]
|
||||
|
||||
<!-- This changes the project to: -->
|
||||
|
||||
- Add `COOKIE_SECURE` option to set the cookie [Secure flag](https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Cookies#block_access_to_your_cookies)
|
||||
- Sets cookie defaults to use [SameSite: None](https://web.dev/articles/samesite-cookies-explained)
|
||||
- Determine the `BIND_NETWORK`/`--bind-network` value from the bind address ([#677](https://github.com/TecharoHQ/anubis/issues/677)).
|
||||
- Implement localization system. Find locale files in lib/localization/locales/.
|
||||
- Implement a [development container](https://containers.dev/) manifest to make contributions easier.
|
||||
- Fix dynamic cookie domains functionality ([#731](https://github.com/TecharoHQ/anubis/pull/731))
|
||||
- Add option for custom cookie prefix ([#732](https://github.com/TecharoHQ/anubis/pull/732))
|
||||
- Remove the "Success" interstitial after a proof of work challenge is concluded.
|
||||
- Add option for forcing a specific language ([#742](https://github.com/TecharoHQ/anubis/pull/742))
|
||||
|
||||
### Potentially breaking changes
|
||||
|
||||
The following potentially breaking change applies to native installs with systemd only:
|
||||
|
||||
Each instance of systemd service template now has a unique `RuntimeDirectory`, as opposed to each instance of the service sharing a `RuntimeDirectory`. This change was made to avoid [the `RuntimeDirectory` getting nuked any time one of the Anubis instances restarts](https://github.com/TecharoHQ/anubis/issues/748).
|
||||
|
||||
If you configured Anubis' unix sockets to listen on `/run/anubis/foo.sock` for instance `anubis@foo`, you will need to configure Anubis to listen on `/run/anubis/foo/sock` and additionally configure your HTTP load balancer as appropriate.
|
||||
|
||||
If you need the legacy behaviour, install this [systemd unit dropin](https://www.flatcar.org/docs/latest/setup/systemd/drop-in-units/):
|
||||
|
||||
```systemd
|
||||
# /etc/systemd/system/anubis@.service.d/50-runtimedir.conf
|
||||
[Service]
|
||||
RuntimeDirectory=anubis
|
||||
```
|
||||
|
||||
## v1.20.0: Thancred Waters
|
||||
|
||||
The big ticket items are as follows:
|
||||
|
||||
- Implement a no-JS challenge method: [`metarefresh`](./admin/configuration/challenges/metarefresh.mdx) ([#95](https://github.com/TecharoHQ/anubis/issues/95))
|
||||
- Implement request "weight", allowing administrators to customize the behaviour of Anubis based on specific criteria
|
||||
- Implement GeoIP and ASN based checks via [Thoth](https://anubis.techaro.lol/docs/admin/thoth) ([#206](https://github.com/TecharoHQ/anubis/issues/206))
|
||||
- Add [custom weight thresholds](./admin/configuration/thresholds.mdx) via CEL ([#688](https://github.com/TecharoHQ/anubis/pull/688))
|
||||
- Move Open Graph configuration [to the policy file](./admin/configuration/open-graph.mdx)
|
||||
- Enable support for Open Graph metadata to be returned by default instead of doing lookups against the target
|
||||
- Add `robots2policy` CLI utility to convert robots.txt files to Anubis challenge policies using CEL expressions ([#409](https://github.com/TecharoHQ/anubis/issues/409))
|
||||
- Refactor challenge presentation logic to use a challenge registry
|
||||
- Allow challenge implementations to register HTTP routes
|
||||
- [Imprint/Impressum support](./admin/configuration/impressum.mdx) ([#362](https://github.com/TecharoHQ/anubis/issues/362))
|
||||
- Fix "invalid response" after "Success!" in Chromium ([#564](https://github.com/TecharoHQ/anubis/issues/564))
|
||||
|
||||
A lot of performance improvements have been made:
|
||||
|
||||
- Replace internal SHA256 hashing with xxhash for 4-6x performance improvement in policy evaluation and cache operations
|
||||
- Optimized the OGTags subsystem with reduced allocations and runtime per request by up to 66%
|
||||
- Replace cidranger with bart for IP range checking, improving IP matching performance by 3-20x with zero heap
|
||||
allocations
|
||||
|
||||
And some cleanups/refactors were added:
|
||||
|
||||
- Fix OpenGraph passthrough ([#717](https://github.com/TecharoHQ/anubis/issues/717))
|
||||
- Remove the unused `/test-error` endpoint and update the testing endpoint `/make-challenge` to only be enabled in
|
||||
development
|
||||
- Add `--xff-strip-private` flag/envvar to toggle skipping X-Forwarded-For private addresses or not
|
||||
- Bump AI-robots.txt to version 1.37
|
||||
- Make progress bar styling more compatible (UXP, etc)
|
||||
- Add `--strip-base-prefix` flag/envvar to strip the base prefix from request paths when forwarding to target servers
|
||||
- Fix an off-by-one in the default threshold config
|
||||
- Add functionality for HS512 JWT algorithm
|
||||
- Add support for dynamic cookie domains with the `--cookie-dynamic-domain`/`COOKIE_DYNAMIC_DOMAIN` flag/envvar
|
||||
|
||||
Request weight is one of the biggest ticket features in Anubis. This enables Anubis to be much closer to a Web Application Firewall and when combined with custom thresholds allows administrators to have Anubis take advanced reactions. For more information about request weight, see [the request weight section](./admin/policies.mdx#request-weight) of the policy file documentation.
|
||||
|
||||
TL;DR when you have one or more WEIGHT rules like this:
|
||||
|
||||
```yaml
|
||||
bots:
|
||||
- name: gitea-session-token
|
||||
action: WEIGH
|
||||
expression:
|
||||
all:
|
||||
- '"Cookie" in headers'
|
||||
- headers["Cookie"].contains("i_love_gitea=")
|
||||
# Remove 5 weight points
|
||||
weight:
|
||||
adjust: -5
|
||||
```
|
||||
|
||||
You can configure custom thresholds like this:
|
||||
|
||||
```yaml
|
||||
thresholds:
|
||||
- name: minimal-suspicion # This client is likely fine, its soul is lighter than a feather
|
||||
expression: weight < 0 # a feather weighs zero units
|
||||
action: ALLOW # Allow the traffic through
|
||||
|
||||
# For clients that had some weight reduced through custom rules, give them a
|
||||
# lightweight challenge.
|
||||
- name: mild-suspicion
|
||||
expression:
|
||||
all:
|
||||
- weight >= 0
|
||||
- weight < 10
|
||||
action: CHALLENGE
|
||||
challenge:
|
||||
# https://anubis.techaro.lol/docs/admin/configuration/challenges/metarefresh
|
||||
algorithm: metarefresh
|
||||
difficulty: 1
|
||||
report_as: 1
|
||||
|
||||
# For clients that are browser-like but have either gained points from custom
|
||||
# rules or report as a standard browser.
|
||||
- name: moderate-suspicion
|
||||
expression:
|
||||
all:
|
||||
- weight >= 10
|
||||
- weight < 20
|
||||
action: CHALLENGE
|
||||
challenge:
|
||||
# https://anubis.techaro.lol/docs/admin/configuration/challenges/proof-of-work
|
||||
algorithm: fast
|
||||
difficulty: 2 # two leading zeros, very fast for most clients
|
||||
report_as: 2
|
||||
|
||||
# For clients that are browser like and have gained many points from custom
|
||||
# rules
|
||||
- name: extreme-suspicion
|
||||
expression: weight >= 20
|
||||
action: CHALLENGE
|
||||
challenge:
|
||||
# https://anubis.techaro.lol/docs/admin/configuration/challenges/proof-of-work
|
||||
algorithm: fast
|
||||
difficulty: 4
|
||||
report_as: 4
|
||||
```
|
||||
|
||||
These thresholds apply when no other `ALLOW`, `DENY`, or `CHALLENGE` rule matches the request. `WEIGHT` rules add and remove request weight as needed:
|
||||
|
||||
```yaml
|
||||
bots:
|
||||
- name: gitea-session-token
|
||||
action: WEIGH
|
||||
expression:
|
||||
all:
|
||||
- '"Cookie" in headers'
|
||||
- headers["Cookie"].contains("i_love_gitea=")
|
||||
# Remove 5 weight points
|
||||
weight:
|
||||
adjust: -5
|
||||
|
||||
- name: bot-like-user-agent
|
||||
action: WEIGH
|
||||
expression: '"Bot" in userAgent'
|
||||
# Add 5 weight points
|
||||
weight:
|
||||
adjust: 5
|
||||
```
|
||||
|
||||
Of note: the default "generic browser" rule assigns 10 weight points:
|
||||
|
||||
```yaml
|
||||
# Generic catchall rule
|
||||
- name: generic-browser
|
||||
user_agent_regex: >-
|
||||
Mozilla|Opera
|
||||
action: WEIGH
|
||||
weight:
|
||||
adjust: 10
|
||||
```
|
||||
|
||||
Adjust this as you see fit.
|
||||
|
||||
## v1.19.1: Jenomis cen Lexentale - Echo 1
|
||||
|
||||
- Return `data/bots/ai-robots-txt.yaml` to avoid breaking configs [#599](https://github.com/TecharoHQ/anubis/issues/599)
|
||||
|
||||
## v1.19.0: Jenomis cen Lexentale
|
||||
|
||||
Mostly a bunch of small features, no big ticket things this time.
|
||||
|
||||
- Record if challenges were issued via the API or via embedded JSON in the challenge page HTML ([#531](https://github.com/TecharoHQ/anubis/issues/531))
|
||||
- Ensure that clients that are shown a challenge support storing cookies
|
||||
- Imprint the version number into challenge pages
|
||||
- Encode challenge pages with gzip level 1
|
||||
- Add PowerPC 64 bit little-endian builds (`GOARCH=ppc64le`)
|
||||
- Add `check-spelling` for spell checking
|
||||
- Add `--target-insecure-skip-verify` flag/envvar to allow Anubis to hit a self-signed HTTPS backend
|
||||
- Minor adjustments to FreeBSD rc.d script to allow for more flexible configuration.
|
||||
- Added Podman and Docker support for running Playwright tests
|
||||
- Add a default rule to throw challenges when a request with the `X-Firefox-Ai` header is set
|
||||
- Updated the nonce value in the challenge JWT cookie to be a string instead of a number
|
||||
- Rename cookies in response to user feedback
|
||||
- Ensure cookie renaming is consistent across configuration options
|
||||
- Add Bookstack app in data
|
||||
- Truncate everything but the first five characters of Accept-Language headers when making challenges
|
||||
- Ensure client JavaScript is served with Content-Type text/javascript.
|
||||
- Add `--target-host` flag/envvar to allow changing the value of the Host header in requests forwarded to the target service
|
||||
- Bump AI-robots.txt to version 1.31
|
||||
- Add `RuntimeDirectory` to systemd unit settings so native packages can listen over unix sockets
|
||||
- Added SearXNG instance tracker whitelist policy
|
||||
- Added Qualys SSL Labs whitelist policy
|
||||
- Fixed cookie deletion logic ([#520](https://github.com/TecharoHQ/anubis/issues/520), [#522](https://github.com/TecharoHQ/anubis/pull/522))
|
||||
- Add `--target-sni` flag/envvar to allow changing the value of the TLS handshake hostname in requests forwarded to the target service
|
||||
- Fixed CEL expression matching validator to now properly error out when it receives empty expressions
|
||||
- Added OpenRC init.d script
|
||||
- Added `--version` flag
|
||||
- Added `anubis_proxied_requests_total` metric to count proxied requests
|
||||
- Add `Applebot` as "good" web crawler
|
||||
- Reorganize AI/LLM crawler blocking into three separate stances, maintaining existing status quo as default
|
||||
- Split out AI/LLM user agent blocking policies, adding documentation for each
|
||||
|
||||
## v1.18.0: Varis zos Galvus
|
||||
|
||||
The big ticket feature in this release is [CEL expression matching support](https://anubis.techaro.lol/docs/admin/configuration/expressions). This allows you to tailor your approach for the individual services you are protecting.
|
||||
@@ -35,7 +239,7 @@ Or as complicated as:
|
||||
expression:
|
||||
all:
|
||||
- >-
|
||||
(
|
||||
(
|
||||
userAgent.startsWith("git/") ||
|
||||
userAgent.contains("libgit") ||
|
||||
userAgent.startsWith("go-git") ||
|
||||
@@ -65,7 +269,7 @@ Other changes:
|
||||
- Use CSS variables to deduplicate styles
|
||||
- Fixed native packages not containing the stdlib and botPolicies.yaml
|
||||
- Change import syntax to allow multi-level imports
|
||||
- Changed the startup logging to use JSON formatting as all the other logs do.
|
||||
- Changed the startup logging to use JSON formatting as all the other logs do
|
||||
- Added the ability to do [expression matching with CEL](./admin/configuration/expressions.mdx)
|
||||
- Add a warning for clients that don't store cookies
|
||||
- Disable Open Graph passthrough by default ([#435](https://github.com/TecharoHQ/anubis/issues/435))
|
||||
@@ -106,7 +310,6 @@ Other changes:
|
||||
- Moved all CSS inline to the Xess package, changed colors to be CSS variables
|
||||
- Set or append to `X-Forwarded-For` header unless the remote connects over a loopback address [#328](https://github.com/TecharoHQ/anubis/issues/328)
|
||||
- Fixed mojeekbot user agent regex
|
||||
- Added support for running anubis behind a base path (e.g. `/myapp`)
|
||||
- Reduce Anubis' paranoia with user cookies ([#365](https://github.com/TecharoHQ/anubis/pull/365))
|
||||
- Added support for Open Graph passthrough while using unix sockets
|
||||
- The Open Graph subsystem now passes the HTTP `HOST` header through to the origin
|
||||
|
||||
8
docs/docs/admin/configuration/challenges/_category_.json
Normal file
8
docs/docs/admin/configuration/challenges/_category_.json
Normal file
@@ -0,0 +1,8 @@
|
||||
{
|
||||
"label": "Challenges",
|
||||
"position": 10,
|
||||
"link": {
|
||||
"type": "generated-index",
|
||||
"description": "The different challenge methods that Anubis supports."
|
||||
}
|
||||
}
|
||||
19
docs/docs/admin/configuration/challenges/metarefresh.mdx
Normal file
19
docs/docs/admin/configuration/challenges/metarefresh.mdx
Normal file
@@ -0,0 +1,19 @@
|
||||
# Meta Refresh (No JavaScript)
|
||||
|
||||
The `metarefresh` challenge sends a browser a much simpler challenge that makes it refresh the page after a set period of time. This enables clients to pass challenges without executing JavaScript.
|
||||
|
||||
To use it in your Anubis configuration:
|
||||
|
||||
```yaml
|
||||
# Generic catchall rule
|
||||
- name: generic-browser
|
||||
user_agent_regex: >-
|
||||
Mozilla|Opera
|
||||
action: CHALLENGE
|
||||
challenge:
|
||||
difficulty: 1 # Number of seconds to wait before refreshing the page
|
||||
report_as: 4 # Unused by this challenge method
|
||||
algorithm: metarefresh # Specify a non-JS challenge method
|
||||
```
|
||||
|
||||
This is not enabled by default while this method is tested and its false positive rate is ascertained. Many modern scrapers use headless Google Chrome, so this will have a much higher false positive rate.
|
||||
@@ -0,0 +1,5 @@
|
||||
# Proof of Work (JavaScript)
|
||||
|
||||
When Anubis is configured to use the `fast` or `slow` challenge methods, clients will be sent a small [proof of work](https://en.wikipedia.org/wiki/Proof_of_work) challenge. In order to get a token used to access the upstream resource, clients must calculate a complicated math puzzle with JavaScript.
|
||||
|
||||
A `fast` challenge uses a heavily optimized multithreaded implementation and a `slow` challenge uses a simplistic single-threaded implementation. The `slow` method is kept around for legacy compatibility.
|
||||
@@ -143,7 +143,29 @@ Anubis would return a challenge because all of those conditions are true.
|
||||
|
||||
## Functions exposed to Anubis expressions
|
||||
|
||||
There are currently no functions from the Anubis runtime exposed to expressions. This will change in the future.
|
||||
Anubis expressions can be augmented with the following functions:
|
||||
|
||||
### `randInt`
|
||||
|
||||
```ts
|
||||
function randInt(n: int): int;
|
||||
```
|
||||
|
||||
randInt returns a randomly selected integer value in the range of `[0,n)`. This is a thin wrapper around [Go's math/rand#Intn](https://pkg.go.dev/math/rand#Intn). Be careful with this as it may cause inconsistent behavior for genuine users.
|
||||
|
||||
This is best applied when doing explicit block rules, eg:
|
||||
|
||||
```yaml
|
||||
# Denies LightPanda about 75% of the time on average
|
||||
- name: deny-lightpanda-sometimes
|
||||
action: DENY
|
||||
expression:
|
||||
all:
|
||||
- userAgent.matches("LightPanda")
|
||||
- randInt(16) >= 4
|
||||
```
|
||||
|
||||
It seems counter-intuitive to allow known bad clients through sometimes, but this allows you to confuse attackers by making Anubis' behavior random. Adjust the thresholds and numbers as facts and circumstances demand.
|
||||
|
||||
## Life advice
|
||||
|
||||
|
||||
@@ -14,7 +14,7 @@ EG:
|
||||
{
|
||||
"bots": [
|
||||
{
|
||||
"import": "(data)/bots/ai-robots-txt.yaml"
|
||||
"import": "(data)/bots/ai-catchall.yaml"
|
||||
},
|
||||
{
|
||||
"import": "(data)/bots/cloudflare-workers.yaml"
|
||||
@@ -29,8 +29,8 @@ EG:
|
||||
```yaml
|
||||
bots:
|
||||
# Pathological bots to deny
|
||||
- # This correlates to data/bots/ai-robots-txt.yaml in the source tree
|
||||
import: (data)/bots/ai-robots-txt.yaml
|
||||
- # This correlates to data/bots/ai-catchall.yaml in the source tree
|
||||
import: (data)/bots/ai-catchall.yaml
|
||||
- import: (data)/bots/cloudflare-workers.yaml
|
||||
```
|
||||
|
||||
@@ -46,7 +46,7 @@ Of note, a bot rule can either have inline bot configuration or import a bot con
|
||||
{
|
||||
"bots": [
|
||||
{
|
||||
"import": "(data)/bots/ai-robots-txt.yaml",
|
||||
"import": "(data)/bots/ai-catchall.yaml",
|
||||
"name": "generic-browser",
|
||||
"user_agent_regex": "Mozilla|Opera\n",
|
||||
"action": "CHALLENGE"
|
||||
@@ -60,7 +60,7 @@ Of note, a bot rule can either have inline bot configuration or import a bot con
|
||||
|
||||
```yaml
|
||||
bots:
|
||||
- import: (data)/bots/ai-robots-txt.yaml
|
||||
- import: (data)/bots/ai-catchall.yaml
|
||||
name: generic-browser
|
||||
user_agent_regex: >
|
||||
Mozilla|Opera
|
||||
@@ -167,7 +167,7 @@ static
|
||||
├── botPolicies.json
|
||||
├── botPolicies.yaml
|
||||
├── bots
|
||||
│ ├── ai-robots-txt.yaml
|
||||
│ ├── ai-catchall.yaml
|
||||
│ ├── cloudflare-workers.yaml
|
||||
│ ├── headless-browsers.yaml
|
||||
│ └── us-ai-scraper.yaml
|
||||
|
||||
70
docs/docs/admin/configuration/impressum.mdx
Normal file
70
docs/docs/admin/configuration/impressum.mdx
Normal file
@@ -0,0 +1,70 @@
|
||||
# Imprint / Impressum configuration
|
||||
|
||||
Some jurisdictions (such as the European Union and specifically Germany) [must have contact information freely available](https://www.privacycompany.eu/blog/the-imprint-requirement-a-must-have-for-companies-from-outside-germany) on an imprint/impressum page. Anubis supports creating an Anubis-specific imprint page for your organization with the `impressum` block in your bot policy file. For example:
|
||||
|
||||
```yaml
|
||||
impressum:
|
||||
# Displayed at the bottom of every page rendered by Anubis.
|
||||
footer: >-
|
||||
This website is hosted by Techaro. If you have any complaints or notes
|
||||
about the service, please contact
|
||||
<a href="mailto:contact@techaro.lol">contact@techaro.lol</a> and we
|
||||
will assist you as soon as possible.
|
||||
|
||||
# The imprint page that will be linked to at the footer of every Anubis page.
|
||||
page:
|
||||
# The HTML <title> of the page
|
||||
title: Imprint and Privacy Policy
|
||||
# The HTML contents of the page. The exact contents of this page can
|
||||
# and will vary by locale. Please consult with a lawyer if you are not
|
||||
# sure what to put here
|
||||
body: >-
|
||||
<p>Last updated: June 2025</p>
|
||||
|
||||
<h2>Information that is gathered from visitors</h2>
|
||||
|
||||
<p>In common with other websites, log files are stored on the web server saving details such as the visitor's IP address, browser type, referring page and time of visit.</p>
|
||||
|
||||
<p>Cookies may be used to remember visitor preferences when interacting with the website.</p>
|
||||
|
||||
<p>Where registration is required, the visitor's email and a username will be stored on the server.</p>
|
||||
|
||||
<!-- ... -->
|
||||
```
|
||||
|
||||
If you are subscribed to and using [advanced classification features](../thoth.mdx), be sure to disclose the following:
|
||||
|
||||
```html
|
||||
<h2>Techaro Anubis</h2>
|
||||
|
||||
<p>
|
||||
This website uses a service called
|
||||
<a href="https://anubis.techaro.lol">Anubis</a> by
|
||||
<a href="https://techaro.lol">Techaro</a> to filter malicious traffic. Anubis
|
||||
requires the use of browser cookies to ensure that web clients are running
|
||||
conformant software. Anubis also may report the following data to Techaro to
|
||||
improve service quality:
|
||||
</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
IP address (for purposes of matching against geo-location and BGP autonomous
|
||||
systems numbers), which is stored in-memory and not persisted to disk.
|
||||
</li>
|
||||
<li>
|
||||
Unique browser fingerprints (such as HTTP request fingerprints and
|
||||
encryption system fingerprints), which may be stored on Techaro's side for a
|
||||
period of up to one month.
|
||||
</li>
|
||||
<li>
|
||||
HTTP request metadata that may include things such as the User-Agent header
|
||||
and other identifiers.
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>
|
||||
This data is processed and stored for the legitimate interest of combatting
|
||||
abusive web clients. This data is encrypted at rest as much as possible and is
|
||||
only decrypted in memory for the purposes of fulfilling requests.
|
||||
</p>
|
||||
```
|
||||
@@ -9,12 +9,45 @@ This page provides detailed information on how to configure [Open Graph tag](htt
|
||||
|
||||
## Configuration Options
|
||||
|
||||
Open Graph settings are configured in the `openGraph` section of the [Policy File](../policies.mdx).
|
||||
|
||||
```yaml
|
||||
openGraph:
|
||||
# Enables Open Graph passthrough
|
||||
enabled: true
|
||||
# Enables the use of the HTTP host in the cache key, this enables
|
||||
# caching metadata for multiple http hosts at once.
|
||||
considerHost: true
|
||||
# How long cached OpenGraph metadata should last in memory
|
||||
ttl: 24h
|
||||
# If set, return these opengraph values instead of looking them up with
|
||||
# the target service.
|
||||
#
|
||||
# Correlates to properties in https://ogp.me/
|
||||
override:
|
||||
# og:title is required, it is the title of the website
|
||||
"og:title": "Techaro Anubis"
|
||||
"og:description": >-
|
||||
Anubis is a Web AI Firewall Utility that helps you fight the bots
|
||||
away so that you can maintain uptime at work!
|
||||
"description": >-
|
||||
Anubis is a Web AI Firewall Utility that helps you fight the bots
|
||||
away so that you can maintain uptime at work!
|
||||
```
|
||||
|
||||
<details>
|
||||
<summary>Configuration flags / envvars (old)</summary>
|
||||
|
||||
Open Graph passthrough used to be configured with configuration flags / environment variables. Reference to these settings are maintained for backwards compatibility's sake.
|
||||
|
||||
| Name | Description | Type | Default | Example |
|
||||
| ------------------------ | --------------------------------------------------------- | -------- | ------- | ----------------------------- |
|
||||
| `OG_PASSTHROUGH` | Enables or disables the Open Graph tag passthrough system | Boolean | `true` | `OG_PASSTHROUGH=true` |
|
||||
| `OG_EXPIRY_TIME` | Configurable cache expiration time for Open Graph tags | Duration | `24h` | `OG_EXPIRY_TIME=1h` |
|
||||
| `OG_CACHE_CONSIDER_HOST` | Enables or disables the use of the host in the cache key | Boolean | `false` | `OG_CACHE_CONSIDER_HOST=true` |
|
||||
|
||||
</details>
|
||||
|
||||
## Usage
|
||||
|
||||
To configure Open Graph tags, you can set the following environment variables, environment file or as flags in your Anubis configuration:
|
||||
|
||||
@@ -10,6 +10,20 @@ Anubis can act in one of two modes:
|
||||
1. Reverse proxy (the default): Anubis sits in the middle of all traffic and then will reverse proxy it to its destination. This is the moral equivalent of a middleware in your favorite web framework.
|
||||
2. Subrequest authentication mode: Anubis listens for requests and if they don't pass muster then they are forwarded to Anubis for challenge processing. This is the equivalent of Anubis being a sidecar service.
|
||||
|
||||
:::note
|
||||
|
||||
Subrequest authentication requires changing the default policy because nginx interprets the default `DENY` status code `200` as successful authentication and allows the request.
|
||||
|
||||
```yaml
|
||||
status_codes:
|
||||
CHALLENGE: 200
|
||||
DENY: 403
|
||||
```
|
||||
|
||||
[See policy definitions](../policies.mdx).
|
||||
|
||||
:::
|
||||
|
||||
## Nginx
|
||||
|
||||
Anubis can perform [subrequest authentication](https://docs.nginx.com/nginx/admin-guide/security-controls/configuring-subrequest-authentication/) with the `auth_request` module in Nginx. In order to set this up, keep the following things in mind:
|
||||
|
||||
140
docs/docs/admin/configuration/thresholds.mdx
Normal file
140
docs/docs/admin/configuration/thresholds.mdx
Normal file
@@ -0,0 +1,140 @@
|
||||
# Weight Threshold Configuration
|
||||
|
||||
Anubis offers the ability to assign "weight" to requests. This is a custom level of suspicion that rules can add to or remove from. For example, here's how you assign 10 weight points to anything that might be a browser:
|
||||
|
||||
```yaml
|
||||
# botPolicies.yaml
|
||||
|
||||
bots:
|
||||
- name: generic-browser
|
||||
user_agent_regex: >-
|
||||
Mozilla|Opera
|
||||
action: WEIGH
|
||||
weight:
|
||||
adjust: 10
|
||||
```
|
||||
|
||||
Thresholds let you take this per-request weight value and take actions in response to it. Thresholds are defined alongside your bot configuration in `botPolicies.yaml`.
|
||||
|
||||
:::note
|
||||
|
||||
Thresholds DO NOT apply when a request matches a bot rule with the CHALLENGE action. Thresholds only apply when requests don't match any terminal bot rules.
|
||||
|
||||
:::
|
||||
|
||||
```yaml
|
||||
# botPolicies.yaml
|
||||
|
||||
bots: ...
|
||||
|
||||
thresholds:
|
||||
- name: minimal-suspicion
|
||||
expression: weight < 0
|
||||
action: ALLOW
|
||||
|
||||
- name: mild-suspicion
|
||||
expression:
|
||||
all:
|
||||
- weight >= 0
|
||||
- weight < 10
|
||||
action: CHALLENGE
|
||||
challenge:
|
||||
algorithm: metarefresh
|
||||
difficulty: 1
|
||||
report_as: 1
|
||||
|
||||
- name: moderate-suspicion
|
||||
expression:
|
||||
all:
|
||||
- weight >= 10
|
||||
- weight < 20
|
||||
action: CHALLENGE
|
||||
challenge:
|
||||
algorithm: fast
|
||||
difficulty: 2
|
||||
report_as: 2
|
||||
|
||||
- name: extreme-suspicion
|
||||
expression: weight >= 20
|
||||
action: CHALLENGE
|
||||
challenge:
|
||||
algorithm: fast
|
||||
difficulty: 4
|
||||
report_as: 4
|
||||
```
|
||||
|
||||
This defines a suite of 4 thresholds:
|
||||
|
||||
1. If the request weight is less than zero, allow it through.
|
||||
2. If the request weight is greater than or equal to zero, but less than ten: give it [a very lightweight challenge](./challenges/metarefresh.mdx).
|
||||
3. If the request weight is greater than or equal to ten, but less than twenty: give it [a slightly heavier challenge](./challenges/proof-of-work.mdx).
|
||||
4. Otherwise, give it [the heaviest challenge](./challenges/proof-of-work.mdx).
|
||||
|
||||
Thresholds can be configured with the following options:
|
||||
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Name</th>
|
||||
<th>Description</th>
|
||||
<th>Example</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td>`name`</td>
|
||||
<td>The human-readable name for this threshold.</td>
|
||||
<td>
|
||||
|
||||
```yaml
|
||||
name: extreme-suspicion
|
||||
```
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>`expression`</td>
|
||||
<td>A [CEL](https://cel.dev/) expression taking the request weight and returning true or false</td>
|
||||
<td>
|
||||
|
||||
To check if the request weight is less than zero:
|
||||
|
||||
```yaml
|
||||
expression: weight < 0
|
||||
```
|
||||
|
||||
To check if it's between 0 and 10 (inclusive):
|
||||
|
||||
```yaml
|
||||
expression:
|
||||
all:
|
||||
- weight >= 0
|
||||
- weight < 10
|
||||
```
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>`action`</td>
|
||||
<td>The Anubis action to apply: `ALLOW`, `CHALLENGE`, or `DENY`</td>
|
||||
<td>
|
||||
|
||||
```yaml
|
||||
action: ALLOW
|
||||
```
|
||||
|
||||
If you set the CHALLENGE action, you must set challenge details:
|
||||
|
||||
```yaml
|
||||
action: CHALLENGE
|
||||
challenge:
|
||||
algorithm: metarefresh
|
||||
difficulty: 1
|
||||
report_as: 1
|
||||
```
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
|
||||
</tbody>
|
||||
</table>
|
||||
@@ -34,27 +34,6 @@ These examples assume that you are using a setup where your nginx configuration
|
||||
|
||||
:::
|
||||
|
||||
## Dependencies
|
||||
|
||||
Install the following dependencies for proxying HTTP:
|
||||
|
||||
<Tabs>
|
||||
<TabItem value="rpm" label="Red Hat / RPM" default>
|
||||
|
||||
```text
|
||||
dnf -y install mod_proxy_html
|
||||
```
|
||||
|
||||
</TabItem>
|
||||
<TabItem value="deb" label="Debian / Ubuntu / apt">
|
||||
|
||||
```text
|
||||
apt-get install -y libapache2-mod-proxy-html libxml2-dev
|
||||
```
|
||||
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
## Configuration
|
||||
|
||||
Assuming you are protecting `anubistest.techaro.lol`, you need the following server configuration blocks:
|
||||
@@ -92,6 +71,7 @@ Assuming you are protecting `anubistest.techaro.lol`, you need the following ser
|
||||
# throw an "admin misconfiguration" error.
|
||||
RequestHeader set "X-Real-Ip" expr=%{REMOTE_ADDR}
|
||||
RequestHeader set X-Forwarded-Proto "https"
|
||||
RequestHeader set "X-Http-Version" "%{SERVER_PROTOCOL}s"
|
||||
|
||||
ProxyPreserveHost On
|
||||
|
||||
@@ -119,6 +99,14 @@ Make sure to add a separate configuration file for the listener on port 3001:
|
||||
```text
|
||||
# /etc/httpd/conf.d/listener-3001.conf
|
||||
|
||||
Listen [::1]:3001
|
||||
```
|
||||
|
||||
In case you are running an IPv4-only system, use the following configuration instead:
|
||||
|
||||
```text
|
||||
# /etc/httpd/conf.d/listener-3001.conf
|
||||
|
||||
Listen 127.0.0.1:3001
|
||||
```
|
||||
|
||||
|
||||
@@ -61,6 +61,7 @@ server {
|
||||
location / {
|
||||
proxy_set_header Host $host;
|
||||
proxy_set_header X-Real-IP $remote_addr;
|
||||
proxy_set_header X-Http-Version $server_protocol;
|
||||
proxy_pass http://anubis;
|
||||
}
|
||||
|
||||
|
||||
@@ -3,11 +3,10 @@ id: traefik
|
||||
title: Traefik
|
||||
---
|
||||
|
||||
|
||||
:::note
|
||||
|
||||
This only talks about integration through Compose,
|
||||
but it also applies to docker cli options.
|
||||
This only talks about integration through Compose,
|
||||
but it also applies to docker cli options.
|
||||
|
||||
:::
|
||||
|
||||
|
||||
8
docs/docs/admin/frameworks/_category_.json
Normal file
8
docs/docs/admin/frameworks/_category_.json
Normal file
@@ -0,0 +1,8 @@
|
||||
{
|
||||
"label": "Frameworks",
|
||||
"position": 30,
|
||||
"link": {
|
||||
"type": "generated-index",
|
||||
"description": "Information about getting specific frameworks or tools working with Anubis."
|
||||
}
|
||||
}
|
||||
45
docs/docs/admin/frameworks/htmx.mdx
Normal file
45
docs/docs/admin/frameworks/htmx.mdx
Normal file
@@ -0,0 +1,45 @@
|
||||
# HTMX
|
||||
|
||||
import Tabs from "@theme/Tabs";
|
||||
import TabItem from "@theme/TabItem";
|
||||
|
||||
[HTMX](https://htmx.org) is a framework that enables you to write applications using hypertext as the engine of application state. This enables you to simplify you server side code by having it return HTML instead of JSON. This can interfere with Anubis because Anubis challenge pages also return HTML.
|
||||
|
||||
To work around this, you can make a custom [expression](../configuration/expressions.mdx) rule that allows HTMX requests if the user has passed a challenge in the past:
|
||||
|
||||
<Tabs>
|
||||
<TabItem value="json" label="JSON">
|
||||
|
||||
```json
|
||||
{
|
||||
"name": "allow-htmx-iff-already-passed-challenge",
|
||||
"action": "ALLOW",
|
||||
"expression": {
|
||||
"all": [
|
||||
"\"Cookie\" in headers",
|
||||
"headers[\"Cookie\"].contains(\"anubis-auth\")",
|
||||
"\"Hx-Request\" in headers",
|
||||
"headers[\"Hx-Request\"] == \"true\""
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</TabItem>
|
||||
<TabItem value="yaml" label="YAML" default>
|
||||
|
||||
```yaml
|
||||
- name: allow-htmx-iff-already-passed-challenge
|
||||
action: ALLOW
|
||||
expression:
|
||||
all:
|
||||
- '"Cookie" in headers'
|
||||
- 'headers["Cookie"].contains("anubis-auth")'
|
||||
- '"Hx-Request" in headers'
|
||||
- 'headers["Hx-Request"] == "true"'
|
||||
```
|
||||
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
This will reduce some security because it does not assert the validity of the Anubis auth cookie, however in trade it improves the experience for existing users.
|
||||
39
docs/docs/admin/frameworks/wordpress.mdx
Normal file
39
docs/docs/admin/frameworks/wordpress.mdx
Normal file
@@ -0,0 +1,39 @@
|
||||
# Wordpress
|
||||
|
||||
Wordpress is the most popular blog engine on the planet.
|
||||
|
||||
## Using a multi-site setup with Anubis
|
||||
|
||||
If you have a multi-site setup where traffic goes through Anubis like this:
|
||||
|
||||
```mermaid
|
||||
---
|
||||
title: Apache as tls terminator and HTTP router
|
||||
---
|
||||
|
||||
flowchart LR
|
||||
T(User Traffic)
|
||||
subgraph Apache 2
|
||||
TCP(TCP 80/443)
|
||||
US(TCP 3001)
|
||||
end
|
||||
|
||||
An(Anubis)
|
||||
B(Backend)
|
||||
|
||||
T --> |TLS termination| TCP
|
||||
TCP --> |Traffic filtering| An
|
||||
An --> |Happy traffic| US
|
||||
US --> |whatever you're doing| B
|
||||
```
|
||||
|
||||
Wordpress may not realize that the underlying connection is being done over HTTPS. This could lead to a redirect loop in the `/wp-admin/` routes. In order to fix this, add the following to your `wp-config.php` file:
|
||||
|
||||
```php
|
||||
if (isset($_SERVER['HTTP_X_FORWARDED_PROTO']) && $_SERVER['HTTP_X_FORWARDED_PROTO'] === 'https') {
|
||||
$_SERVER['HTTPS'] = 'on';
|
||||
$_SERVER['SERVER_PORT'] = 443;
|
||||
}
|
||||
```
|
||||
|
||||
This will make Wordpress think that your connection is over HTTPS instead of plain HTTP.
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user