mirror of https://github.com/TecharoHQ/anubis.git (synced 2026-04-14 04:28:49 +00:00)
* feat: add robots2policy CLI utility to convert robots.txt to Anubis challenge policies
* feat: add documentation for robots2policy CLI tool
* feat: implement crawl delay handling as weight adjustment in Anubis rules
* feat: add various robots.txt and YAML configurations for user agent handling and crawl delays
* test: add comprehensive tests for robots2policy conversion and parsing
* fix: update example URL in usage instructions for robots2policy CLI
* Update metadata
check-spelling run (pull_request) for json/robots2policycli
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>
* docs: add crawl delay weight adjustment and deny user agents option to robots2policy CLI
* Update cmd/robots2policy/main.go
Co-authored-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>
* Update cmd/robots2policy/main.go
Co-authored-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>
* fix(robots2policy): use sigs.k8s.io/yaml
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(config): properly marshal bot policy rules
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(yeetfile): expose robots2policy in libexec
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(yeetfile): put robots2policy in $PATH
Signed-off-by: Xe Iaso <me@xeiaso.net>
* Update metadata
check-spelling run (pull_request) for json/robots2policycli
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>
* style: reorder imports
* refactor: use preexisting structs in config
* fix: correct flag check in main function
* fix: reorder fields in AnubisRule struct for better alignment
* style: improve alignment of struct fields in AnubisRule and OGTagCache
* Update metadata
check-spelling run (pull_request) for json/robots2policycli
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>
* fix: add validation for generated Anubis rules from robots.txt
* feat: add batch processing for robots.txt files to generate Anubis CEL policies
* fix: improve usage message and error handling for input file requirement
* refactor: update AnubisRule structure to use ExpressionOrList for improved expression handling
* refactor: reorganize policy definitions in YAML files for consistency and clarity
* fix: correct indentation in blacklist and complex YAML files for consistency
* test: enhance output comparison in robots2policy tests for YAML and JSON formats
* Revert "fix: improve usage message and error handling for input file requirement"
This reverts commit ddcde1f2a3.
* fix: improve usage message and error handling in robots2policy
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
---------
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>
Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
Co-authored-by: Xe Iaso <me@xeiaso.net>
296 lines
2.3 KiB
Plaintext
acs
aeacus
Aibrew
alrest
amazonbot
anthro
anubis
anubistest
apk
Applebot
archlinux
badregexes
bdba
berr
betteralign
bingbot
bitcoin
blogging
Bluesky
blueskybot
boi
botnet
BPort
Brightbot
broked
Bytespider
cachebuster
Caddyfile
caninetools
Cardyb
celchecker
CELPHASE
cerr
certresolver
CGNAT
cgr
chainguard
chall
challengemozilla
checkpath
checkresult
chen
chibi
cidranger
ckie
cloudflare
confd
containerbuild
coreutils
Cotoyogi
CRDs
crt
Cscript
daemonizing
DDOS
Debian
debrpm
decaymap
decompiling
Diffbot
discordapp
discordbot
distros
dnf
dnsbl
dnserr
dracula
dronebl
droneblresponse
duckduckbot
eerror
ellenjoe
enbyware
everyones
evilbot
evilsite
expressionorlist
externalagent
externalfetcher
extldflags
facebookgo
Factset
fastcgi
fediverse
finfos
Firecrawl
flagenv
Fordola
forgejo
fsys
fullchain
Galvus
gha
gitea
goland
gomod
goodbot
googlebot
govulncheck
goyaml
GPG
GPT
gptbot
grw
Hashcash
hashrate
headermap
healthcheck
hebis
hec
hmc
hostable
htmlc
htmx
httpdebug
hypertext
iaskspider
iat
ifm
Imagesift
imgproxy
inp
iss
isset
ivh
Jenomis
JGit
journalctl
jshelter
JWTs
kagi
kagibot
keikaku
keypair
KHTML
kinda
KUBECONFIG
lcj
ldflags
letsencrypt
Lexentale
lgbt
licend
licstart
lightpanda
LIMSA
Linting
linuxbrew
LLU
loadbalancer
lol
LOMINSA
maintainership
malware
mcr
memes
metarefresh
metrix
mimi
minica
mistralai
Mojeek
mojeekbot
mozilla
nbf
netsurf
NFlag
nginx
nobots
NONINFRINGEMENT
nosleep
OCOB
ogtags
omgili
omgilibot
onionservice
openai
openrc
pag
palemoon
Pangu
parseable
passthrough
Patreon
pgrep
phrik
pidfile
pids
pipefail
pki
podkova
podman
prebaked
privkey
promauto
promhttp
proofofwork
pwcmd
pwuser
qualys
qwant
qwantbot
rac
rcvar
redir
redirectscheme
relayd
reputational
reqmeta
risc
ruleset
runlevels
RUnlock
sas
sasl
Scumm
searchbot
searx
sebest
secretplans
selfsigned
Semrush
Seo
setsebool
shellcheck
Sidetrade
sitemap
sls
sni
Sourceware
Spambot
sparkline
spyderbot
srv
stackoverflow
startprecmd
stoppostcmd
subgrid
subr
subrequest
SVCNAME
tagline
tarballs
techaro
techarohq
templ
templruntime
testarea
Tik
Timpibot
torproject
traefik
uberspace
unixhttpd
unmarshal
unparseable
uuidgen
uvx
UXP
Varis
Velen
vendored
vhosts
videotest
waitloop
weblate
webmaster
webpage
websecure
websites
Webzio
wildbase
wordpress
Workaround
workdir
wpbot
xcaddy
Xeact
xeiaso
xeserv
xesite
xess
xff
XForwarded
XNG
XReal
yae
YAMLTo
yeet
yeetfile
yourdomain
yoursite
Zenos
zizmor
zos