* feat: add robots2policy CLI utility to convert robots.txt to Anubis challenge policies
* feat: add documentation for robots2policy CLI tool
* feat: implement crawl delay handling as weight adjustment in Anubis rules
* feat: add various robots.txt and YAML configurations for user agent handling and crawl delays
* test: add comprehensive tests for robots2policy conversion and parsing
* fix: update example URL in usage instructions for robots2policy CLI
* Update metadata
check-spelling run (pull_request) for json/robots2policycli
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>
* docs: add crawl delay weight adjustment and deny user agents option to robots2policy CLI
* Update cmd/robots2policy/main.go
Co-authored-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>
* Update cmd/robots2policy/main.go
Co-authored-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>
* fix(robots2policy): use sigs.k8s.io/yaml
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(config): properly marshal bot policy rules
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(yeetfile): expose robots2policy in libexec
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(yeetfile): put robots2policy in $PATH
Signed-off-by: Xe Iaso <me@xeiaso.net>
* Update metadata
check-spelling run (pull_request) for json/robots2policycli
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>
* style: reorder imports
* refactor: use preexisting structs in config
* fix: correct flag check in main function
* fix: reorder fields in AnubisRule struct for better alignment
* style: improve alignment of struct fields in AnubisRule and OGTagCache
* Update metadata
check-spelling run (pull_request) for json/robots2policycli
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>
* fix: add validation for generated Anubis rules from robots.txt
* feat: add batch processing for robots.txt files to generate Anubis CEL policies
* fix: improve usage message and error handling for input file requirement
* refactor: update AnubisRule structure to use ExpressionOrList for improved expression handling
* refactor: reorganize policy definitions in YAML files for consistency and clarity
* fix: correct indentation in blacklist and complex YAML files for consistency
* test: enhance output comparison in robots2policy tests for YAML and JSON formats
* Revert "fix: improve usage message and error handling for input file requirement"
This reverts commit ddcde1f2a3.
* fix: improve usage message and error handling in robots2policy
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
---------
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>
Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
Co-authored-by: Xe Iaso <me@xeiaso.net>
---
title: robots2policy CLI Tool
sidebar_position: 50
---

The `robots2policy` tool converts robots.txt files into Anubis challenge policies. It reads robots.txt rules and generates equivalent CEL expressions for path matching and user-agent filtering.

## Installation

Install directly with Go:

```bash
go install github.com/TecharoHQ/anubis/cmd/robots2policy@latest
```
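
`go install` places the binary in `$(go env GOPATH)/bin` (or `$GOBIN` if set), so make sure that directory is on your `PATH`:

```bash
# Add Go's install directory to PATH if it is not there already.
export PATH="$PATH:$(go env GOPATH)/bin"
```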

## Usage

Basic conversion from a URL:

```bash
robots2policy -input https://www.example.com/robots.txt
```

Convert a local file to YAML:

```bash
robots2policy -input robots.txt -output policy.yaml
```

Convert with custom settings:

```bash
robots2policy -input robots.txt -action DENY -format json
```

## Options

| Flag                  | Description                                                        | Default             |
|-----------------------|--------------------------------------------------------------------|---------------------|
| `-input`              | robots.txt file path or URL (use `-` for stdin)                    | *required*          |
| `-output`             | Output file (use `-` for stdout)                                   | stdout              |
| `-format`             | Output format: `yaml` or `json`                                    | `yaml`              |
| `-action`             | Action for disallowed paths: `ALLOW`, `DENY`, `CHALLENGE`, `WEIGH` | `CHALLENGE`         |
| `-name`               | Policy name prefix                                                 | `robots-txt-policy` |
| `-crawl-delay-weight` | Weight adjustment for crawl-delay rules                            | `3`                 |
| `-deny-user-agents`   | Action for blacklisted user agents                                 | `DENY`              |

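
The flags can be combined. As an illustrative invocation built only from the options above, the following reads robots.txt from stdin, emits `WEIGH` rules instead of challenges, raises the crawl-delay weight, and uses a custom rule-name prefix:

```bash
# Fetch robots.txt and feed it to robots2policy on stdin ("-input -"),
# producing weight-adjustment rules with a custom name prefix on stdout.
curl -sL https://www.example.com/robots.txt | \
  robots2policy -input - -action WEIGH -crawl-delay-weight 5 -name example-robots
```
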
## Example

Input robots.txt:

```txt
User-agent: *
Disallow: /admin/
Disallow: /private

User-agent: BadBot
Disallow: /
```

Generated policy:

```yaml
- name: robots-txt-policy-disallow-1
  action: CHALLENGE
  expression:
    single: path.startsWith("/admin/")
- name: robots-txt-policy-disallow-2
  action: CHALLENGE
  expression:
    single: path.startsWith("/private")
- name: robots-txt-policy-blacklist-3
  action: DENY
  expression:
    single: userAgent.contains("BadBot")
```

## Using the Generated Policy

Save the output and import it in your main policy file:

```yaml
import:
  - path: "./robots-policy.yaml"
```
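
For example, the imported file above could be generated with:

```bash
# Write the generated rules to the file referenced by the import block.
robots2policy -input https://www.example.com/robots.txt -output robots-policy.yaml
```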
The tool handles wildcard patterns, user-agent-specific rules, and blacklisted bots automatically.
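
As a rough sketch of the wildcard handling (the rule name and exact CEL the tool emits may differ), a directive such as `Disallow: /search*` maps to a regex match rather than a plain prefix check:

```yaml
# Hypothetical translation of "Disallow: /search*"; shown for illustration only.
- name: robots-txt-policy-disallow-4
  action: CHALLENGE
  expression:
    single: path.matches("^/search.*")
```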