mirror of
https://github.com/TecharoHQ/anubis.git
synced 2026-04-20 07:06:40 +00:00
7c0996448a
This may seem strange, but allowlisting common crawl means that scrapers have less incentive to scrape because they can just grab the data from common crawl instead of scraping it again.
13 lines
271 B
YAML
13 lines
271 B
YAML
# Allow Common Crawl's crawler. Allowlisting CCBot reduces the incentive for
# other scrapers to hit this site directly, since they can fetch the data
# from the Common Crawl corpus instead.
- name: common-crawl
  user_agent_regex: CCBot
  action: ALLOW
  # Published CCBot source ranges: https://index.commoncrawl.org/ccbot.json
  # NOTE(review): these ranges change over time — re-sync against that URL
  # periodically.
  remote_addresses:
    - "2600:1f28:365:80b0::/60"
    - "18.97.9.168/29"
    - "18.97.14.80/29"
    - "18.97.14.88/30"
    - "98.85.178.216/32"