mirror of
https://github.com/TecharoHQ/anubis.git
synced 2026-04-11 11:08:48 +00:00
feat: first implementation of honeypot logic
This is a bit of an experiment, stick with me.

The core idea here is that badly written crawlers are that: badly written. They look for anything that contains `<a href="whatever" />` tags and will blindly use those values to recurse. This takes advantage of that by hiding a link in a `<script>` tag like this:

```html
<script type="ignore"><a href="/bots-only">Don't click</a></script>
```

Browsers will ignore it because they have no handler for the "ignore" script type.

This current draft is very unoptimized (it takes like 7 seconds to generate a page on my tower), however switching spintax libraries will make this much faster.

The hope is to make this pluggable with WebAssembly such that we force administrators to choose a storage method. First we crawl before we walk.

The AI involvement in this commit is limited to the spintax in affirmations.txt, spintext.txt, and titles.txt. This generates a bunch of "pseudoprofound bullshit" like the following:

> This Restoration to Balance & Alignment
>
> There's a moment when creators are being called to realize that the work
> can't be reduced to results, but about energy. We don't innovate products
> by pushing harder, we do it by holding the vision. Because momentum can't
> be forced, it unfolds over time when culture are moving in the same
> direction. We're being invited into a paradigm shift in how we think
> about innovation. [...]

This is intended to "look" like normal article text. As this is a first draft, this sucks and will be improved upon.

Assisted-by: GLM 4.6, ChatGPT, GPT-OSS 120b
Signed-off-by: Xe Iaso <me@xeiaso.net>
@@ -95,49 +95,49 @@ bots:
   #   weight:
   #     adjust: -10
 
-  # Assert behaviour that only genuine browsers display. This ensures that Chrome
-  # or Firefox versions
-  - name: realistic-browser-catchall
-    expression:
-      all:
-        - '"User-Agent" in headers'
-        - '( userAgent.contains("Firefox") ) || ( userAgent.contains("Chrome") ) || ( userAgent.contains("Safari") )'
-        - '"Accept" in headers'
-        - '"Sec-Fetch-Dest" in headers'
-        - '"Sec-Fetch-Mode" in headers'
-        - '"Sec-Fetch-Site" in headers'
-        - '"Accept-Encoding" in headers'
-        - '( headers["Accept-Encoding"].contains("zstd") || headers["Accept-Encoding"].contains("br") )'
-        - '"Accept-Language" in headers'
-    action: WEIGH
-    weight:
-      adjust: -10
+  # # Assert behaviour that only genuine browsers display. This ensures that Chrome
+  # # or Firefox versions
+  # - name: realistic-browser-catchall
+  #   expression:
+  #     all:
+  #       - '"User-Agent" in headers'
+  #       - '( userAgent.contains("Firefox") ) || ( userAgent.contains("Chrome") ) || ( userAgent.contains("Safari") )'
+  #       - '"Accept" in headers'
+  #       - '"Sec-Fetch-Dest" in headers'
+  #       - '"Sec-Fetch-Mode" in headers'
+  #       - '"Sec-Fetch-Site" in headers'
+  #       - '"Accept-Encoding" in headers'
+  #       - '( headers["Accept-Encoding"].contains("zstd") || headers["Accept-Encoding"].contains("br") )'
+  #       - '"Accept-Language" in headers'
+  #   action: WEIGH
+  #   weight:
+  #     adjust: -10
 
-  # The Upgrade-Insecure-Requests header is typically sent by browsers, but not always
-  - name: upgrade-insecure-requests
-    expression: '"Upgrade-Insecure-Requests" in headers'
-    action: WEIGH
-    weight:
-      adjust: -2
+  # # The Upgrade-Insecure-Requests header is typically sent by browsers, but not always
+  # - name: upgrade-insecure-requests
+  #   expression: '"Upgrade-Insecure-Requests" in headers'
+  #   action: WEIGH
+  #   weight:
+  #     adjust: -2
 
-  # Chrome should behave like Chrome
-  - name: chrome-is-proper
-    expression:
-      all:
-        - userAgent.contains("Chrome")
-        - '"Sec-Ch-Ua" in headers'
-        - 'headers["Sec-Ch-Ua"].contains("Chromium")'
-        - '"Sec-Ch-Ua-Mobile" in headers'
-        - '"Sec-Ch-Ua-Platform" in headers'
-    action: WEIGH
-    weight:
-      adjust: -5
+  # # Chrome should behave like Chrome
+  # - name: chrome-is-proper
+  #   expression:
+  #     all:
+  #       - userAgent.contains("Chrome")
+  #       - '"Sec-Ch-Ua" in headers'
+  #       - 'headers["Sec-Ch-Ua"].contains("Chromium")'
+  #       - '"Sec-Ch-Ua-Mobile" in headers'
+  #       - '"Sec-Ch-Ua-Platform" in headers'
+  #   action: WEIGH
+  #   weight:
+  #     adjust: -5
 
-  - name: should-have-accept
-    expression: '!("Accept" in headers)'
-    action: WEIGH
-    weight:
-      adjust: 5
+  # - name: should-have-accept
+  #   expression: '!("Accept" in headers)'
+  #   action: WEIGH
+  #   weight:
+  #     adjust: 5
 
   # Generic catchall rule
   - name: generic-browser