feat(data): add Citoid to good bots list (#1524)

* Add Wikimedia Foundation citoid services file

Wikimedia Foundation runs a service called citoid which retrieves citation metadata from urls in order to create formatted citations. 

This file contains the ip ranges allocated to the WMF (https://wikitech.wikimedia.org/wiki/IP_and_AS_allocations) from which the services make requests, as well as regex for the User-Agents from both services used to generate citations (citoid, and Zotero's translation-server which citoid makes requests to as well in order to generate the metadata).

Signed-off-by: Marielle Volz <marielle.volz@gmail.com>

* Add Wikimedia Citoid crawler to allowed list

Signed-off-by: Marielle Volz <marielle.volz@gmail.com>

* chore: update spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Marielle Volz <marielle.volz@gmail.com>
Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
This commit is contained in:
Marielle Volz
2026-03-20 11:13:26 +00:00
committed by GitHub
parent e0ece7d333
commit 24857f430f
3 changed files with 21 additions and 0 deletions

View File

@@ -31,3 +31,5 @@ Stargate
FFXIV
uvensys
de
envoyproxy
unipromos

View File

@@ -8,4 +8,5 @@
- import: (data)/crawlers/marginalia.yaml
- import: (data)/crawlers/mojeekbot.yaml
- import: (data)/crawlers/commoncrawl.yaml
- import: (data)/crawlers/wikimedia-citoid.yaml
- import: (data)/crawlers/yandexbot.yaml

View File

@@ -0,0 +1,18 @@
# Wikimedia Foundation citation services
# https://www.mediawiki.org/wiki/Citoid
- name: wikimedia-citoid
user_agent_regex: "Citoid/WMF"
action: ALLOW
remote_addresses: [
"208.80.152.0/22",
"2620:0:860::/46",
]
- name: wikimedia-zotero-translation-server
user_agent_regex: "ZoteroTranslationServer/WMF"
action: ALLOW
remote_addresses: [
"208.80.152.0/22",
"2620:0:860::/46",
]