Block archive.org and archive.is from crawling my site


#1

I tried to block Archive.org and Archive.is from accessing my website using 3 methods:

robots.txt
User-agent: ia_archiver
Disallow: /

User-agent: archive.org_bot
Disallow: /

User-agent: ia_archiver-web.archive.org
Disallow: /

.htaccess
SetEnvIfNoCase User-Agent "^archive.org_bot" bad_bot
SetEnvIfNoCase User-Agent "^ia_archiver" bad_bot
SetEnvIfNoCase User-Agent "^ia_archiver-web.archive.org" bad_bot

Order Allow,Deny
Allow from all
Deny from env=bad_bot

Cloudflare firewall / User Agent Blocking
archive.org_bot
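
One thing worth checking with the .htaccess rules above: the `^` anchors each pattern to the start of the User-Agent header, so a crawler whose header begins with something else (for example `Mozilla/5.0 (compatible; ...`) would not be matched. Below is a minimal Python sketch of that check. The sample header is an assumed example, not one captured from a real request, and Python's `re` module only stands in for Apache's regex engine here.

```python
import re

# Patterns as written in the .htaccess rules above (anchored with ^).
ANCHORED = [r"^archive.org_bot", r"^ia_archiver", r"^ia_archiver-web.archive.org"]

# The same patterns without the anchor, with the dots escaped.
UNANCHORED = [r"archive\.org_bot", r"ia_archiver", r"ia_archiver-web\.archive\.org"]

# Assumed sample header, for illustration only.
SAMPLE_UA = (
    "Mozilla/5.0 (compatible; archive.org_bot "
    "+http://www.archive.org/details/archive.org_bot)"
)

def is_flagged(user_agent, patterns):
    """Return True if any pattern matches, ignoring case (like SetEnvIfNoCase)."""
    return any(re.search(p, user_agent, re.IGNORECASE) is not None for p in patterns)

print(is_flagged(SAMPLE_UA, ANCHORED))    # False: the header starts with "Mozilla"
print(is_flagged(SAMPLE_UA, UNANCHORED))  # True: the name appears inside the header
```

If the anchored patterns turn out not to match what the server log shows, dropping the `^` would be the first thing to try. It would not help against a crawler that sends an ordinary browser User-Agent.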

When I archive a post, it ignores the “deny access” rule.

Anyone?


#2

Google is your friend…took me 3.5 seconds…


#3

Yep, I know this article. I’ve tried it with “/” and without “/”.
I think you should try it yourself, and you will find that the bots ignore robots.txt.


#4

PS: I’m not able to find out the IP addresses of the bots.
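
One way to look for the addresses is to filter the web server's access log by User-Agent and count the client IPs. Here is a rough Python sketch, assuming the Apache combined log format. The function names and the marker list are made up for illustration, and it can only find crawlers that identify themselves in the header. If the site sits behind Cloudflare, the logged address may be a Cloudflare proxy rather than the crawler, unless the server is set up to restore the visitor IP.

```python
import re
from collections import Counter

# Combined log format:
# IP ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

# Substrings to look for in the User-Agent (names taken from the first post).
MARKERS = ("archive.org_bot", "ia_archiver")

def bot_ips(lines, markers=MARKERS):
    """Count client IPs whose User-Agent contains one of the markers."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that are not in combined format
        ip, user_agent = m.group(1), m.group(2).lower()
        if any(marker in user_agent for marker in markers):
            counts[ip] += 1
    return counts

def bot_ips_from_file(path):
    """Run bot_ips over a log file; the path is whatever the server uses."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return bot_ips(f)
```

Calling `bot_ips_from_file(...)` with the server's access log path and printing `.most_common()` on the result would list the addresses with the most matching requests first.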


#5

I haven’t tried it, as we don’t have any reason to block it.