I'm trying to filter out some bots by blocking them in the .htaccess file like this:
#UniversalRules
#leave this for blank user-agents
SetEnvIfNoCase User-Agent ^$ bad_bot
SetEnvIfNoCase User-Agent .*\@.* bad_bot
SetEnvIfNoCase User-Agent .*bot.* bad_bot
But these rules also block good bots, so I added the following below them:
#Goodbots
SetEnvIfNoCase User-Agent .*google.* good_bot
#bing
SetEnvIfNoCase User-Agent .*bingbot.* good_bot
And finally the blocking rule:
Order Allow,Deny
Allow from all
Deny from env=bad_bot
But when I use the Googlebot user agent (Googlebot/2.1 (+http://www.googlebot.com/bot.html)), I get a 403 Forbidden.
What's wrong?
A Googlebot request matches both sets of rules, so both environment variables get set; setting one variable (good_bot) does not unset another (bad_bot), so the Deny from env=bad_bot rule still applies. Instead of marking good bots with a second variable, set bad_bot and then unset it afterwards:
#UniversalRules
SetEnvIfNoCase User-Agent ^$ bad_bot
SetEnvIfNoCase User-Agent .*\@.* bad_bot
SetEnvIfNoCase User-Agent .*bot.* bad_bot
#Goodbots
SetEnvIfNoCase User-Agent .*google.* !bad_bot
SetEnvIfNoCase User-Agent .*bingbot.* !bad_bot
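As a side note: the Order/Allow/Deny directives in your blocking rule come from mod_access_compat. If your server runs Apache 2.4 or later without that module loaded, a roughly equivalent rule with the newer authorization syntax (a sketch, assuming mod_authz_core) would be:
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>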
See the mod_setenvif reference for examples. BrowserMatchNoCase provides identical functionality with shorter syntax, and you can drop all of the .* in your regexes, since the patterns are unanchored substring matches.
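For illustration, here is a sketch of the same rules rewritten with BrowserMatchNoCase and the .* removed (BrowserMatchNoCase is shorthand for SetEnvIfNoCase User-Agent):
#UniversalRules
BrowserMatchNoCase ^$ bad_bot
BrowserMatchNoCase \@ bad_bot
BrowserMatchNoCase bot bad_bot
#Goodbots
BrowserMatchNoCase google !bad_bot
BrowserMatchNoCase bingbot !bad_bot
A request with the Googlebot user agent then sets bad_bot on the "bot" line and clears it again on the "google" line, so it should no longer receive a 403.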