IIS Rewrite Module exclude bots but allow GoogleBot


I'm using the following IIS Rewrite Rule to block as many bots as possible.

<rule name="BotBlock" stopProcessing="true">
  <match url=".*" />
  <conditions>
    <add input="{HTTP_USER_AGENT}" pattern="^$|bot|crawl|spider" />
  </conditions>
  <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden" />
</rule>

This rule blocks every request whose User-Agent string is empty or contains bot, crawl, or spider. This works great, but it also blocks googlebot, which I do not want.

So how do I exclude the googlebot string from the above pattern so that Googlebot can still reach the site?

I've tried:

^$|!googlebot|bot|crawl|spider

^$|(?!googlebot)|bot|crawl|spider

^(?!googlebot)$|bot|crawl|spider

^$|(!googlebot)|bot|crawl|spider

But each of these either blocks all User-Agents or still blocks googlebot. Does anyone who knows regex have a solution?

Thanks to The fourth bird, the solution becomes:

<add input="{HTTP_USER_AGENT}" pattern="^$|\b(?!.*googlebot.*\b)\w*(?:bot|crawl|spider)\w*" />

Solution

  • If you want to match bot, but not googlebot:

    ^$|(?<!\bgoogle)bot|crawl|spider
    


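    Or you could group the alternatives in a non-capturing group and surround that group with word boundaries to prevent partial matches for all alternatives. Note that this matches bot, crawl, or spider only as whole words, so it skips googlebot but also skips names such as bingbot or crawler: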

    ^$|\b(?:bot|crawl|spider)\b
    

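As an alternative to encoding the exception inside the regex, the same intent can probably be expressed with a second condition that uses the URL Rewrite module's negate attribute. The following is a minimal sketch, not a tested drop-in: it reuses the rule from the question and assumes the default case-insensitive matching for conditions. One caveat worth checking against your own logs: Googlebot's commonly published User-Agent also contains the URL http://www.google.com/bot.html, so a pattern that looks for a standalone bot may still match later in the string, which is why this sketch tests for googlebot separately.

```xml
<rule name="BotBlock" stopProcessing="true">
  <match url=".*" />
  <!-- MatchAll: every condition below must hold for the rule to fire -->
  <conditions logicalGrouping="MatchAll">
    <!-- Empty User-Agent, or one containing bot, crawl, or spider -->
    <add input="{HTTP_USER_AGENT}" pattern="^$|bot|crawl|spider" />
    <!-- negate="true": holds only when the User-Agent does NOT contain googlebot -->
    <add input="{HTTP_USER_AGENT}" pattern="googlebot" negate="true" />
  </conditions>
  <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden" />
</rule>
```

With this layout the block pattern stays as simple as the original, and further allowed crawlers could be added as extra negated conditions. Keep in mind that the User-Agent header is trivially spoofed, so this only admits clients that claim to be Googlebot.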