I want to disallow crawling of a directory, /acct, in robots.txt. Which rule should I use?
Disallow: /acct
or
Disallow: /acct/
The acct directory contains both subdirectories and files. What is the effect of a trailing slash?
Since robots.txt rules are all "starts with" rules, both of your proposed rules would disallow the following:
https://example.com/acct/
https://example.com/acct/foo
https://example.com/acct/bar
However, the following would only be disallowed by the rule without the trailing slash:
https://example.com/acct
https://example.com/acct.html
https://example.com/acctbar
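The "starts with" behavior can be checked with a short script. This is a minimal sketch using Python's standard urllib.robotparser module, which does plain prefix matching; the helper name blocked_urls and the URL list are made up for illustration, and real crawlers may differ in details such as wildcard support.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(rule, urls):
    """Return the URLs that a single Disallow rule blocks for all user agents."""
    parser = RobotFileParser()
    # Build a one-rule robots.txt in memory instead of fetching it over the network.
    parser.parse(["User-agent: *", f"Disallow: {rule}"])
    return [url for url in urls if not parser.can_fetch("*", url)]


# Illustrative URLs taken from the examples in this answer.
URLS = [
    "https://example.com/acct",
    "https://example.com/acct/",
    "https://example.com/acct/foo",
    "https://example.com/acct.html",
    "https://example.com/acctbar",
]

without_slash = blocked_urls("/acct", URLS)
with_slash = blocked_urls("/acct/", URLS)

print("Disallow: /acct  blocks:", without_slash)
print("Disallow: /acct/ blocks:", with_slash)
```

Under these assumptions, the rule without the trailing slash blocks every URL in the list, while the rule with the trailing slash blocks only the two URLs whose path begins with /acct/.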
Disallow: /acct/ is usually better because there is no risk of disallowing unexpected URLs. However, it does NOT prevent crawling of /acct.
In most cases, web servers redirect a directory URL without a trailing slash to the same URL with the trailing slash added. It is likely that on your server, https://example.com/acct redirects to https://example.com/acct/. If that is the case, it is usually fine to allow bots to crawl /acct with no trailing slash and see the redirect. They would still be blocked from crawling the target of the redirect.