
Using .htaccess to override existing "noindex, nofollow" X-Robots-Tag header?


I'm trying to set X-Robots-Tag to allow Googlebot to index my website. I don't have a robots.txt file and I don't have any robots meta tags in any of my HTML files. The Apache server is returning an X-Robots-Tag header set to "noindex, nofollow". How do I unset this header by editing the .htaccess file?

This is what I get when using the Chrome addon "Robots Exclusion Checker":

X-Robots status BLOCKED noindex,nofollow.

Date: Thu, 23 Jul 2020 20:27:46 GMT
Content-Type: text/html
Content-Length: 1272
Connection: keep-alive
Keep-Alive: timeout=30
Server: Apache/2
X-Robots-Tag: noindex, nofollow
Last-Modified: Fri, 09 Mar 2018 19:26:43 GMT
ETag: "ae0-xxxxxxxxxx-gzip"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Cache-Control: max-age=3600
Expires: Thu, 23 Jul 2020 21:27:46 GMT

Contents of my .htaccess file:

# compress text, html, javascript, css, xml:
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript

# Or, compress certain file types by extension:
<files *.html>
SetOutputFilter DEFLATE
</files>

Header onsuccess unset X-Robots-Tag
Header always set X-Robots-Tag "index,follow"

I've tried adding this to the bottom of the .htaccess file:

<files *.html>
Header set X-Robots-Tag "index,follow"
</files>

I then get this response from the Chrome extension:

X-Robots BLOCKED noindex,nofollow,index,follow.

(Note that the X-Robots-Tag header now appears twice in the headers below.)

Date: Thu, 23 Jul 2020 20:39:42 GMT
Content-Type: text/html
Content-Length: 1272
Connection: keep-alive
Keep-Alive: timeout=30
Server: Apache/2
X-Robots-Tag: noindex, nofollow
Last-Modified: Fri, 09 Mar 2018 19:26:43 GMT
ETag: "ae0-xxxxxxxxxxxxx-gzip"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Cache-Control: max-age=3600
Expires: Thu, 23 Jul 2020 21:39:42 GMT
X-Robots-Tag: index,follow

Is there a way to delete the original X-Robots-Tag header and replace it with the new one? I tried Header unset X-Robots-Tag, but it had no effect (the extension still shows "BLOCKED noindex,nofollow").


Solution: What worked for me was adding a robots.txt file and making sure all hyperlinks end with a trailing slash. It seems that without the trailing slash the server responds with a 301 redirect, and that redirect response includes the offending noindex,nofollow header.


Solution

  • My index.html page is very, very simple and contains only hyperlinks (inside the body) to other parts of the site.
    The site is hosted on ...

    As noted in the comments, you should really identify the source that is setting this header in the first place, rather than trying to override (or unset) it. This is not something Apache does by default; this header must be explicitly set somewhere.

    If you are not setting this header (in your server-side script or in any .htaccess file along the filesystem path, even above the document root), then it must be set in the vHost/server config. If you don't have access to the server config, you should contact your web host to find out where it is being set.
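    For illustration only, here is a hypothetical fragment of the kind of vHost config that would produce the header shown in the question. The ServerName and DocumentRoot values are placeholders, not your host's actual config, and it assumes mod_headers is loaded; the point is that a Header directive like this, outside your own files, is what to look for (or ask your host about).

    ```apache
    # HYPOTHETICAL vHost fragment: placeholder values, not the host's real config.
    # Using the "always" table means the header is added to every response,
    # including 3xx redirects and error responses, not just 2xx pages.
    <VirtualHost *:80>
        ServerName example.com
        DocumentRoot "/var/www/example"
        Header always set X-Robots-Tag "noindex, nofollow"
    </VirtualHost>
    ```

    If a directive like this exists, correcting or removing it at the source is cleaner than trying to override it from .htaccess.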

    <files *.html>
    Header set X-Robots-Tag "index,follow"
    </files>
    

    This would ordinarily "work", unless the header had previously been set on the always table of response headers, in which case you would need to do the same. For example:

    Header always set X-Robots-Tag "index,follow"
    

    You shouldn't need the <Files> wrapper, unless you specifically want to target only requests that map to *.html files. I would imagine the "noindex,nofollow" header is being set on every response (e.g. images and other static resources).
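    If you did want to limit the change to HTML files only, a sketch combining the <Files> wrapper with the "always" table might look like the following. It assumes mod_headers is enabled and that the header was originally set on the "always" table. Note that a request for a directory without a trailing slash (the 301 redirect mentioned in the question) does not map to an *.html file, so it would not be covered by this block.

    ```apache
    # Scoped to requests that map to files whose names end in .html.
    # Other responses (images, CSS, directory redirects) are left untouched.
    <Files "*.html">
        Header always unset X-Robots-Tag
    </Files>
    ```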

    However, you don't need to explicitly set "index,follow", since indexing pages and following links is what search engines do by default when no X-Robots-Tag header (or robots meta tag) is present. So, in this case you just need to unset the header (as you also suggest), but again, you'll need to use the always table of headers (if that was the table on which the header was set to begin with). For example:

    Header always unset X-Robots-Tag
    

    The "always" table is perhaps a bit misleadingly named: to the casual reader, the above looks as if the header is always unset (as opposed to only sometimes), but that is not what it means. There are two separate groups/tables of response headers: "always" and "onsuccess" (the default). The two are mutually exclusive. The difference is that the "always" group is applied to every response, even on errors and internal rewrites/subrequests, whereas the default group is not (it is only used for successful, 2xx, responses).
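    Because the two tables are separate (the mod_headers documentation notes that "always" is not a superset of "onsuccess" with respect to existing headers), a defensive sketch, for when you cannot find where the header is set, is to unset it on both tables. Whether this takes effect from .htaccess still depends on where and at what stage the host adds the header, which is why finding the source remains the better fix.

    ```apache
    # Remove the header from the default "onsuccess" table (successful responses).
    Header onsuccess unset X-Robots-Tag
    # Remove it from the "always" table too (all responses, incl. 3xx redirects and errors).
    Header always unset X-Robots-Tag
    ```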

    Reference:
    https://httpd.apache.org/docs/2.4/mod/mod_headers.html#header