
How to tell Jekyll to hide one page from search engines?


I have a website consisting of my public profile, made in Jekyll.

It also contains one page, say 'details.html', which contains more personal information about me. I want only the people I give the link to to see this page. In particular, I'd like to hide it from search engines.

How do I best do this? I've heard I can add a robots.txt file or include a meta tag 'nofollow' or 'noindex'.

  1. Which is the usual solution here?
  2. If the way to go is to add a meta tag, how do I add it to only one page given a standard Jekyll setup?

Solution

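  • The robots.txt file is the standard way of telling search engine crawlers which paths they may and may not crawl (not just for Jekyll, but for websites in general). Note that it controls crawling, not indexing: a disallowed URL can still appear in search results if other pages link to it.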

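    Just create a file called robots.txt in the root of your Jekyll site, listing the paths crawlers should skip. Keep in mind that robots.txt is publicly readable, so it also advertises those paths to anyone who opens it.

    For example: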

    User-agent: *
    Disallow: /2017/02/11/post-that-should-not-be-indexed/
    Disallow: /page-that-should-not-be-indexed/
    Allow: /
    

    Jekyll copies robots.txt unchanged into the folder where the site is generated (_site by default).


    You can also test your robots.txt to make sure it is working the way you expect: https://support.google.com/webmasters/answer/6062598?hl=en


    Update 2021-08-02 - Google-specific settings:

    You can prevent a page from appearing in Google Search by including a noindex meta tag in the page's HTML code, or by returning a noindex header in the HTTP response.

    There are two ways to implement noindex: as a meta tag and as an HTTP response header. They have the same effect; choose the method that is more convenient for your site. Either way, the page must not be disallowed in robots.txt: a crawler that is blocked from fetching the page never sees the noindex rule.

    <meta> tag

    To prevent most search engine web crawlers from indexing a page on your site, place the following meta tag into the <head> section of your page:

    <meta name="robots" content="noindex">
    

    To prevent only Google web crawlers from indexing a page:

    <meta name="googlebot" content="noindex">
    

    HTTP response header

    Instead of a meta tag, you can also return an X-Robots-Tag header with a value of either noindex or none in your response. Here's an example of an HTTP response with an X-Robots-Tag instructing crawlers not to index a page:

    HTTP/1.1 200 OK
    (...)
    X-Robots-Tag: noindex
    (...)
    

    More details: https://developers.google.com/search/docs/advanced/crawling/block-indexing