google-search google-search-console google-search-api

How to tell Google not to index the header and footer of a website


I am using the Google search API to provide a site search function for my website. However, the search results are a problem because Google indexes the whole content of every page, including the header and footer. When I search for a word that appears in the header, the API returns almost every page as a result, since every page contains the header and therefore the keyword.

Is there any way to stop Google from indexing the header/footer content of my website, or to exclude results where the keyword was found only in the header/footer? Thank you.


Solution

  • You can use a robots.txt file, for example for media files,

    or a robots meta tag in the page's <head>, like

    Examples for indexing and for not indexing:

    <meta name="robots" content="index,follow,all">
    
    <meta name="robots" content="noindex, nofollow">
    

    These tags apply to the whole page.

    But you can't tell Google to skip only some parts of a page, such as the header or footer.

    These rules are requests: noindex asks Google not to index a page, but Google still crawls the page in order to read the rule.

    Also think about the other search engines: many bots (crawler scripts) don't follow these rules!

    If other pages link to your page with descriptive text, Google can also index the URL without ever crawling the page.

    You can check what Google has stored for a page by looking at the Google cache.

    How do I see the Google Cache of a website?

    1. In Google's search box, type the website or page you're trying to see.
    2. Beside the URL, click the down arrow.
    3. Select "Cached".
    4. You are now viewing the cached page.

    You can use a different title, description and keywords for every page.

    If you are doing SEO, it's better to keep these unique per page.

    From 2021:

    Google's John Mueller said again that Google does not index parts of a page - like the header versus the footer versus the main area of the content. Google indexes it all and does not index a piece or part of it. But there is one exception, embedded content on the page, like images, JavaScript embeds and so on.
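
To make the granularity of the rules above concrete, here is a minimal Python sketch using only the standard library. It assumes you want to check locally what a robots.txt file and a robots meta tag say about a page. The helper names (`crawl_allowed`, `page_allows_indexing`), the `/media/` path and the `example.com` URLs are made up for illustration and are not part of any Google API. It only interprets the rules the way a well-behaved crawler might; it does not predict what Google will actually index.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def page_allows_indexing(html):
    """Hypothetical helper: True unless the page carries noindex (or none).

    The answer is for the whole page; there is no per-section result.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    return not ({"noindex", "none"} & parser.directives)


def crawl_allowed(robots_txt, user_agent, url):
    """Hypothetical helper: apply robots.txt rules to one URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


# Assumed robots.txt that blocks a media directory for every crawler.
ROBOTS_TXT = "User-agent: *\nDisallow: /media/\n"

if __name__ == "__main__":
    print(page_allows_indexing('<meta name="robots" content="index,follow,all">'))
    print(page_allows_indexing('<meta name="robots" content="noindex, nofollow">'))
    print(crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/media/logo.png"))
    print(crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/about.html"))
```

Both checks work per URL or per page, which matches the point of the answer: neither mechanism offers a way to mark only the header or footer of a page as excluded.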