GitHub robots.txt

How to stop Google from indexing my GitHub repository


I use GitHub to store the text of one of my web sites, but the problem is that Google indexes the text on GitHub as well, so the same text shows up both on my site and on GitHub. For example, in this search the top hit is my site and the second hit is the GitHub repository.

I don't mind if people see the sources, but I don't want Google to index them (and maybe penalize me for duplicate content). Is there any way, besides making the repository private, to tell Google to stop indexing it?

What happens in the case of GitHub Pages? Those are sites whose source is in a GitHub repository. Do they have the same duplication problem?

Take this search: the topmost hit leads to the Marpa site, but I don't see the source repository listed in the search results. How is that done?


Solution

  • Update

    This answer is no longer correct, since GitHub changed the default branch from "master" to "main" and also changed its "robots.txt" file.

    Original

    GitHub's https://github.com/robots.txt file allows indexing of the blobs in the 'master' branch, but restricts all other branches. So if you don't have a 'master' branch, Google is not supposed to index your pages.
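    To see how branch-based rules of that kind behave, here is a minimal sketch using Python's standard `urllib.robotparser` against a hand-written excerpt written in the spirit of the old rules. The paths, repository name, and user-agent are illustrative assumptions, not the live GitHub file:

    ```python
    from urllib.robotparser import RobotFileParser

    # Illustrative excerpt in the spirit of the old rules described above;
    # this is NOT the live https://github.com/robots.txt. robotparser does
    # simple prefix matching, so concrete paths stand in for wildcards.
    rules = [
        "User-agent: Googlebot",
        "Allow: /user/repo/blob/master",
        "Disallow: /user/repo/blob/",
    ]

    rp = RobotFileParser()
    rp.parse(rules)

    # Blobs on 'master' are crawlable; blobs on any other branch are not.
    print(rp.can_fetch("Googlebot", "https://github.com/user/repo/blob/master/README.md"))  # True
    print(rp.can_fetch("Googlebot", "https://github.com/user/repo/blob/dev/README.md"))     # False
    ```

    The more specific `Allow` line is listed before the broad `Disallow`, so blobs on 'master' stay crawlable while every other branch falls through to the blanket block.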

    How to remove the 'master' branch:

    In your clone, create a new branch - let's call it 'main' - and push it to GitHub:

    git checkout -b main
    git push -u origin main
    

    On GitHub, change the default branch (see the Settings section of your repository, or https://github.com/blog/421-pick-your-default-branch).

    Then remove the 'master' branch from your clone and from GitHub:

    git branch -d master
    git push origin :master
    

    Get other people who might have already forked your repository to do the same.

    Alternatively, if you'd like to financially support GitHub, you can make the repository private: https://help.github.com/articles/making-a-public-repository-private