
Block a site from search engine - DuckDuckGo


I have a development site https://text-domain.example. When I go to https://duckduckgo.com and search for text-domain.example, it does return results.

What I have tried so far:

Created a robots.txt file with the following rules (placed in my root directory, i.e. at text-domain.example/robots.txt):

User-agent: *
Disallow: /
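
As a quick sanity check, the rules above can be tested locally with Python's standard `urllib.robotparser`. This is only a sketch: the helper name `is_blocked` is made up for illustration, and the user-agent tokens `DuckDuckBot` and `bingbot` are assumed to be the names the respective crawlers match against. It shows what a compliant crawler should do with these rules, not what any particular search engine actually does.

```python
from urllib.robotparser import RobotFileParser

# The rules from the question, as they would be served at /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url.

    Hypothetical helper for illustration; it parses the text directly
    instead of downloading it from the site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Assumed user-agent tokens for DuckDuckGo's and Bing's crawlers.
    for agent in ("DuckDuckBot", "bingbot"):
        blocked = is_blocked(ROBOTS_TXT, agent, "https://text-domain.example/")
        print(f"{agent}: blocked={blocked}")
```

A wildcard `User-agent: *` group applies to every crawler that has no more specific group of its own, so both agents should be reported as blocked.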

Then added a meta tag like this to my template file:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

Even after doing this, I searched on DuckDuckGo and it yielded the same result. Any suggestions would be welcome.

PS.

After waiting for a few days, there are 2 findings:

Is it possible to completely block the site from showing up in the results?


Solution

  • DuckDuckGo should honour your robots.txt. Their bot DuckDuckBot is documented at https://duckduckgo.com/duckduckbot.

    But note: the DuckDuckGo bot isn’t crawling everything itself (as DuckDuckGo gets results from other sources), so your pages might still show up if you don’t block the bots of these other sources (like Bing). Refer to mlissner’s answer for more details.

    With robots.txt, there are two things to consider:


    Using the robots meta element with noindex would prevent the URLs from even being listed in search engines like Google, but DDG doesn’t seem to support it.

    Note that you used wrong quotation marks in your example. It should be

    <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
    

    instead of

    <META NAME=”ROBOTS” CONTENT=”NOINDEX, NOFOLLOW”>
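
To illustrate why the quotation marks matter, here is a minimal sketch using Python's standard `html.parser`. The class `RobotsMetaFinder` and the function `has_noindex` are hypothetical names introduced for this example, and real crawlers may parse HTML differently. The point is only that typographic quotes are not attribute delimiters, so a parser treats them as part of the attribute value and the name no longer equals `robots`.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            content = attr_map.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html: str) -> bool:
    """Return True if the markup contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


# Straight ASCII quotes, as recommended above.
straight = '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'
# Typographic quotes (U+201D), written as escapes to avoid encoding issues.
curly = "<META NAME=\u201dROBOTS\u201d CONTENT=\u201dNOINDEX, NOFOLLOW\u201d>"

if __name__ == "__main__":
    print("straight quotes:", has_noindex(straight))
    print("curly quotes:", has_noindex(curly))
```

With this parser, only the straight-quoted variant is recognised as a robots directive.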