I have a development site at https://text-domain.example.
When I go to https://duckduckgo.com and search for text-domain.example, it does return results.
What I have tried so far:
Created a robots.txt file with the following content (placed in my root directory, i.e. at text-domain.example/robots.txt):
User-agent: *
Disallow: /
Then added a meta tag like this to my template file:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
Even after doing this, I searched on DuckDuckGo and it yielded the same result. Any suggestions would be welcome.
PS: After waiting a few days, there are 2 findings:
Is it possible to completely block the site from showing in the results?
DuckDuckGo should honour your robots.txt. Their bot, DuckDuckBot, is documented at https://duckduckgo.com/duckduckbot.
But note: the DuckDuckGo bot isn’t crawling everything itself (as DuckDuckGo gets results from other sources), so your pages might still show up if you don’t block the bots of these other sources (like Bing). Refer to mlissner’s answer for more details.
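To see what the rules from the question mean for individual crawlers, here is a minimal Python sketch using the standard library's urllib.robotparser. It only evaluates the rules locally; it says nothing about whether a given bot actually obeys them or has re-read the file yet. The user-agent name bingbot (Bing's crawler) and the helper name is_allowed are assumptions added for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules from the question: every user agent is barred from every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://text-domain.example/"
    # "bingbot" is an assumed example of another source's crawler.
    for agent in ("DuckDuckBot", "bingbot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

Because the rule group is addressed to `*`, it applies to any crawler that has no more specific group of its own, so both names above are treated the same way.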
With robots.txt, there are two things to consider:
It can take some time until changes to your robots.txt are recognized. You have to wait until the relevant bot visits your site again.
Even if you disallow crawling in robots.txt, search engines may still list your URLs in their search results (without crawled metadata like title and description).
Using the robots meta element with noindex would prevent even listing the URLs in search engines like Google, but DDG doesn’t seem to support it.
Note that you used wrong quotation marks in your example. It should be
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
instead of
<META NAME=”ROBOTS” CONTENT=”NOINDEX, NOFOLLOW”>
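The effect of the quotation marks can be checked with a small Python sketch based on the standard library's html.parser. With straight quotes the parser sees a name attribute with the value ROBOTS; with typographic quotes the quote characters become part of an unquoted attribute value, so the element is no longer recognized as a robots directive. Whether a particular search engine's parser behaves the same way is an assumption; the class RobotsMetaParser and the helper has_noindex are names made up for this illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every meta element named "robots"."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if html contains a robots meta element with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


if __name__ == "__main__":
    straight = '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'
    curly = '<META NAME=”ROBOTS” CONTENT=”NOINDEX, NOFOLLOW”>'
    print("straight quotes:", has_noindex(straight))
    print("curly quotes:", has_noindex(curly))
```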