wordpress, search-engine, robots.txt

Using robots.txt to block /?param=X


I created a website using WordPress, and on the first day it was full of dummy content until I uploaded my own. Google indexed pages such as:

www.url.com/?cat=1

Now these pages don't exist, and to make a removal request Google asks me to block them in robots.txt.

Should I use:

User-Agent: *
Disallow: /?cat=

or

User-Agent: *
Disallow: /?cat=*

My robots.txt file would look something like this:

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /author
Disallow: /?cat=
Sitemap: http://url.com/sitemap.xml.gz

Does this look fine, or could it cause any problems with search engines? Should I use Allow: / along with all the Disallow: rules?


Solution

  • I would actually go with this:

    To block access to all URLs that include a question mark (?) (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string):

    User-agent: Googlebot
    Disallow: /*?
    

    So I would actually go with:

    User-agent: Googlebot
    Disallow: /*?cat=
    

    Resource (under "Pattern matching")
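Note that the `*` wildcard is a Googlebot extension; the original robots.txt standard only does prefix matching, which is why `Disallow: /?cat=` and `Disallow: /?cat=*` behave the same for Google. As a rough sanity check of the prefix form, here is a small sketch using Python's standard `urllib.robotparser` (which implements plain prefix matching only, no wildcards); `www.url.com` is just the placeholder domain from the question:

```python
import urllib.robotparser

# Prefix-style rule from the question. urllib.robotparser does not
# understand Google's "*" wildcard, so we test the plain form.
robots_txt = """\
User-agent: *
Disallow: /?cat=
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# The old dummy-content URLs are blocked...
print(rp.can_fetch("*", "http://www.url.com/?cat=1"))   # False
# ...while normal pages remain crawlable.
print(rp.can_fetch("*", "http://www.url.com/about/"))   # True
```

For the wildcard variants themselves, Google's robots.txt Tester in Search Console is the more reliable way to confirm behavior, since it applies Google's own pattern-matching rules.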