I have a DNN site with over 20,000 pages. Googlebot and Bingbot crawl the site constantly.
When I look at my site log I can see that Google and Bing are crawling my site via the page ID (ex: www.url.com/Default.aspx?TabID=5000).
The bots hit my website every minute. When I add a new page I expect the bots to crawl it, but instead I see them re-crawling very old pages, and it takes a couple of hours before they pick up the newly added page.
I have a robots.txt file with over 10,000 entries of the following form:
Disallow:/Default.aspx?TabID=5000
Disallow:/Default.aspx?TabID=5001
Disallow:/Default.aspx?TabID=5002
and so forth.
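For comparison, my understanding is that Disallow rules are prefix matches and only take effect under a User-agent line, so a consolidated file would look roughly like the sketch below (this assumes I wanted to block every TabID-based URL, which may be broader than I actually need):

User-agent: *
Disallow: /Default.aspx?TabID=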
So I am noticing a couple of issues:
1 - Googlebot and Bingbot are ignoring my disallows and re-crawling pages that I have listed in robots.txt. How do the bots know to go back and re-crawl old pages using the TabID?
2 - When I add a new page, both bots stay busy crawling old content and do not read the new content right away. Is there a way to force the Google and Bing bots to always read newly added pages first?
Thank you in advance for any suggestions.
If you go to http://URL.com/sitemap.aspx, check to see which pages are listed there.
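Each entry in that sitemap should look roughly like this (the URL and values are illustrative, not taken from your site):

<url>
  <loc>http://URL.com/Default.aspx?TabID=5000</loc>
  <lastmod>2013-05-01</lastmod>
  <changefreq>daily</changefreq>
  <priority>0.5</priority>
</url>

If pages you have disallowed in robots.txt are still being listed there, the bots are getting conflicting signals about what to crawl.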
I would highly recommend upgrading to DNN 7, as it lets you control which pages show up in the sitemap; that may help with your indexing issues.
UPDATE: Under the Admin menu, if you find a Search Engine Sitemap page, you can set the minimum page priority required for a page to be included in the sitemap. Then, for the pages you don't want to show up, lower their priority in each page's settings.
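As a rough sketch of how that plays out (the threshold, TabIDs, and priority values below are only examples): with a minimum priority of 0.5, a page set to 0.1 in its page settings is dropped from sitemap.aspx entirely, while a page set to 0.8 is kept and emitted with that value:

<!-- TabID=5000, priority 0.1: omitted from sitemap.aspx -->
<url>
  <loc>http://URL.com/Default.aspx?TabID=6001</loc>
  <priority>0.8</priority>
</url>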