Let's say we have these pages:
1. http://www.mywebsite.com/users/thomas-roberts
2. http://www.mywebsite.com/pages/thomas-roberts/1
3. http://www.mywebsite.com/pages/thomas-roberts/hello-kitty-collection
Is it possible to do something like this in a sitemap.xml:
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://mywebsite.com/users/^(\w+)$/</loc>
    <lastmod>2006-11-18</lastmod>
    <changefreq>daily</changefreq>
    <priority>1</priority>
  </url>
  <url>
    <loc>http://mywebsite.com/users/^(\w+)$/pages/^(\w+)$</loc>
    <lastmod>2006-11-18</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://mywebsite.com/users/^(\w+)$/pages/^(\d+)$</loc>
    <lastmod>2006-11-18</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.6</priority>
  </url>
</urlset>
I hope my example is clear: instead of specifying a separate "url" element for every single page in the sitemap.xml file, we match the URLs against a regex, and the crawler just comes back every time to check for updates.
If this isn't possible, how do Twitter and Facebook get all their pages (profile pages, etc.) indexed by Google? Do they generate a new sitemap every time a user is created, and update it every time someone updates their page or profile?
I'm very curious: if we really do have to generate the sitemap.xml somehow (with its limit of 50,000 entries and 10 MB per file), what would be a good way to generate sitemaps when content gets modified?
Thanks a lot.
The sitemap must contain actual URLs. Regexes are not allowed, and they would be quite useless anyway, since they don't tell the search engines anything concrete to crawl.
Sitemaps just tell search engines where to find your content. So if a page's content is modified, the sitemap itself really won't affect how search engines treat that page; at most, an updated <lastmod> value hints that the page is worth recrawling.
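In practice, large sites generate their sitemap files programmatically from the database, listing every concrete URL, and regenerate them on a schedule rather than on every edit. A minimal sketch in Python of that idea (the site URL and the user slugs are hypothetical, taken from the question's examples):

```python
from datetime import date
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit from the sitemaps.org protocol

def build_sitemap(urls, lastmod=None):
    """Build one <urlset> document from an iterable of concrete URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = u
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical user slugs, e.g. fetched from the database on a schedule.
users = ["thomas-roberts", "jane-doe"]
urls = [f"http://www.mywebsite.com/users/{u}" for u in users]
print(build_sitemap(urls, lastmod=date.today().isoformat()))
```

To stay under the limits, you would split `urls` into chunks of `MAX_URLS`, write each chunk to its own file, and list those files in a single `<sitemapindex>` file that you submit to the search engines.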