r xml xml-sitemap

How to create a sitemap.xml file using R and the {XML} package?


I have a vector of links from which I would like to create a sitemap.xml file (the protocol is described here: http://www.sitemaps.org/protocol.html).

I understand the sitemap.xml protocol (it is rather simple), but I'm not sure what the smartest way to use the {XML} package for it is.

A simple example:

    links <- c("http://r-statistics.com",
               "http://www.r-statistics.com/on/r/",
               "http://www.r-statistics.com/on/ubuntu/")

How can "links" be used to construct a sitemap.xml file?


Solution

  • Is something like this what you are looking for? (It uses the httr package to read each page's Last-Modified header and writes the XML directly with the very useful whisker templating package.)

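    library(whisker)
    library(httr)
    # The XML declaration has to be the very first thing in the output, so the
    # template starts on the same line as the opening quote.
    # {{loc}} uses double braces so that whisker escapes characters such as &
    # in URLs, which the sitemap protocol requires.
    tpl <- '<?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
     {{#links}}
       <url>
          <loc>{{loc}}</loc>
          <lastmod>{{{lastmod}}}</lastmod>
          <changefreq>{{{changefreq}}}</changefreq>
          <priority>{{{priority}}}</priority>
       </url>
     {{/links}}
    </urlset>
    '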
    
    links <- c("http://r-statistics.com", "http://www.r-statistics.com/on/r/", "http://www.r-statistics.com/on/ubuntu/")
    
    
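    # Build one sitemap entry (a named list) for a single URL
    map_links <- function(l) {
      tmp <- GET(l)
      # NULL when the server does not send a Last-Modified header
      d <- tmp$headers[['last-modified']]
      # parse_http_date() (from httr) parses HTTP dates independently of the
      # session locale; falling back to today's date is an assumption
      lastmod <- if (is.null(d)) Sys.Date() else as.Date(parse_http_date(d))
    
      list(loc=l,
           lastmod=format(lastmod),   # YYYY-MM-DD
           changefreq="monthly",
           priority="0.8")
    }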
    
    links <- lapply(links, map_links)
    
    # whisker.render() looks up `links` in the calling environment by default
    cat(whisker.render(tpl))
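
Since the question asks about the {XML} package specifically, here is a minimal sketch of how the same file might be built with it instead of a template. It only fills in the required <loc> element; the helper name build_sitemap and the output file name sitemap.xml are made up for illustration, and passing an unnamed string to namespaceDefinitions is assumed to declare the default namespace.

```r
library(XML)

links <- c("http://r-statistics.com",
           "http://www.r-statistics.com/on/r/",
           "http://www.r-statistics.com/on/ubuntu/")

# Hypothetical helper: builds an in-memory sitemap document from a URL vector
build_sitemap <- function(links) {
  # Root element carrying the sitemap namespace
  urlset <- newXMLNode(
    "urlset",
    namespaceDefinitions = "http://www.sitemaps.org/schemas/sitemap/0.9"
  )
  # One <url><loc>...</loc></url> entry per link
  for (l in links) {
    url_node <- newXMLNode("url", parent = urlset)
    newXMLNode("loc", l, parent = url_node)
  }
  # Wrap the root node in a document so saveXML() emits the XML declaration
  newXMLDoc(node = urlset)
}

doc <- build_sitemap(links)

# Write the document to disk; the file name is an assumption
saveXML(doc, file = "sitemap.xml", encoding = "UTF-8")
```

Optional elements such as <lastmod> or <changefreq> could be added the same way, with one more newXMLNode() call per element inside the loop.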