Tags: r, data.table, cran, vignette

available CRAN vignettes


There's the available.packages() function to list all packages available on CRAN. Is there a similar function to find all available vignettes? If not, how would I get a list of all vignettes and the packages they're associated with?

As a corner case to keep in mind, the data.table package has 3 vignettes associated with it.

EDIT: Per Andrie's response I realize I wasn't clear. I know about the vignette() function for finding all the locally available vignettes; what I'm after is a way to get all the vignettes of all packages on CRAN.
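
For reference, this is roughly how I get the local list (if I remember the structure right, the results component of the object vignette() returns holds the package/title pairs):

    vloc <- vignette()   ## vignettes from all installed packages
    head(vloc$results[, c("Package", "Item", "Title")])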


Solution

  • I seem to recall looking at this in response to some SO question (can't find it now) and deciding that, since the information isn't included in the output of available.packages(), nor in the result of applying readRDS to @CRAN/web/packages/packages.rds (a trick from Jeroen Ooms), I couldn't think of a non-scraping way to do it ...
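
    For what it's worth, here's a minimal sketch of that packages.rds check (assuming getOption("repos") points at an actual CRAN mirror rather than the "@CRAN@" placeholder):

    pkgmeta_url <- paste0(getOption("repos"), "/web/packages/packages.rds")
    tmp <- tempfile(fileext = ".rds")
    download.file(pkgmeta_url, tmp, quiet = TRUE)
    pkgmeta <- readRDS(tmp)
    colnames(pkgmeta)  ## DESCRIPTION-style fields only; nothing about vignettes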

    Here's my scraper, applied to the first 100 packages (which yields 44 vignettes):

    pkgs <- unname(available.packages()[, 1])[1:100]
    ## each package's vignette index lives at <repo>/web/packages/<pkg>/vignettes/index.rds
    vindex_urls <- paste0(getOption("repos"), "/web/packages/", pkgs,
                          "/vignettes/index.rds")
    getf <- function(x) {
        ## I think there should be a way to do this directly
        ## with readRDS(url(...)) but I can't get it to work
        tmp <- tempfile(fileext = ".rds")
        suppressWarnings(download.file(x, tmp, quiet = TRUE))
        readRDS(tmp)
    }
    library(plyr)
    ## llply returns a list with one vignette index (or NULL) per package
    vv <- llply(vindex_urls,
                .progress = "text",
                function(x) {
                    if (inherits(z <- try(getf(x), silent = TRUE),
                                 "try-error")) NULL else z
                })
    ## tag each index with its package name, then stack the pieces
    tmpf <- function(x, n) {
        if (is.null(x)) NULL else data.frame(pkg = n, x)
    }
    vframe <- do.call(rbind, mapply(tmpf, vv, pkgs, SIMPLIFY = FALSE))
    rownames(vframe) <- NULL
    head(vframe[, c("pkg", "Title")])

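    (As an aside: the readRDS(url(...)) approach I mention in the comments above may just need the connection wrapped in gzcon(), since .rds files are gzip-compressed by default. An untested sketch:)

    getf2 <- function(x) {
        ## read the index straight off the URL, no temporary file
        con <- gzcon(url(x, open = "rb"))
        on.exit(close(con))
        readRDS(con)
    }
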
    There may be ways to clean this up or make it more compact, but it seems to work OK. Your scrape-once/update-occasionally strategy seems reasonable (see the sketch below). Or, if you wanted, you could scrape daily (or weekly, or whatever seems reasonable) and save/post the results somewhere publicly accessible, then include a function with that URL hard-coded in the package ... or even create a nicely formatted HTML table, with links, that the whole world could use (and then add Viagra ads to the page, and $$PROFIT$$ ...)
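
    Here's a rough sketch of that cache-and-refresh idea (the cache file name, the one-week refresh interval, and the scrape_vignettes() wrapper around the code above are all just placeholders):

    get_cran_vignettes <- function(cache = "cran_vignettes.rds",
                                   max_age = 7 * 24 * 3600) {
        ## reuse the cached table if it is recent enough
        if (file.exists(cache) &&
            as.numeric(difftime(Sys.time(), file.info(cache)$mtime,
                                units = "secs")) < max_age) {
            return(readRDS(cache))
        }
        vframe <- scrape_vignettes()  ## the scraper above, wrapped as a function
        saveRDS(vframe, cache)
        vframe
    }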

    Edit: wrapped both the download and the readRDS in a function, so I can wrap the whole thing in try().