
List and access files in an ownCloud public folder with R


I need to provide data to my students to use with R in class. I uploaded the data to a public folder in ownCloud. The link to the folder is public, with no password.

I can't figure out how to list the links to all the files in the folder, so that the students can read each file directly.

So far I used:

r <- RCurl::getURL("https://server", verbose = FALSE, dirlistonly = TRUE)
XML::getHTMLLinks(r)

but the result is:

[1] "http://enable-javascript.com/"                                         
[2] "/owncloud/index.php"                                                   
[3] "https://server"
[4] ""                                                                      
[5] ""                                                                      
[6] "https://owncloud.org"                       

i.e. only the links at the top of the page, not the links to each file in the folder.

Any help is appreciated, thanks,

A


Solution

  • OK, after more digging around, I found a solution. The trick is to use ownCloud's WebDAV service and specify the API for public shares.

    # Install required packages if you do not have them
    # install.packages("xml2")
    # install.packages("httr")
    
    
    # specify your ownCloud provider
    provider <- "https://owncloud.example.com" 
    
    # specify your webDav endpoint (for publicly shared folders, do not include username)
    # will most likely be "remote.php/dav" or "remote.php/webdav" 
    webdav <- "remote.php/dav" 
    
    # specify API for public links
    api <- "public-files"
    
    # specify sharing token portion of the URL
    token <- "ToFnJDJKz27EQU" # just an example token
    
    # construct URL
    url <- paste(provider, webdav, api, token, sep = "/") 
    
    # WebDAV Depth header: "1" lists the folder's direct contents;
    # "infinity" also covers subfolders, if the server allows it
    depth <- "1"
    
    # run request
    r <- httr::VERB(
        verb = "PROPFIND",
        url = url,
        httr::add_headers(depth = depth),
        httr::authenticate(token, "")
    )
    
    # parse result
    x <- httr::content(r)
    xml_links <- xml2::xml_find_all(x, ".//d:href")
    partial_links <- xml2::xml_text(xml_links)
    
    # build direct download links via WebDAV
    # (the first entry is normally the shared folder itself)
    links <- paste0(provider, partial_links)
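    To actually hand the files to students, the list of links usually needs one more step: dropping the folder entries and keeping only the files. Below is a minimal sketch of that step, run against a made-up PROPFIND response so it works without a server. The XML sample, the `file_links()` helper and the file names are hypothetical, and the filter assumes that folder hrefs end in "/", which is what I have seen but is not guaranteed on every setup.

```r
library(xml2)

# Hypothetical PROPFIND response, trimmed to the elements used below.
sample_response <- '<?xml version="1.0"?>
<d:multistatus xmlns:d="DAV:">
  <d:response>
    <d:href>/remote.php/dav/public-files/ToFnJDJKz27EQU/</d:href>
  </d:response>
  <d:response>
    <d:href>/remote.php/dav/public-files/ToFnJDJKz27EQU/data%201.csv</d:href>
  </d:response>
  <d:response>
    <d:href>/remote.php/dav/public-files/ToFnJDJKz27EQU/notes.txt</d:href>
  </d:response>
</d:multistatus>'

# Hypothetical helper: parsed multistatus document -> direct file URLs.
file_links <- function(doc, provider) {
  hrefs <- xml_text(xml_find_all(doc, ".//d:href"))
  # Assumption: folder entries end with "/", files do not.
  hrefs <- hrefs[!grepl("/$", hrefs)]
  paste0(provider, hrefs)
}

doc <- read_xml(sample_response)
links <- file_links(doc, "https://owncloud.example.com")
print(links)

# Keep only the CSV files.
csv_links <- links[grepl("\\.csv$", links, ignore.case = TRUE)]

# With a real server, each link could then be read directly, e.g.:
# datasets <- lapply(csv_links, read.csv)
# names(datasets) <- utils::URLdecode(basename(csv_links))
```

    With the real response, replace `doc` with `x` from the code above. The hrefs come back URL-encoded (note the `%201`), so decode them only for display names, not for the download link itself.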