r

Extract hyperlink from Excel file in R


How do I take a cell in Excel, which has text that is hyperlinked, and extract the hyperlink part?


Solution

  • I found a super convoluted way to extract the hyperlinks:

    library(XML)
    
    # rename file to .zip
    my.zip.file <- sub("xlsx", "zip", my.excel.file)
    file.copy(from = my.excel.file, to = my.zip.file)
    
    # unzip the file
    unzip(my.zip.file)
    
    # unzipping produces a bunch of files which we can read using the XML package
    # assume sheet1 has our data
    xml <- xmlParse("xl/worksheets/sheet1.xml")
    
    # finally grab the hyperlinks
    hyperlinks <- xpathApply(xml, "//x:hyperlink/@display", namespaces="x")
    

    Derived from this blogpost.