r csv pdf pdftools tabulizer

PDF conversion to CSV in R


I am trying to load the following PDF into R, and convert the table into a CSV file.

I have tried both library(pdftools) and library(tabulizer), and I have spent an afternoon going through various forums, but I cannot find an answer that works for me. I can load the PDF into R using the following code:

x <- pdf_text("~/Desktop/PlantTraitAsia.pdf")

It loads just fine, but the result is not at all the kind of table I can work with.

Here is a link to the PDF file:

http://vege1.kan.ynu.ac.jp/traits/PlantTraitAsia.pdf

I would simply like to load the table into R, keep the header, and be able to export it to a TXT, CSV, or XLS file.

Thanks for your help


Solution

  • This works well on my machine:

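    # Extract the table on page 2; the result is a list of character matrices
    zz <- tabulizer::extract_tables("http://vege1.kan.ynu.ac.jp/traits/PlantTraitAsia.pdf", pages = 2)
    # Show the first rows of the first extracted table
    head(zz[[1]])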
    

    This produces:

         [,1]  [,2]                      [,3]                    
    [1,] "ID"  "Category\rof\rpermissio" "Species"               
    [2,] "83"  "A"                       "Abies mariesii Masters"
    [3,] "155" "A"                       "Abies mariesii Masters"
    [4,] "225" "A"                       "Abies mariesii Masters"
    [5,] "297" "A"                       "Abies mariesii Masters"
    [6,] "369" "A"                       "Abies mariesii Masters"
         [,4]                                                                            [,5]         [,6]   
    [1,] "Traits"                                                                        "Value"      "Notes"
    [2,] "Maximum heighyt (m)"                                                           "18.17"      ""     
    [3,] "Shade tolerance (min. relative\rlight intensity, %), Anderson\r1964. J. Ecol." "1.15"       ""     
    [4,] "Length of fruit (mm)"                                                          "8"          ""     
    [5,] "Pollination mode"                                                              "Anemophily" ""     
    [6,] "Type of fruit"                                                                 "Wing-hair"  ""     
    

    To get only the header (the first row of the table):

    zz[[1]][1,]
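
To go from that matrix to a CSV file with a proper header, something along these lines might work. This is only a sketch: `tab` is a small hand-typed stand-in for `zz[[1]]` so that it runs without the PDF, and the names `tab`, `df`, and `out_file` as well as the output file name are my own placeholders, not anything from tabulizer.

```r
# Stand-in for zz[[1]]: a small character matrix shaped like the output above.
# With the real data you would use `tab <- zz[[1]]` instead (assumption).
tab <- rbind(
  c("ID", "Category\rof\rpermissio", "Species", "Traits", "Value", "Notes"),
  c("83", "A", "Abies mariesii Masters", "Maximum heighyt (m)", "18.17", ""),
  c("225", "A", "Abies mariesii Masters", "Length of fruit (mm)", "8", "")
)

# Replace the embedded carriage returns with spaces.
# Assigning into tab[] keeps the matrix dimensions.
tab[] <- gsub("\r", " ", tab, fixed = TRUE)

# Use the first row as column names and the remaining rows as data.
df <- as.data.frame(tab[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df) <- tab[1, ]

# Write the result to a CSV file (placeholder path in the temp directory).
out_file <- file.path(tempdir(), "PlantTraitAsia_page2.csv")
write.csv(df, out_file, row.names = FALSE)
```

All columns come out as character here, so you may want to convert `ID` (and `Value`, where it is numeric) afterwards. For a tab-separated TXT file, `write.table(df, file, sep = "\t", row.names = FALSE)` should do the same job.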