r

Read a table from a website into RStudio and create a data frame from it


I'm working on a project that requires a table I found on a website to be read into RStudio and formatted so I can create graphical representations of the data. The table is published as an image, so I have been attempting to do this with the magick package, using the image_read and image_display functions. I keep getting the following error when I try to display the image:

Error: ImageMagick was built without X11 support

and I get no usable output in my data frame object.

Here is what I most recently tried to get this to work:

library(magick)

img <- image_read("https://i2.wp.com/www.brookings.edu/wp-content/uploads/2022/01/Table-2.png?w=768&crop=0%2C0px%2C100%2C9999px&ssl=1",
                  density = "300")
image_display(img)

img_data <- image_data(img)
table_df <- data.frame(img_data)
table_df

This returns the following error:

Error in as.data.frame.default(x[[i]], optional = TRUE) : 
  cannot coerce class ‘c("bitmap", "rgb")’ to a data.frame

I need the data shown in the image returned as a data frame that I can then manipulate for different graphical representations.


Solution

  • I was able to solve this with the tesseract package in R, running OCR on a PDF version of the table.

    library(tesseract)
    
    # Load the PDF file and convert to text
    pdf_file <- "C:/Users/Table_Q2.pdf"
    text <- tesseract::ocr(pdf_file)
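
OCR gives back plain text rather than a data frame, so the text still has to be split into rows and columns before it can be plotted. Here is a minimal sketch of that step in base R. It assumes the first non-empty line is the header and that columns are separated by runs of two or more spaces. Both `parse_ocr_table` and `sample_text` are made up for illustration (this is not the actual Brookings table), and real OCR output will usually need more cleanup than this:

```r
# Hypothetical helper (not part of tesseract): turn OCR text into a data frame.
# Assumptions: the first non-empty line is the header, and columns are
# separated by two or more whitespace characters.
parse_ocr_table <- function(text) {
  lines <- unlist(strsplit(text, "\n", fixed = TRUE))
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]                       # drop blank lines

  cells <- strsplit(lines, "\\s{2,}", perl = TRUE)    # split on runs of 2+ spaces
  header <- cells[[1]]
  rows <- cells[-1]

  # Keep only rows whose cell count matches the header; misread lines are dropped
  rows <- rows[lengths(rows) == length(header)]

  df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(df) <- header

  # Convert columns that look numeric; leave the rest as character
  df[] <- lapply(df, function(col) type.convert(col, as.is = TRUE))
  df
}

# Made-up stand-in for the text returned by the OCR step
sample_text <- "Group  Share  Count\nA  12.5  40\nB  30.0  96\n"

table_df <- parse_ocr_table(sample_text)
str(table_df)
```

If the OCR result has one string per page, the helper could be applied to each element and the results combined with `rbind`. Rows that do not match the header width are silently dropped here, so it is worth comparing the row count against the original table.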