I am trying to use Tesseract in R to extract data from an image, but I get an error. This is the code I am using, followed by the error output:
library(tesseract)
eng <- tesseract("eng")
text <- tesseract::ocr("https://cdn.who.int/media/images/default-source/emergencies/disease-outbreak-news/table19f24bf8a-7733-400f-abaa-150c481f876a.jpg", engine = eng)
Wrong JPEG library version: library is 90, caller expects 80
Error in pixReadStreamJpeg: internal jpeg error
Error in pixReadStream: jpeg: no pix returned
Error in pixRead: pix not read
Error in FUN(X[[i]], ...) : Failed to read image
I have tried looking up the problem, but almost everything on Stack Overflow refers to Linux, and I need to get this to work on Windows 10. Any help appreciated!
I don't know why this worked, but reading the image with magick::image_read and passing the result to tesseract::ocr (instead of passing the URL directly) worked for me.
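For anyone who wants to try the same workaround, here is a minimal sketch of it. The helper name `ocr_via_magick` and its arguments are made up for illustration and are not part of either package. My guess (an assumption, not something confirmed here) is that the "library is 90, caller expects 80" message is a libjpeg version mismatch in the code tesseract uses to read JPEGs, and that magick sidesteps it by decoding the file itself before tesseract sees it.

```r
# Hypothetical helper: the name and arguments are illustrative only.
# Requires the magick and tesseract packages to be installed.
ocr_via_magick <- function(src, language = "eng") {
  # Same engine setup as in the question
  engine <- tesseract::tesseract(language)

  # Let magick fetch and decode the image (src can be a URL or a file path)
  img <- magick::image_read(src)

  # Pass the magick image object, not the URL, to tesseract
  tesseract::ocr(img, engine = engine)
}
```

With the URL from the question, `text <- ocr_via_magick("https://cdn.who.int/media/images/default-source/emergencies/disease-outbreak-news/table19f24bf8a-7733-400f-abaa-150c481f876a.jpg")` followed by `cat(text)` should print the recognized text, assuming the image is still available at that address.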