I am playing around with image classification using the tensorflow and keras packages for R. I have built and trained a model that performs well on the validation dataset. I now want to use that model to predict image classes for a large number of images stored online (I have all the URLs in a dataframe in R).
I can write a for loop that downloads each image, classifies it, records the model prediction, and then deletes the downloaded image, but this takes a long time; it would be faster to read each image into memory instead of downloading it. I cannot for the life of me figure out how to load an image into memory in R and convert it to a data type that works with the rest of my tensorflow image standardization.
Here is my for loop:
data$score <- NA
for (i in 1:nrow(data)) {
  img_tensor <-
    get_file("t", data$image_url[i]) %>% # download temp file
    tf$io$read_file() %>%
    tf$io$decode_image() %>%
    tf$image$resize(as.integer(image_size)) %>%
    tf$expand_dims(0L)
  # delete temp file
  file.remove("/Users/me/.keras/datasets/t")
  data$score[i] <- model %>% predict(img_tensor, verbose = 0)
}
Here is an example image URL: https://inaturalist-open-data.s3.amazonaws.com/photos/451526093/medium.jpeg
All I want is to be able to load that image into R directly from the URL (no writing the file to disk) and then use the tensorflow workflow (decode_image, resize, expand_dims). Any help is appreciated!
To replicate the code, just replace data$image_url[i] with the URL I provided. No need to worry about the model prediction step; that part is working fine. I just need the image to successfully feed into the rest of the pipe.
A few notes:
Writing to a temporary directory on macOS and Linux usually has identical performance to keeping everything in memory, since /tmp is usually mounted as a RAM filesystem and never actually touches the disk. (If you're on Windows, or the system is swapping, the story is different.)
As far as I know, TensorFlow doesn't have any graph ops that will fetch content from an HTTP URL, so you'll need to do that step in R or Python. If the op needs to live in a tf.data pipeline, you'll need to wrap it in tf.py_function.
To fetch a URL directly into memory in R, without writing to the filesystem, you can do:
url <- "https://inaturalist-open- data.s3.amazonaws.com/photos/451526093/medium.jpeg"
bytes <- readBin(url, raw(), 200000)  # read up to 200,000 bytes of the image into a raw vector
as_py_bytes <- reticulate::import_builtins(convert = FALSE)$bytes  # Python's bytes() constructor
bytes_tensor <- tf$constant(as_py_bytes(bytes), tf$string)  # wrap the raw bytes in a string tensor
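From there, a minimal sketch (untested) of feeding that tensor through the rest of your pipe, assuming image_size and model are defined as in your loop:

img_tensor <- bytes_tensor %>%
  tf$io$decode_image() %>%
  tf$image$resize(as.integer(image_size)) %>%
  tf$expand_dims(0L)

score <- model %>% predict(img_tensor, verbose = 0)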
The bottleneck is most likely the download step, not the "write to a file" step. You'll probably see the most significant speedups from rewriting your loop to process batches of images instead of a single image at a time (e.g., downloading a batch with curl::multi_download() and passing a batch of images to a single predict() call); a rough sketch follows.
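For example, a rough sketch of that batched approach (untested; the batch of 32, the channels/expand_animations arguments, and image_size and model reused from your loop are all assumptions):

batch_urls <- data$image_url[1:32]                            # hypothetical batch of 32 URLs
dest <- tempfile(fileext = rep(".jpeg", length(batch_urls)))  # one temp file per URL
curl::multi_download(batch_urls, dest, progress = FALSE)      # fetch the whole batch concurrently

imgs <- lapply(dest, function(f) {
  tf$io$read_file(f) %>%
    tf$io$decode_image(channels = 3L, expand_animations = FALSE) %>%
    tf$image$resize(as.integer(image_size))
})
batch <- tf$stack(imgs)                          # shape (32, height, width, 3)
scores <- model %>% predict(batch, verbose = 0)  # one predict() call for the whole batch
file.remove(dest)                                # clean up the temp files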