
How to get the total number of pages of PDF files using magick::image_read_pdf?


Suppose a folder, main_path, contains multiple PDF files with different numbers of pages. I use the function below to loop over all the files and save a screenshot of each page:

library(magick)
library(glue)

main_path <- './'

file_names <- list.files(path = main_path, pattern = '\\.pdf$')  # only files ending in .pdf
file_paths <- file.path(main_path, file_names)
file_names_no_ext <- tools::file_path_sans_ext(file_names)

max_page <- 10
pdf2plot <- function(file_path, file_names_no_ext){
  pages <- magick::image_read_pdf(file_path)
  print(pages)
  num <- seq(1, max_page, 1)
  # num <- seq(1, nrow(data.frame(pages)), 1)
  for (i in num){
    pages[i] %>% image_write(., path = paste0(glue(main_path, '/plot/', {file_names_no_ext},
                                                   sprintf('_%02d.', i)), format = "png"))
  }
}

mapply(pdf2plot, file_paths, file_names_no_ext)

The problem is that if a file in the folder has fewer pages than max_page, the loop raises Error in magick_image_subset(x, i) : subscript out of bounds. For example, a file with 2 pages combined with max_page = 10 triggers this error.

The printed content of pages, followed by the error:

  format width height colorspace matte filesize density
  <chr>  <int>  <int> <chr>      <lgl>    <int> <chr>  
1 PNG     2250   3000 sRGB       TRUE         0 300x300
2 PNG     2250   3000 sRGB       TRUE         0 300x300
3 PNG     2250   3000 sRGB       TRUE         0 300x300
4 PNG     2250   3000 sRGB       TRUE         0 300x300
5 PNG     2250   3000 sRGB       TRUE         0 300x300
6 PNG     2250   3000 sRGB       TRUE         0 300x300
7 PNG     2250   3000 sRGB       TRUE         0 300x300
8 PNG     2250   3000 sRGB       TRUE         0 300x300
9 PNG     2250   3000 sRGB       TRUE         0 300x300
Error in magick_image_subset(x, i) : subscript out of bounds
Called from: magick_image_subset(x, i)

I think there are two possible ways to solve this, but I don't know how to implement either yet: 1. use tryCatch(), or 2. replace max_page with the total number of pages obtained from magick::image_read_pdf.
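For the first idea, a minimal sketch with tryCatch() could look like the following. It assumes, as the error output above shows, that indexing past the last page raises "subscript out of bounds", and it simply stops the loop at that point. The names pdf2png_trycatch and out_dir are hypothetical, and files are named stem_01.png, stem_02.png, and so on, as in the question. Note that image_read_pdf() still renders every page of the document here, so checking length(pages) directly is the more direct fix.

```r
library(magick)

# Hypothetical helper: try pages 1..max_page and stop at the first index
# that does not exist, instead of letting the error abort the whole run.
pdf2png_trycatch <- function(file_path, out_dir, max_page = 10, density = 300) {
  pages <- image_read_pdf(file_path, density = density)  # renders all pages
  stem <- tools::file_path_sans_ext(basename(file_path))
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  written <- character(0)
  for (i in seq_len(max_page)) {
    # pages[i] errors with "subscript out of bounds" past the last page;
    # turn that error into NULL so we can leave the loop cleanly
    page <- tryCatch(pages[i], error = function(e) NULL)
    if (is.null(page)) break
    out <- file.path(out_dir, sprintf("%s_%02d.png", stem, i))
    image_write(page, path = out, format = "png")
    written <- c(written, out)
  }
  invisible(written)  # paths of the PNG files actually written
}
```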

Thanks in advance for your help.


Solution

  • The documentation at ?image_read states:

    All standard base vector methods such as [, [[, c(), as.list(), as.raster(), rev(), length(), and print() can be used to work with magick image objects. Use the standard img[i] syntax to extract a subset of the frames from an image.

    So you can simply use length(pages) to get the number of pages in the document. Here is a simpler version of your function using lapply(). The path handling could also be simplified considerably, but I won't get into that here.

    library(magick)
    library(glue)
    
    pdf2plot <- function(file_path, file_names_no_ext){
      pages <- magick::image_read_pdf(file_path)
      lapply(
        seq_len(length(pages)),  # 1, 2, ..., number of pages in this PDF
        \(i) image_write(pages[i], path = paste0(glue(main_path, '/plot/', {file_names_no_ext},
                                                       sprintf('_%02d.', i)), format = "png"))
      )
    }
    

    Code produced using R 4.1.0 (the \(i) shorthand for function(i) requires R >= 4.1.0).
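If you still want max_page as an upper limit, one option is to cap the page range with min(). The sketch below assumes the pdftools package is installed (image_read_pdf() relies on it): it reads the page count with pdftools::pdf_info() and then renders only the pages it needs through the pages argument of image_read_pdf(). The function name pdf2png_capped and the out_dir argument are hypothetical.

```r
library(magick)

# Hypothetical helper: export at most `max_page` pages of one PDF as PNGs.
pdf2png_capped <- function(file_path, out_dir, max_page = 10, density = 300) {
  n_pages <- pdftools::pdf_info(file_path)$pages  # total pages in the document
  keep <- seq_len(min(n_pages, max_page))         # 1..min(n_pages, max_page)
  if (length(keep) == 0) return(invisible(character(0)))
  # render only the selected pages instead of the whole document
  pages <- image_read_pdf(file_path, pages = keep, density = density)
  stem <- tools::file_path_sans_ext(basename(file_path))
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  out <- file.path(out_dir, sprintf("%s_%02d.png", stem, keep))
  for (i in seq_along(keep)) {
    image_write(pages[i], path = out[i], format = "png")
  }
  invisible(out)  # paths of the PNG files written
}
```

For example, lapply(list.files(main_path, pattern = "\\.pdf$", full.names = TRUE), pdf2png_capped, out_dir = file.path(main_path, "plot")) processes every PDF in the folder without ever indexing past the last page.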