imagemagick minimagick vips

Compose lots of images at once in imagemagick Ruby


I have the following code which takes a PDF file and composes it into a single jpg image which has a horizontal black line between each PDF page image, stacking the PDF pages.

image = MiniMagick::Image.open(pdf_file)

# create a new blank file which we will use to build a composite image
# containing all of our pages
MiniMagick::Tool::Convert.new do |i|
  i.size "#{image.width}x#{image.layers.size * image.height}"
  i.stroke "black"

  image.layers.count.times.each do |ilc|
    next if ilc.zero?

    top = ilc * (image.height + 1)
    i.draw "line 0,#{top}, #{image.width},#{top}"
  end

  i.xc "white"
  i << image_file_name
end

composite_image = MiniMagick::Image.open(image_file_name)

# For each pdf page, add it to our composite image. We add one so that we
# don't put the image over the 1px black line that was added to separate
# pages.
image.layers.count.times do |i|
  composite_image = composite_image.composite(image.layers[i]) do |c|
    c.compose "Over" # OverCompositeOp
    c.geometry "+0+#{i * (image.height + 1)}"
  end
end

composite_image.format(format)
composite_image.quality(85)
composite_image.write(image_file_name)

It works perfectly, except that a 20-page PDF file takes three minutes. I'm looking for a better way to do this. I suspect one of these two options will work:

  1. Compose all of the PDF page images at once, although I haven't figured out how to do that.
  2. Use vips, thanks to its pipeline implementation.

I would rather stay with imagemagick, but I am open to either approach. I'm looking for pointers on how to achieve this.


Solution

  • I had a stab at a ruby-vips version:

    require 'vips'
    
    # n: is the number of pages to load; -1 loads all pages as one tall, thin image
    image = Vips::Image.pdfload ARGV[0], n: -1
    
    # we can get the number of pages and the height of each page from the metadata
    n_pages = image.get 'pdf-n_pages'
    page_height = image.get 'page-height'
    
    # loop down the image cutting it into an array of separate pages
    pages = (0 ... n_pages).map do |page_number|
      image.crop(0, page_number * page_height, image.width, page_height)
    end 
    
    # make a 50-pixel-high black strip to separate each page
    strip = Vips::Image.black image.width, 50
    
    # and join the pages again
    image = pages.inject do |acc, page|
      acc.join(strip, 'vertical').join(page, 'vertical')
    end 
    
    image.write_to_file ARGV[1]
    

    On this desktop, with this 58-page PDF, I see:

    $ /usr/bin/time -f %M:%e ruby ./pages.rb nipguide.pdf x.jpg
    152984:1.08
    $ vipsheader x.jpg
    x.jpg: 595x50737 uchar, 3 bands, srgb, jpegload
    

    So it makes a 50,000-pixel-high JPG in about 1.1 seconds and needs a peak of about 150 MB of memory.

    I tried fmw42's clever imagemagick line:

    $ /usr/bin/time -f %M:%e convert nipguide.pdf -background black -gravity south -splice 0x50 -append x.jpg
    492244:5.16
    

    So that's about 500 MB of memory and 5.2 seconds. It makes an image almost exactly the same size.
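Since the question asks for something that stays within imagemagick and Ruby, here is a minimal sketch of how that one-line command could be driven from a script. `stacked_convert_args` is a hypothetical helper name I made up for illustration; the flags are the ones from the command above, plus `-quality` taken from the question's code. Note that `-splice` with `-gravity south` adds the strip below every page, including the last one.

```ruby
# Build the argument list for a single ImageMagick `convert` call that
# stacks every PDF page vertically, with a black strip below each page.
# `stacked_convert_args` is a hypothetical helper, not a MiniMagick API.
def stacked_convert_args(pdf_path, output_path, strip_height: 50, quality: 85)
  [
    pdf_path,
    "-background", "black",          # colour used for the spliced strip
    "-gravity", "south",             # add the strip at the bottom of each page
    "-splice", "0x#{strip_height}",  # strip is full width, strip_height tall
    "-append",                       # stack all pages top to bottom
    "-quality", quality.to_s,
    output_path
  ]
end

# Only shell out when run as: ruby stack.rb input.pdf output.jpg
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  system("convert", *stacked_convert_args(ARGV[0], ARGV[1])) ||
    abort("convert failed")
end
```

With MiniMagick, the same list can be fed to the tool object used in the question, for example `MiniMagick::Tool::Convert.new { |c| args.each { |a| c << a } }`. Passing `strip_height: 1` gives the 1px separator from the original code.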

    The speed difference is mostly down to the PDF rendering library, of course: ImageMagick shells out to Ghostscript, whereas ruby-vips calls poppler or PDFium directly. libvips is also able to stream this program, so during evaluation it never has more than one page in memory at once.

    JPG has a limit of 65535 pixels on either axis, so you won't be able to get much larger than this. For shorter documents, you could pass dpi: 300 to the PDF load to get more detail. The default is 72 dpi.
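To get a feel for how quickly that limit is reached, here is a rough sketch of the arithmetic. It assumes A4 pages (842 points tall, which is consistent with the 595-pixel width reported above), a 50-pixel strip between pages only, and that the renderer rounds page heights down; the helper names are made up for illustration and real page sizes will vary.

```ruby
JPEG_MAX_DIMENSION = 65_535 # JPEG's per-axis pixel limit

# Height in pixels of one page rendered at `dpi`. PDF page sizes are in
# points (1/72 inch). Rounding down is an assumption; a renderer may differ.
def page_height_px(page_height_pt, dpi)
  (page_height_pt * dpi / 72.0).floor
end

# Total height of n pages with a strip between each pair of pages.
def stacked_height(n_pages, page_height, strip_height)
  n_pages * page_height + (n_pages - 1) * strip_height
end

# Largest page count whose stacked height still fits in a JPEG.
def max_pages(page_height, strip_height)
  (JPEG_MAX_DIMENSION + strip_height) / (page_height + strip_height)
end

a4_pt = 842 # assumed A4 page height in points
[72, 300].each do |dpi|
  h = page_height_px(a4_pt, dpi)
  puts "#{dpi} dpi: page height #{h} px, at most #{max_pages(h, 50)} pages"
end
```

Under those assumptions, roughly 73 pages fit at 72 dpi but only about 18 at 300 dpi, so for longer documents at high resolution you would need PNG or TIFF output, or several output files.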

    You should get nice text quality without having to render at high resolution. For example, rendering page 12 of the PDF linked above at the default 72 dpi:

    $ vips pdfload nipguide.pdf x.png --page 12
    

    gives:

    [image: page 12 of nipguide.pdf rendered at 72 dpi]