I have the following code, which takes a PDF file and composes it into a single JPG image. The PDF pages are stacked vertically, with a horizontal black line between each pair of pages.
image = MiniMagick::Image.open(pdf_file)
# create a new blank file which we will use to build a composite image
# containing all of our pages
MiniMagick::Tool::Convert.new do |i|
  # every page after the first needs one extra row for its separator line
  i.size "#{image.width}x#{image.layers.size * (image.height + 1) - 1}"
  i.stroke "black"
  image.layers.count.times do |ilc|
    next if ilc.zero?
    # the separator goes in the row just above the top of page ilc
    top = ilc * (image.height + 1) - 1
    i.draw "line 0,#{top}, #{image.width},#{top}"
  end
  i.xc "white"
  i << image_file_name
end
composite_image = MiniMagick::Image.open(image_file_name)
# For each pdf page, add it to our composite image. We add one so that we
# don't put the image over the 1px black line that was added to separate
# pages.
image.layers.count.times do |i|
  composite_image = composite_image.composite(image.layers[i]) do |c|
    c.compose "Over" # OverCompositeOp
    c.geometry "+0+#{i * (image.height + 1)}"
  end
end
composite_image.format(format)
composite_image.quality(85)
composite_image.write(image_file_name)
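The offsets in this kind of stacking are easy to get wrong by one pixel, so here is a small sketch of the layout arithmetic on its own. It assumes every page has the same height and that a fixed gap (1px here) sits between consecutive pages. `stack_layout` and its keyword names are hypothetical helpers for illustration, not part of MiniMagick.

```ruby
# Hypothetical helper: work out where each page and each separator goes
# when n_pages pages of equal height are stacked with a gap between them.
def stack_layout(n_pages:, page_height:, gap: 1)
  pitch = page_height + gap
  {
    # y offset of the top row of each page
    page_tops: (0...n_pages).map { |i| i * pitch },
    # first row of each gap, i.e. the rows between page i-1 and page i
    gap_tops: (1...n_pages).map { |i| i * pitch - gap },
    # total canvas height: all pages plus the gaps between them
    canvas_height: n_pages * page_height + [n_pages - 1, 0].max * gap
  }
end

layout = stack_layout(n_pages: 3, page_height: 100)
puts layout[:page_tops].inspect   # [0, 101, 202]
puts layout[:gap_tops].inspect    # [100, 201]
puts layout[:canvas_height]       # 302
```

The canvas has to be taller than `n_pages * page_height`, otherwise the last page is clipped by one row per separator.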
It works perfectly, except that a 20-page PDF file takes three minutes. I'm looking for a better way to do this. I suspect one of these two options will work:
I would rather stay with ImageMagick, but I am open to either way. I'm looking for pointers on how to achieve this.
I had a stab at a ruby-vips version:
require 'vips'
# n: is the number of pages to load, -1 means all pages in tall, thin image
image = Vips::Image.pdfload ARGV[0], n: -1
# we can get the number of pages and the height of each page from the metadata
n_pages = image.get 'pdf-n_pages'
page_height = image.get 'page-height'
# loop down the image cutting it into an array of separate pages
pages = (0 ... n_pages).map do |page_number|
  image.crop(0, page_number * page_height, image.width, page_height)
end
# make a 50-pixel-high black strip to separate each page
strip = Vips::Image.black image.width, 50
# and join the pages again
image = pages.inject do |acc, page|
  acc.join(strip, 'vertical').join(page, 'vertical')
end
image.write_to_file ARGV[1]
On this desktop with this 58 page PDF I see:
$ /usr/bin/time -f %M:%e ruby ./pages.rb nipguide.pdf x.jpg
152984:1.08
$ vipsheader x.jpg
x.jpg: 595x50737 uchar, 3 bands, srgb, jpegload
So it makes a 50,000-pixel-high JPG in about 1.1 seconds and needs a peak of 150 MB of memory.
I tried fmw42's clever imagemagick line:
$ /usr/bin/time -f %M:%e convert nipguide.pdf -background black -gravity south -splice 0x50 -append x.jpg
492244:5.16
So that's 500 MB of memory and 5.2 s. It makes an image almost exactly the same size.
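If you want to run that ImageMagick command from Ruby without going through MiniMagick, a minimal sketch is to build the argument list and hand it to `system`. `stack_pdf_command` is a hypothetical helper name, and this assumes `convert` is on your PATH (ImageMagick 7 installs usually call it `magick` instead).

```ruby
# Hypothetical helper: build the argv for the ImageMagick one-liner.
def stack_pdf_command(pdf, out, strip_px: 50, colour: "black")
  [
    "convert", pdf,
    "-background", colour,       # colour used for the spliced strip
    "-gravity", "south",         # splice at the bottom edge of each page
    "-splice", "0x#{strip_px}",  # add strip_px rows to every page
    "-append",                   # stack all pages top to bottom
    out
  ]
end

cmd = stack_pdf_command("nipguide.pdf", "x.jpg")
puts cmd.join(" ")

# To actually run it (assumes ImageMagick and Ghostscript are installed):
# system(*cmd) or abort("convert failed")
```

Passing the arguments as an array means no shell is involved, so filenames with spaces don't need quoting.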
The speed difference is mostly the PDF rendering library, of course: IM shells out to ghostscript, whereas ruby-vips calls poppler or PDFium directly. libvips is able to stream this program, so during evaluation it never has more than one page in memory at once.
JPG has a limit of 65535 pixels in any axis, so you won't be able to get much larger than this. For shorter documents, you could add dpi: 300 to the PDF load to get more detail. The default is 72 dpi.
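To get a feel for how the 65535-pixel limit interacts with dpi, here's a rough sketch of the arithmetic. It assumes all pages are the same height, that pixel height scales linearly with dpi from the 72 dpi default, and that the separator strip stays a fixed number of pixels. The 842-point A4 page height in the example is an assumption, not something read from a PDF, and `stacked_height` and `max_dpi` are hypothetical helpers.

```ruby
JPEG_MAX_DIMENSION = 65_535

# Height in pixels of n_pages pages stacked with a strip between each pair.
def stacked_height(n_pages, page_height_px, strip_px)
  n_pages * page_height_px + (n_pages - 1) * strip_px
end

# Approximate largest whole-number dpi at which the stacked image still fits
# in a JPEG. page_height_pt is the page height in points (pixels at 72 dpi).
# Assumes n_pages >= 1.
def max_dpi(n_pages, page_height_pt, strip_px)
  budget = JPEG_MAX_DIMENSION - (n_pages - 1) * strip_px
  (budget * 72.0 / (n_pages * page_height_pt)).floor
end

# 20 A4-ish pages (842 points high, an assumed value) with 50px strips
puts stacked_height(20, 842, 50)   # 17790 pixels at 72 dpi
puts max_dpi(20, 842, 50)          # 276
```

So for a 20-page document of that size, 300 dpi would already be slightly too tall for a single JPG, and you'd need to drop the dpi a little or pick a format without that limit.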
You should get nice text quality without having to render at high resolution. For example, for the PDF linked above, if I run:
$ vips pdfload nipguide.pdf x.png --page 12
to render page 12 at the default 72 dpi, I get: