objective-c cocoa pdf quartz-graphics poppler

How to figure out the resolution (DPI) of images embedded in a PDF document?


I have a PDF document that also contains images.

Now I want to know the resolution of these images.

A first step would be to somehow get the images out of the PDF document. But how?

Is that even possible with something provided in Cocoa?


Solution

  • Have a look at this answer for your other question:

    Basically, you can now use the (new) -list option of Poppler's pdfimages command-line utility (it will NOT work with XPDF's version of pdfimages!).

    It will report the dimensions of each image appearing on the queried pages.

    (You can also use it to extract images from a PDF: pdfimages -png -f 3 -l 5 some.pdf prefix--- extracts all images as PNGs from the PDF file, starting with page 3 and ending with page 5, and uses prefix--- as the filename prefix for each image. But this does not seem to be the main focus of your question...)
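    If you want to drive pdfimages from a script instead of typing the commands, here is a minimal Python sketch that builds the argument lists for the two invocations used in this answer (-list, and -png with -f/-l plus a filename prefix) and runs them with subprocess. The function names are hypothetical; it assumes Poppler's pdfimages is on the PATH and uses only the options shown above.

```python
import shutil
import subprocess
from typing import List, Optional


def pdfimages_args(pdf_path: str, first: int, last: int,
                   prefix: Optional[str] = None,
                   list_only: bool = False) -> List[str]:
    """Build the argv for Poppler's pdfimages.

    list_only=True  -> pdfimages -list -f FIRST -l LAST PDF
    list_only=False -> pdfimages -png  -f FIRST -l LAST PDF PREFIX
    """
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    mode = "-list" if list_only else "-png"
    args = ["pdfimages", mode, "-f", str(first), "-l", str(last), pdf_path]
    if not list_only:
        if not prefix:
            raise ValueError("extracting images needs a filename prefix")
        args.append(prefix)
    return args


def run_pdfimages(args: List[str]) -> str:
    """Run the command and return its standard output as text."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError("pdfimages is not on the PATH")
    # check=True raises CalledProcessError if pdfimages exits non-zero.
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout
```

    For example, print(run_pdfimages(pdfimages_args("some.pdf", 1, 3, list_only=True))) prints the same kind of report as the command below.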

    Example:

    pdfimages -list -f 1 -l 3 /Users/kurtpfeifle/Downloads/ct-magazin-14-2012.pdf
    
      page   num  type   width height color comp bpc  enc interp  object ID
      ---------------------------------------------------------------------
         1     0 image    1247  1738  rgb     3   8  jpx    no      3053  0
         2     1 image     582   839  gray    1   8  jpeg   no      2080  0
         2     2 image     344   364  gray    1   8  jpx    no      2079  0
         3     3 image     581   838  rgb     3   8  jpeg   no         7  0
         3     4 image    1088   776  rgb     3   8  jpx    no         8  0
         3     5 image       6     6  rgb     3   8  image  no         9  0
         3     6 image       8     6  rgb     3   8  image  no        10  0
         3     7 image       4     6  rgb     3   8  image  no        11  0
         3     8 image     212   106  rgb     3   8  jpx    no        12  0
         3     9 image     150    68  rgb     3   8  jpx    no        13  0
         3    10 image       6     6  rgb     3   8  image  no        14  0
         3    11 image       4     4  rgb     3   8  image  no        15  0
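    The -list report is plain whitespace-separated text, so it is easy to post-process. A minimal Python sketch, assuming the column order shown above (newer pdfimages versions append further columns, which this ignores); the function and column names are my own, not part of Poppler.

```python
from typing import Dict, List

# Column order of the 'pdfimages -list' report shown above ("object ID" is two
# fields: object number and generation). Extra trailing columns are ignored.
BASE_COLUMNS = ["page", "num", "type", "width", "height", "color",
                "comp", "bpc", "enc", "interp", "object", "gen"]
INT_COLUMNS = {"page", "num", "width", "height", "comp", "bpc", "object", "gen"}


def parse_pdfimages_list(listing: str) -> List[Dict[str, object]]:
    """Turn each data row of the report into a dict keyed by column name."""
    rows: List[Dict[str, object]] = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the header, the dashed rule and blank lines.
        if len(fields) < len(BASE_COLUMNS) or not fields[0].isdigit():
            continue
        row: Dict[str, object] = {}
        for name, value in zip(BASE_COLUMNS, fields):
            row[name] = int(value) if name in INT_COLUMNS else value
        rows.append(row)
    return rows
```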
    

    It does not directly report the DPI resolution, but from the 'width' and 'height' dimensions you can calculate it easily: measure the width of the picture on your screen with an inch ruler, then divide the width in pixels by the measured width in inches...

    You find this strange because the result depends on your current zoom level? Yes, it does!

    The concept of 'resolution' always depends on the environment. A so-called 'hi-res' picture basically always has lots of pixels in width and height. This allows for better quality (or 'resolution') when the picture needs to be displayed or printed at higher zoom levels.
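    The ruler measurement boils down to one division. This Python sketch assumes that your viewer shows the page at its physical size when the zoom is 100 %, which on real screens is often only approximately true. It also shows the equivalent calculation in PDF points (1/72 inch), for when you know the displayed size from page coordinates rather than from a ruler. The function names are hypothetical.

```python
POINTS_PER_INCH = 72.0  # default PDF user-space unit is 1/72 inch


def effective_ppi(pixels: int, measured_inches: float, zoom: float = 1.0) -> float:
    """PPI from a ruler measurement taken on screen.

    Assumes zoom=1.0 (100 %) shows the page at its physical size.
    Dividing the measured size by the zoom factor gives the size on the
    page, so the result no longer depends on the zoom level.
    """
    if measured_inches <= 0 or zoom <= 0:
        raise ValueError("measured size and zoom must be positive")
    return pixels / (measured_inches / zoom)


def ppi_from_points(pixels: int, displayed_points: float) -> float:
    """PPI when the displayed size on the page is known in PDF points."""
    if displayed_points <= 0:
        raise ValueError("displayed size must be positive")
    return pixels * POINTS_PER_INCH / displayed_points
```

    For instance, a 150-pixel-wide image that is one inch wide on the page measures two inches at 200 % zoom; effective_ppi(150, 2.0) then yields 75, while effective_ppi(150, 2.0, zoom=2.0) corrects this back to 150.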


    Update

    Meanwhile there is a new version of (Poppler's) pdfimages:

    $  pdfimages -version
      pdfimages version 0.33.0
      [....]
    

    This version also reports the resolution of embedded images in PPI (pixels per inch), separately for the horizontal (x-ppi) and vertical (y-ppi) directions:

    page num  type width height color comp bpc enc interp objectID x-ppi y-ppi size ratio
    -------------------------------------------------------------------------------------
       1   0 image  1247  1738  rgb     3   8  jpx    no    3053 0   151   151  228K 3.6%
       2   1 image   582   839  gray    1   8  jpeg   no    2080 0    72    72  319B 0.1%
       2   2 image   344   364  gray    1   8  jpx    no    2079 0   150   150 4325B 3.5%
       3   3 image   581   838  rgb     3   8  jpeg   no       7 0    73    73 1980B 0.1%
       3   4 image  1088   776  rgb     3   8  jpx    no       8 0   150   151  106K 4.3%
       3   5 image     6     6  rgb     3   8  image  no       9 0   150   150  108B 100%
       3   6 image     8     6  rgb     3   8  image  no      10 0   150   150  158B 110%
       3   7 image     4     6  rgb     3   8  image  no      11 0   150   150   73B 101%
       3   8 image   212   106  rgb     3   8  jpx    no      12 0   150   150 2396B 3.6%
       3   9 image   150    68  rgb     3   8  jpx    no      13 0   150   150 1878B 6.1%
       3  10 image     6     6  rgb     3   8  image  no      14 0   150   150   81B  75%
       3  11 image     4     4  rgb     3   8  image  no      15 0   150   150   50B 104%
    

    This new feature appeared first in Poppler version 0.25 (released Wed December 11, 2013). It additionally reports...

    ...of embedded images.
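    With the x-ppi and y-ppi columns available, finding images that are too coarse for a given purpose is a simple filter. A Python sketch, assuming the 16-field row layout shown above (the object ID counts as two fields); the function name is hypothetical and the threshold is whatever your use case needs.

```python
from typing import List, Tuple


def low_res_images(listing: str, min_ppi: float) -> List[Tuple[int, int, float, float]]:
    """Return (page, num, x_ppi, y_ppi) for images below min_ppi in either direction.

    Expects the newer 'pdfimages -list' layout: 16 whitespace-separated
    fields per data row, with x-ppi and y-ppi at positions 12 and 13.
    """
    flagged = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the header, the dashed rule and anything that is not a data row.
        if len(fields) != 16 or not fields[0].isdigit():
            continue
        page, num = int(fields[0]), int(fields[1])
        x_ppi, y_ppi = float(fields[12]), float(fields[13])
        if min(x_ppi, y_ppi) < min_ppi:
            flagged.append((page, num, x_ppi, y_ppi))
    return flagged
```

    Run against the report above with min_ppi=100, this would flag images 1 and 3 (72 and 73 ppi).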

    Limitations of pdfimages -list

    Perhaps I should also make you aware of the limitations of the pdfimages utility and give an example where its report is not completely correct.

    One example is this handcoded PDF from my (recently created) GitHub repository of PDFs to help beginners to study the syntax of PDF source code.

    I originally created this PDF in order to demonstrate a bug in Mozilla's PDF.js renderer. Here is a screenshot showing how it looks in PDF.js (left) and how it should look when rendered correctly (right, rendered by Ghostscript and Adobe Reader):

    [Screenshot: PDF.js rendering (left) vs. Ghostscript/Adobe Reader rendering (right)]


    The PDF file contains a 2x2 pixel image, embedded only once (with object ID 5 0), but displayed on the page multiple times with different settings, where each time the image is placed...

    Under these extreme circumstances, pdfimages -list falls flat on its face when trying to determine the resolution of some instances of this image:

    page num type width height color comp bpc enc interp objectID x-ppi y-ppi size ratio
    ------------------------------------------------------------------------------------
       1   0 image    2     2  rgb     3   8 image  no        5 0     4     4   13B 108%
       1   1 image    2     2  rgb     3   8 image  no        5 0     5     3   13B 108%
       1   2 image    2     2  rgb     3   8 image  no        5 0     3     5   13B 108%
       1   3 image    2     2  rgb     3   8 image  no        5 0     6     3   13B 108%
       1   4 image    2     2  rgb     3   8 image  no        5 0     3    10   13B 108%
       1   5 image    2     2  rgb     3   8 image  no        5 0     4 72000   13B 108%
       1   6 image    2     2  rgb     3   8 image  no        5 0     4     2   13B 108%
       1   7 image    2     2  rgb     3   8 image  no        5 0     2     4   13B 108%
       1   8 image    2     2  rgb     3   8 image  no        5 0 14401     1   13B 108%
       1   9 image    2     2  rgb     3   8 image  no        5 0     1     2   13B 108%
       1  10 image    2     2  rgb     3   8 image  no        5 0 0.950     4   13B 108%
       1  11 image    2     2  rgb     3   8 image  no        5 0     4 0.950   13B 108%
       1  12 image    2     2  rgb     3   8 image  no        5 0 0.950     4   13B 108%
       1  13 image    2     2  rgb     3   8 image  no        5 0     1     4   13B 108%
       1  14 image    2     2  rgb     3   8 image  no        5 0 0.950     4   13B 108%
       1  15 image    2     2  rgb     3   8 image  no        5 0 0.950     4   13B 108%
       1  16 image    2     2  rgb     3   8 image  no        5 0     4 0.950   13B 108%
    

    pdfimages -list gets most values right as long as no rotation or skewing is involved. It is no wonder that there are discrepancies when the image is rotated or skewed: how would you even reliably define x-ppi and y-ppi values in such cases? That explains the (completely wrong) values of 72000 y-ppi for image no. 5 and 14401 x-ppi for image no. 8.
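    One way to give x-ppi and y-ppi a meaning that survives rotation is to measure the lengths of the image's edges after transformation. In PDF, an image is painted into the unit square, which the current transformation matrix [a b c d e f] maps to a parallelogram with edge vectors (a, b) and (c, d), in user-space units (1/72 inch unless the page sets a different UserUnit). The Python sketch below computes PPI from those edge lengths and, for contrast, a naive version that reads only a and d. It illustrates the problem and is not a description of how pdfimages computes its values; skewed placements still have no single "correct" answer, because the two edges are no longer perpendicular. The function names are hypothetical.

```python
import math
from typing import Tuple

Matrix = Tuple[float, float, float, float, float, float]  # [a b c d e f]


def ppi_from_ctm(width_px: int, height_px: int, ctm: Matrix) -> Tuple[float, float]:
    """x/y PPI from the lengths of the image's transformed edges.

    The unit square's bottom edge maps to (a, b) and its left edge to (c, d),
    both in user-space units (1/72 inch by default). Edge lengths do not
    change under rotation, so neither does the result.
    """
    a, b, c, d, _e, _f = ctm
    width_pt, height_pt = math.hypot(a, b), math.hypot(c, d)
    if width_pt == 0 or height_pt == 0:
        raise ValueError("degenerate matrix: an image edge has zero length")
    return width_px * 72.0 / width_pt, height_px * 72.0 / height_pt


def naive_ppi(width_px: int, height_px: int, ctm: Matrix) -> Tuple[float, float]:
    """Reads only a and d: fine without rotation or skew, wrong otherwise."""
    a, _b, _c, d, _e, _f = ctm
    x_ppi = width_px * 72.0 / abs(a) if a else math.inf
    y_ppi = height_px * 72.0 / abs(d) if d else math.inf
    return x_ppi, y_ppi
```

    For a 2x2 image drawn one inch wide and rotated by 90 degrees, i.e. ctm = (0, 72, -72, 0, e, f), ppi_from_ctm reports 2 ppi in both directions, while naive_ppi divides by zero; for angles close to 90 degrees it produces absurdly large numbers instead.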

    As you can easily see, pdfimages is rather clever at determining other image properties:

    1. It correctly reports the same object ID 5 0 for all instances of the displayed image, indicating that this image is embedded once, but displayed multiple times on the page.
    2. It correctly reports the image dimensions to be 2x2 pixels.
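    The object ID column is what gives this away. A Python sketch that groups the rows of a -list report by object number and generation, so you can see which embedded images are drawn more than once. It works with both the older and the newer report layout shown in this answer, since the object ID occupies the same two fields in both. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def placements_by_object(listing: str) -> Dict[Tuple[int, int], List[Tuple[int, int]]]:
    """Group 'pdfimages -list' rows by (object number, generation).

    Each value lists the (page, num) of every place the image is drawn.
    """
    groups: Dict[Tuple[int, int], List[Tuple[int, int]]] = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        # Data rows start with a page number and have at least 12 fields.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        page, num = int(fields[0]), int(fields[1])
        obj, gen = int(fields[10]), int(fields[11])
        groups[(obj, gen)].append((page, num))
    return dict(groups)
```

    Keeping only the entries whose list has more than one element gives the images that are embedded once but placed several times, like object 5 0 above.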