
Retrieve the page number of an image in a PDF with iText


I am using the code from the link below to render the images:

MyImageRenderListener - IText

Below is the try block of my code. For each image I compute the DPI, and if it is below 300 I write the image's file name to a text file.

Now I also want to write the page numbers on which these images are located in the PDF. How can I obtain the page number of an image?

    try {
        PdfImageObject image = renderInfo.getImage();
        PdfDictionary imageDict = image.getDictionary();
        // Image width in pixels, taken from the image dictionary.
        float widthPx = imageDict.getAsNumber(PdfName.WIDTH).floatValue();
        // Rendered width in user units (1/72 inch); only valid for unrotated images.
        float widthUu = renderInfo.getImageCTM().get(Matrix.I11);
        float widthIn = widthUu / 72;
        float imageDpi = widthPx / widthIn;
        String filename = String.format(path, renderInfo.getRef().getNumber(), image.getFileType());
        System.out.println(filename + "-->" + imageDpi);
        if (imageDpi < 300) {
            File file = new File("C:/Users/Abhinav/workspace/itext/results/result.txt");
            // Append mode creates the file if it does not exist yet;
            // try-with-resources closes the writer even if a write fails.
            try (BufferedWriter bw = new BufferedWriter(new FileWriter(file, true))) {
                bw.write(filename);
                bw.write("\r\n");
            }
        }
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }

Solution

  • This is a strange question, because it is incomplete and illogical.

    Why is your question incomplete?

    You are using MyImageRenderListener in the context of another example, ExtractImages:

    PdfReader reader = new PdfReader(filename);
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    MyImageRenderListener listener = new MyImageRenderListener(RESULT);
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        parser.processContent(i, listener);
    }
    reader.close();
    

    In this example, you loop over every page number to examine every separate page. Hence you know the page number whenever MyImageRenderListener returns an image.

    Images are stored inside a PDF as external objects (aka XObject). MyImageRenderListener returns what's stored in such a stream object (containing the bytes of the image). So far, so good.

    Why is your question illogical?

    Because the whole purpose of storing images as XObjects is to be able to reuse the same image stream. Imagine an image of a logo. That image can be present on every page of the document. In this case, MyImageRenderListener will give you the same image (from the same stream) as many times as there are pages, but in reality there is only one image, and it is external to the page content. It doesn't make sense for that image to "know" the page it is on: it is on every page. The same logic applies even when the image is used on only one page. That is inherent to the design of PDF: an image stream doesn't know which page it belongs to. The link between the image stream and the page exists only through the /XObject entry in the /Resources of the page dictionary.
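
    To make the direction of that link concrete, here is a minimal sketch that uses no iText classes at all. The map from page number to image object numbers is a hypothetical stand-in for what the /XObject entries of each page's /Resources would list; the class and method names are made up for illustration. Inverting the map shows that the page-to-image relation has to be derived from the pages, and that one image can map to many pages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ImagePageIndex {

    /**
     * Inverts a "page -> image object numbers" mapping into
     * "image object number -> pages". The input is a stand-in for the
     * /XObject entries found in each page's /Resources dictionary.
     */
    static Map<Integer, List<Integer>> pagesPerImage(Map<Integer, List<Integer>> imagesPerPage) {
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<Integer>> page : imagesPerPage.entrySet()) {
            for (Integer imageObjectNumber : page.getValue()) {
                result.computeIfAbsent(imageObjectNumber, k -> new ArrayList<>())
                      .add(page.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical document: object 12 is a logo used on every page,
        // object 15 is a photo that appears on page 2 only.
        Map<Integer, List<Integer>> imagesPerPage = new LinkedHashMap<>();
        imagesPerPage.put(1, Arrays.asList(12));
        imagesPerPage.put(2, Arrays.asList(12, 15));
        imagesPerPage.put(3, Arrays.asList(12));

        // prints {12=[1, 2, 3], 15=[2]}
        System.out.println(pagesPerImage(imagesPerPage));
    }
}
```

    The logo (object 12) has no single page number, which is why the image stream itself cannot answer the question.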

    What would be an elegant way to solve this?

    Create a member variable with a setter in MyImageRenderListener, e.g.:

    protected int pagenumber;
    
    public void setPagenumber(int pagenumber) {
        this.pagenumber = pagenumber;
    }
    

    Use the setter from your loop:

    PdfReader reader = new PdfReader(filename);
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    MyImageRenderListener listener = new MyImageRenderListener(RESULT);
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        listener.setPagenumber(i);
        parser.processContent(i, listener);
    }
    reader.close();
    

    Now you can use pagenumber in the renderImage(ImageRenderInfo renderInfo) method. This way, you'll always know which page is being examined when this method is triggered.
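
    Putting the pieces together, the pattern can be sketched without any iText dependency. Everything below is a stand-in made up for illustration: ImageInfo plays the role of the data you would read from ImageRenderInfo, PageAwareListener plays the role of MyImageRenderListener, and process mimics the parser loop. The DPI formula and the 300 threshold are the ones from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PageAwareListenerSketch {

    /** Hypothetical stand-in for the data ImageRenderInfo provides. */
    static class ImageInfo {
        final String name;
        final float widthPx;   // pixel width from the image dictionary
        final float widthUu;   // rendered width in user units (1/72 inch)

        ImageInfo(String name, float widthPx, float widthUu) {
            this.name = name;
            this.widthPx = widthPx;
            this.widthUu = widthUu;
        }
    }

    /** Listener that is told the current page before each page is parsed. */
    static class PageAwareListener {
        protected int pagenumber;
        final List<String> report = new ArrayList<>();

        public void setPagenumber(int pagenumber) {
            this.pagenumber = pagenumber;
        }

        /** Plays the role of renderImage(ImageRenderInfo renderInfo). */
        public void renderImage(ImageInfo info) {
            float dpi = info.widthPx / (info.widthUu / 72f);
            if (dpi < 300) {
                report.add(info.name + " on page " + pagenumber
                        + " (" + Math.round(dpi) + " dpi)");
            }
        }
    }

    /** Stand-in for the parser loop; pages.get(i - 1) holds the images of page i. */
    static List<String> process(List<List<ImageInfo>> pages) {
        PageAwareListener listener = new PageAwareListener();
        for (int i = 1; i <= pages.size(); i++) {
            listener.setPagenumber(i);   // set before the page is processed
            for (ImageInfo info : pages.get(i - 1)) {
                listener.renderImage(info);
            }
        }
        return listener.report;
    }

    public static void main(String[] args) {
        ImageInfo logo = new ImageInfo("logo", 300, 144);    // 150 dpi
        ImageInfo photo = new ImageInfo("photo", 1200, 288); // 300 dpi, not reported
        List<List<ImageInfo>> pages = Arrays.asList(
                Arrays.asList(logo),
                Arrays.asList(logo, photo));
        for (String line : process(pages)) {
            System.out.println(line);
        }
    }
}
```

    Note that the reused logo is reported once per page it appears on, each time with the page number that was current when the callback fired.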