java, image, pdf, pdfbox, image-compression

Losing image resolution when converting PDF to image and back with PDFBox


I had a Java program written for me that takes a single-page PDF, 36" high x 48" wide, and "slices" it into three separate pages (12x36, 24x36 and 12x36), which I then print.

Background for context: I sell Canva templates for science fairs, graduation photo collages, etc. that are mounted on 36 x 48" cardboard tri-folds. The templates in Canva are 36 x 48 and are exported as "PDF Print" at the same size. Once I have the PDF file, I run it through the software. The goal is to produce 3 individual pages cut to the right size, rather than a single large print that would be hard to glue to the tri-fold and oversized for shipping.

The program converts the PDF to JPG, performs the "slicing" action, then converts it back to PDF for printing. The issue I am seeing is that the resolution decreases significantly during this process.

I am not a programmer by any means, but I was able to open the program in Eclipse and examine the logic. I did some research and saw that PDFBox can set the DPI of the image when converting from PDF. I tried this using some code from AI, but it did not help the resolution.

My son suggested converting to PNG, which is lossless, so again I had AI write me some code, which also didn't seem to work.

The program has 4 .java files. The file that contains the conversion code, and that I have been attempting to edit, is Separate.java.

Two questions:

  1. In order to compile this to test do I need all 4 files open in Eclipse when I export and create a runnable JAR file?

  2. Is there anything obvious in the code that is wrong?

Here is the code for the PNG option in Separate.java:

package abc;

import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.rendering.PDFRenderer;

public class Separate {
    private File file;
    private static PDDocument pdf;
    private float lefta;
    private float leftb;
    private float centrea;
    private float centreb;
    private float righta;
    private float rightb;
    private int height;

    public Separate(File f, float[] nums) {
        file = f;
        lefta = nums[0];
        leftb = nums[1];
        centrea = nums[2];
        centreb = nums[3];
        righta = nums[4];
        rightb = nums[5];
    }

    public PDDocument Divide_pdf() throws IOException {
        pdf = Loader.loadPDF(file);

        BufferedImage image = convertto_png(pdf);

        height = image.getHeight();
        List<BufferedImage> images = operate(image);

        PDDocument proc_document = convertto_pdf(images);
        pdf.close();
        return proc_document;
    }

    public BufferedImage convertto_png(PDDocument doc) throws IOException {
        PDFRenderer renderer = new PDFRenderer(doc);
        
        // Set DPI (e.g., 600 DPI) for high-resolution rendering
        int dpi = 600;
        BufferedImage img = renderer.renderImageWithDPI(0, dpi);
        return img;
    }

    public List<BufferedImage> operate(BufferedImage image) throws IOException {
        lefta = lefta * 72;
        leftb = leftb * 72;
        centrea = centrea * 72;
        centreb = centreb * 72;
        righta = righta * 72;
        rightb = rightb * 72;
        BufferedImage img1 = image.getSubimage(((int) lefta), 0, ((int) (leftb - lefta)), image.getHeight());
        BufferedImage img2 = image.getSubimage(((int) centrea), 0, (((int) centreb) - ((int) centrea)), image.getHeight());
        BufferedImage img3 = image.getSubimage(((int) righta), 0, (((int) rightb) - ((int) righta) - 1), image.getHeight());
        List<BufferedImage> images = new ArrayList<BufferedImage>();
        images.add(img1);
        images.add(img2);
        images.add(img3);
        return images;
    }

    public PDDocument convertto_pdf(List<BufferedImage> files) throws IOException {
        List<byte[]> images = new ArrayList<byte[]>();
        for (BufferedImage image : files) {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            
            // Save as PNG with high quality
            ImageWriter pngWriter = ImageIO.getImageWritersByFormatName("png").next();
            ImageWriteParam pngWriteParam = pngWriter.getDefaultWriteParam();
            
            ImageOutputStream ios = ImageIO.createImageOutputStream(baos);
            pngWriter.setOutput(ios);
            pngWriter.write(null, new javax.imageio.IIOImage(image, null, null), pngWriteParam);
            byte[] bytes = baos.toByteArray();
            images.add(bytes);
        }
        PDDocument document = new PDDocument();
        List<PDRectangle> rectangles = new ArrayList<PDRectangle>();
        PDRectangle rec = new PDRectangle(0, 0, ((int) leftb), height);
        PDRectangle rec1 = new PDRectangle(0, 0, (((int) centreb) - ((int) centrea)), height);
        PDRectangle rec2 = new PDRectangle(0, 0, (((int) rightb) - ((int) righta) - 1), height);
        rectangles.add(rec);
        rectangles.add(rec1);
        rectangles.add(rec2);
        int i = 0;
        for (byte[] f : images) {
            PDPage page = new PDPage();
            page.setMediaBox(rectangles.get(i));
            document.addPage(page);
            PDImageXObject img = PDImageXObject.createFromByteArray(document, f, "output");
            PDPageContentStream contentStream = new PDPageContentStream(document, page);
            contentStream.drawImage(img, 0, 0);
            contentStream.close();
            i++;
        }
        return document;
    }
}

[Images: Example Before; Example After pg 1; Example After pg 2; Example After pg 3]


Solution

  • There is a cross-platform, multi-language PDF API (C, C#, CSS, Python, JavaScript, but possibly not yet Java) that is well suited to this task.

    Here I am simply running the prebuilt binary, cpdf, from the OS command line. Unlike many other methods, the output PDF should be only slightly larger than the input PDF.

    Another advantage is that it works well with both image-based pages and vector-sourced PDFs.
    https://github.com/coherentgraphics

    To check the input file width on Windows, use cpdf.exe -info filename. For my example, the output includes MediaBox: 0.000000 0.000000 3464.000000 2598.000000

    Thus my example width is 3464. One quarter of that is 866 and three quarters is 2598, which are the two positions for the vertical cuts.
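The arithmetic for the two cut positions can be sketched in Java as follows. This is only an illustration: the class and method names (ChopPositions, cutPositions) are hypothetical, and the 1/4, 1/2, 1/4 split is assumed from the 12", 24", 12" panels described in the question.

```java
class ChopPositions {
    /**
     * Returns the x positions (in PDF units) of the two vertical cuts
     * for a quarter / half / quarter split of the page width.
     * Assumption: the MediaBox starts at x = 0, as in the example output.
     */
    static double[] cutPositions(double mediaBoxWidth) {
        double firstCut = mediaBoxWidth / 4.0;        // end of the left panel
        double secondCut = mediaBoxWidth * 3.0 / 4.0; // start of the right panel
        return new double[] { firstCut, secondCut };
    }

    public static void main(String[] args) {
        // Width taken from the MediaBox reported for the example file.
        double[] cuts = cutPositions(3464.0);
        System.out.printf("%.0f %.0f%n", cuts[0], cuts[1]); // prints "866 2598"
    }
}
```

For a different page size, the same two values would replace 866 and 2598 in the commands.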

    pdf\cpdf>cpdf -chop-v 866 in.pdf -o 2bits.pdf & cpdf -chop-v 2598 2bits.pdf 2 -o 3bits.pdf
    

    For a batch file, use a temp file such as 2bits.pdf for the first pass.

    To drag and drop a file onto the batch file, or to call it with a filename such as splitter "my input.pdf", use %~1:

    "C:\your path\ to \cpdf.exe" -chop-v 866 "%~1" -o "%temp%\2bits.pdf"
    "C:\your path\ to \cpdf.exe" -chop-v 2598 "%temp%\2bits.pdf" 2 -o "%~dpn1-splitted.pdf"
    del "%temp%\2bits.pdf"
    

    Result: [images]
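On the second question, one thing that stands out in the posted code is a unit mismatch. operate() multiplies the slice positions by 72, while convertto_png() renders the page at 600 DPI, and the slices are then drawn with drawImage(img, 0, 0) onto pages whose size is taken from pixel counts. The sketch below shows only the unit arithmetic and is not a tested fix. It assumes the numbers passed to the Separate constructor are inches, and the class and method names (SliceUnits, cropInPixels, pageSizeInPoints) are hypothetical.

```java
class SliceUnits {
    static final float POINTS_PER_INCH = 72f;

    /** Crop rectangle {x, y, width, height} in pixels for a slice given in inches. */
    static int[] cropInPixels(float startInches, float endInches, int imageHeightPx, int dpi) {
        int x = Math.round(startInches * dpi);        // pixels scale with the render DPI
        int width = Math.round(endInches * dpi) - x;
        return new int[] { x, 0, width, imageHeightPx };
    }

    /** Page size {width, height} in points for a crop that was rendered at the given DPI. */
    static float[] pageSizeInPoints(int[] cropPx, int dpi) {
        return new float[] {
            cropPx[2] * POINTS_PER_INCH / dpi,
            cropPx[3] * POINTS_PER_INCH / dpi
        };
    }

    public static void main(String[] args) {
        int dpi = 300;                 // example render resolution
        int imageHeightPx = 36 * dpi;  // a 36 inch high page rendered at that DPI
        int[] centre = cropInPixels(12f, 36f, imageHeightPx, dpi); // the 24 inch centre panel
        float[] page = pageSizeInPoints(centre, dpi);
        System.out.printf("crop: x=%d w=%d h=%d px%n", centre[0], centre[2], centre[3]);
        System.out.printf("page: %.0f x %.0f pt%n", page[0], page[1]);
    }
}
```

With values like these, the page's media box would use the point sizes, and the image would be drawn scaled to them, for example with the drawImage(image, x, y, width, height) overload of PDPageContentStream. Rendering a 36 x 48 inch page at a high DPI also needs a lot of memory, which is one more reason the cpdf approach above avoids rasterizing at all.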