xslt xslt-2.0 xsl-fo antenna-house

How to return the (total) page count of external PDF files via XSL


Is it possible to return the total page count of an external PDF file via XSL? Does the AntennaHouse Formatter have an equivalent extension?

Thanks in advance!


Solution

  • If you are using a Java-based XSLT processor that allows external function calls (such as Saxon PE or EE), Apache PDFBox can help.

    PDFBox: https://pdfbox.apache.org/

    PDFBox’s PDDocument class has a method that returns the page count of the target PDF, so you can get the page count with the following steps:

    1. Write a Java class with a static method.
    2. Call it from the XSLT stylesheet.

    [Java sample code]

    package com.acme.pdfutil;

    import java.io.File;
    import org.apache.pdfbox.pdmodel.PDDocument;

    // The class name must match the "java:" namespace URI declared in the stylesheet.
    public class pdfDocument {
        /**
         * Get the page count of the specified PDF file.
         * @param filePath Path of the PDF file
         * @return Page count, or -1 if the file could not be read
         */
        public static int getPageCount(String filePath) {
            // PDDocument.load(File) is the PDFBox 2.x API;
            // PDFBox 3.x uses org.apache.pdfbox.Loader.loadPDF(File) instead.
            // try-with-resources closes the document even if an exception is thrown.
            try (PDDocument pdfDoc = PDDocument.load(new File(filePath))) {
                return pdfDoc.getNumberOfPages();
            }
            catch (Exception e) {
                // Report on stderr so the message cannot end up in the transformation output.
                System.err.println("[getPageCount] " + e.getMessage());
                return -1;
            }
        }
    }
    

    [XSLT stylesheet]

    <xsl:stylesheet version="2.0" 
     xmlns:fo="http://www.w3.org/1999/XSL/Format" 
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:xs="http://www.w3.org/2001/XMLSchema"
     xmlns:acmejava="java:com.acme.pdfutil.pdfDocument"
    >
    …
    <!-- Call external function -->
    <xsl:variable name="pdfPageCount" as="xs:integer" select="acmejava:getPageCount($pdfPath)"/>
    …
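
    [XSLT sample: checking the result]

    The Java method above returns -1 when the PDF cannot be read, so the stylesheet may want to test the value before using it. The following is only a sketch: it assumes it sits inside a template where $pdfPageCount and $pdfPath are in scope, and the fo:block output and the message text are illustrative, not part of the original answer.

    <!-- Hypothetical guard: getPageCount returns -1 on failure -->
    <xsl:choose>
        <xsl:when test="$pdfPageCount ge 0">
            <fo:block>
                <xsl:text>Page count: </xsl:text>
                <xsl:value-of select="$pdfPageCount"/>
            </fo:block>
        </xsl:when>
        <xsl:otherwise>
            <!-- Report the failure instead of emitting -1 into the FO output -->
            <xsl:message>
                <xsl:text>Could not read page count of </xsl:text>
                <xsl:value-of select="$pdfPath"/>
            </xsl:message>
        </xsl:otherwise>
    </xsl:choose>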