java pdf fonts pdfbox

How to extract fonts from PDDocument in PDFBox 2.0.2


I have seen how to do this in previous versions, for example in this question:

How to extract font styles of text contents using pdfbox?

But I think the getFonts() method has been removed now. I want to retrieve a map of text to fonts (Map<String, PDFont>) in the new version of PDFBox, but I have no idea how.

Thanks

Kabeer


Solution

  • Do this:

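    import java.io.File;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDResources;
    import org.apache.pdfbox.pdmodel.font.PDFont;
    // PDFBox 2.0 has no load(String) overload, so pass a File.
    // try-with-resources closes the document; load() and getFont() can throw IOException.
    try (PDDocument doc = PDDocument.load(new File("C:/mydoc3.pdf")))
    {
        for (int i = 0; i < doc.getNumberOfPages(); ++i)
        {
            PDPage page = doc.getPage(i);
            PDResources res = page.getResources();
            if (res == null) continue; // a page may have no resources
            for (COSName fontName : res.getFontNames())
            {
                PDFont font = res.getFont(fontName);
                // do stuff with the font
            }
        }
    }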
    

    Note that this is a very simple solution to the problem: it only covers the fonts listed in each page's own (top-level) resources. More fonts may be referenced from form XObjects, patterns, soft masks, and possibly elsewhere.
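One way to go a level deeper is to recurse into form XObjects, since each of them can carry its own resources. The following is only a rough sketch against the PDFBox 2.0 API: the class name `FontCollector` and its methods are made up for illustration, and it still ignores patterns, soft masks, and annotation appearances.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashSet;
import java.util.Set;

import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;

// Hypothetical helper, not part of PDFBox.
public class FontCollector
{
    // Collects the names of fonts found in page resources and in nested form XObjects.
    public static Set<String> collectFontNames(PDDocument doc) throws IOException
    {
        Set<String> names = new LinkedHashSet<>();
        // Identity-based set of resource dictionaries already seen, to avoid looping
        // forever if a (malformed) file contains forms that reference each other.
        Set<COSDictionary> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        for (PDPage page : doc.getPages())
        {
            collect(page.getResources(), names, visited);
        }
        return names;
    }

    private static void collect(PDResources res, Set<String> names, Set<COSDictionary> visited)
            throws IOException
    {
        if (res == null || !visited.add(res.getCOSObject()))
        {
            return; // no resources, or already processed
        }
        for (COSName fontName : res.getFontNames())
        {
            PDFont font = res.getFont(fontName);
            if (font != null)
            {
                names.add(font.getName());
            }
        }
        for (COSName xObjectName : res.getXObjectNames())
        {
            PDXObject xObject = res.getXObject(xObjectName);
            if (xObject instanceof PDFormXObject)
            {
                // A form XObject has its own resource dictionary; descend into it.
                collect(((PDFormXObject) xObject).getResources(), names, visited);
            }
        }
    }
}
```

To try it, open a document with `PDDocument.load(new File(...))` and pass it to `FontCollector.collectFontNames(doc)`. Collecting names rather than `PDFont` objects is just a choice made to keep the sketch short; the same traversal could gather the font objects themselves.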