I have seen how to do this in previous versions, as in this question:
How to extract font styles of text contents using pdfbox?
But I think the getFonts() method has been removed now. I want to retrieve a map of text to fonts (Map<String, PDFont>) in the new version of PDFBox, but I have no idea how.
Thanks
Kabeer
Do this:
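import java.io.File;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;

// PDFBox 2.x: load() takes a File (or InputStream), not a String path
try (PDDocument doc = PDDocument.load(new File("C:/mydoc3.pdf")))
{
    for (int i = 0; i < doc.getNumberOfPages(); ++i)
    {
        PDPage page = doc.getPage(i);
        PDResources res = page.getResources();
        if (res == null) continue; // page has no resource dictionary
        for (COSName fontName : res.getFontNames())
        {
            PDFont font = res.getFont(fontName);
            // do stuff with the font
        }
    }
}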
Note that this is a very simple solution to the problem: it only looks at the top-level page resources. More fonts may be referenced from form XObjects, patterns, soft masks, and possibly other places.
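To reach the nested fonts, the usual approach is a recursive walk over the resource dictionaries, with a guard against revisiting the same one. Here is a minimal sketch of that idea only. ResourceNode and NestedFontWalk are hypothetical stand-ins, not PDFBox classes, so the snippet runs without the library. With PDFBox 2.x the "children" would presumably come from PDResources.getXObjectNames() / getXObject() and PDFormXObject.getResources(), and similarly for patterns and soft masks; check that against the version you use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a resource dictionary; NOT a PDFBox class. */
final class ResourceNode {
    final List<String> fontNames = new ArrayList<>();
    // Stand-in for nested resources (form XObjects, patterns, soft masks, ...)
    final List<ResourceNode> children = new ArrayList<>();

    ResourceNode(String... fonts) {
        Collections.addAll(fontNames, fonts);
    }

    ResourceNode add(ResourceNode child) {
        children.add(child);
        return this;
    }
}

/** Hypothetical helper that collects font names from a whole resource tree. */
final class NestedFontWalk {
    static Set<String> collectFonts(ResourceNode root) {
        Set<String> fonts = new LinkedHashSet<>();
        // Identity-based set: the same dictionary can be shared or even cyclic
        Set<ResourceNode> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        walk(root, fonts, visited);
        return fonts;
    }

    private static void walk(ResourceNode node, Set<String> fonts, Set<ResourceNode> visited) {
        if (node == null || !visited.add(node)) {
            return; // nothing there, or already processed
        }
        fonts.addAll(node.fontNames);
        for (ResourceNode child : node.children) {
            walk(child, fonts, visited);
        }
    }

    public static void main(String[] args) {
        ResourceNode form = new ResourceNode("Helvetica-Bold");
        ResourceNode page = new ResourceNode("Times-Roman").add(form);
        form.add(page); // deliberate cycle to exercise the visited guard
        System.out.println(collectFonts(page)); // prints [Times-Roman, Helvetica-Bold]
    }
}
```

The visited set matters in practice because resource dictionaries are often shared between pages, and a malformed file can make them reference each other.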