
How to get the position of specified text using xpdf or mupdf?


I want to extract some specified text from PDF files, along with the position of that text.

I know xpdf and mupdf can parse PDF files, so I think they may help me accomplish this task.

But how do I use these two libraries to get the text position?


Solution

  • MuPDF comes with a couple of command-line tools, one being pdfdraw.

    If you run pdfdraw with the -tt option, it generates XML containing every character along with its exact position.
    From there you should be able to find what you need.
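
Once you have that XML, searching it for a string and its bounding box could look roughly like the sketch below. This is only an illustration: the sample XML is made up, and the layout it assumes (one `char` element per character inside a `span`, with a `ucs` attribute and a `bbox="[x0 y0 x1 y1]"` attribute) is an assumption that may not match what your MuPDF version prints, so check the real output first. `find_text` and `parse_bbox` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the ASSUMED layout of the pdfdraw -tt output.
# Compare against the real output of your MuPDF version before relying on it.
SAMPLE_XML = """<page number="1">
<span font="Helvetica" size="12">
<char ucs="H" bbox="[72 700 80 712]" />
<char ucs="i" bbox="[80 700 83 712]" />
<char ucs=" " bbox="[83 700 86 712]" />
<char ucs="P" bbox="[86 700 94 712]" />
<char ucs="D" bbox="[94 700 102 712]" />
<char ucs="F" bbox="[102 700 109 712]" />
</span>
</page>"""


def parse_bbox(text):
    """Turn a string like '[72 700 80 712]' into a tuple of four floats."""
    return tuple(float(v) for v in text.strip("[]").split())


def find_text(xml_text, needle):
    """Return one (x0, y0, x1, y1) box per occurrence of needle in a span."""
    if not needle:
        return []
    root = ET.fromstring(xml_text)
    hits = []
    for span in root.iter("span"):
        chars = span.findall("char")
        # Assumes each char element carries exactly one character in 'ucs',
        # so string offsets line up with indexes into 'chars'.
        line = "".join(c.get("ucs", "") for c in chars)
        start = line.find(needle)
        while start != -1:
            boxes = [parse_bbox(c.get("bbox"))
                     for c in chars[start:start + len(needle)]]
            # Union of the per-character boxes covers the whole match.
            hits.append((min(b[0] for b in boxes), min(b[1] for b in boxes),
                         max(b[2] for b in boxes), max(b[3] for b in boxes)))
            start = line.find(needle, start + 1)
    return hits


if __name__ == "__main__":
    print(find_text(SAMPLE_XML, "PDF"))
```

The search only matches text that sits inside a single span, so a phrase that wraps across lines would need the spans joined first.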