I want to extract specific text from PDF files, along with the position of that text on the page.
I know that xpdf and MuPDF can parse PDF files, so I think they could help me with this task.
But how do I use these two libraries to get the text position?
MuPDF comes with a couple of tools, one of them being pdfdraw.
If you use pdfdraw with the -tt option, it will generate XML containing all characters and their exact positioning information.
From there you should be able to find what you need.
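
As a rough sketch of that last step, the Python snippet below searches such XML for a string and returns the bounding box of each match. The element and attribute names used here (span, char, ucs, bbox) and the sample data are assumptions for illustration only; the actual output of pdfdraw -tt may differ between MuPDF versions, so check a real file and adjust the names accordingly. The function find_text is a hypothetical helper, not part of MuPDF.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of the XML layout. This schema is an assumption:
# inspect the real output of your MuPDF version and adapt the names.
SAMPLE_XML = """
<page>
  <span font="Helvetica" size="12">
    <char ucs="H" bbox="[10 20 18 32]" />
    <char ucs="i" bbox="[18 20 22 32]" />
    <char ucs=" " bbox="[22 20 26 32]" />
    <char ucs="P" bbox="[26 20 34 32]" />
    <char ucs="D" bbox="[34 20 43 32]" />
    <char ucs="F" bbox="[43 20 50 32]" />
  </span>
</page>
"""


def parse_bbox(text):
    """Turn a string like '[10 20 18 32]' into (x0, y0, x1, y1)."""
    return tuple(float(v) for v in text.strip("[]").split())


def find_text(xml_text, needle):
    """Return one bounding box (x0, y0, x1, y1) per match of needle.

    Matches are searched within a single span only, and each char
    element is assumed to carry exactly one character.
    """
    if not needle:
        return []
    root = ET.fromstring(xml_text.strip())
    hits = []
    for span in root.iter("span"):
        chars = [(c.get("ucs", ""), parse_bbox(c.get("bbox")))
                 for c in span.iter("char")]
        line = "".join(ch for ch, _ in chars)
        start = line.find(needle)
        while start != -1:
            boxes = [box for _, box in chars[start:start + len(needle)]]
            # Union of the per-character boxes covers the whole match.
            hits.append((min(b[0] for b in boxes), min(b[1] for b in boxes),
                         max(b[2] for b in boxes), max(b[3] for b in boxes)))
            start = line.find(needle, start + 1)
    return hits


if __name__ == "__main__":
    print(find_text(SAMPLE_XML, "PDF"))
```

To use it on a real document, save the XML that pdfdraw produces to a file, read it into a string, and pass that string to find_text in place of SAMPLE_XML. Text that spans several spans or lines would need extra handling that this sketch leaves out.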