pdf, mupdf, structured-text

Get mutool to output "structured text (as xml)"


I am following mutool's instructions for the draw command:
https://mupdf.com/docs/manual-mutool-draw.html

How do I output "structured text (as xml)" when one of the output "vector formats" is "debug trace (as xml)" and the "output format is inferred from the output filename"?

If I run

mutool draw -o "testfile.xml" "testfile.pdf"

it appears that I get the "debug trace (as xml)" file format.

What file extension should I use to ensure that the "structured text (as xml)" format is output?


Solution

  • Running "mutool draw" with no arguments prints a usage message that lists the supported formats and their file extensions.

    In your case, you want "stext" output. Either give the output file the .stext extension, or force the format with -F and keep the .xml filename:

    mutool draw -o out.stext input.pdf
    mutool draw -F stext -o out.xml input.pdf
    

    Or, if you prefer, use the "mutool convert" command, which supports advanced output options via the -O argument:

    mutool convert -o out.stext input.pdf
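
    If you need to do this for several files, the extension-based approach can be wrapped in a small shell function. This is only a sketch: the names stext_name and pdf_to_stext are made up for illustration, and it assumes mutool is on your PATH and that the input names end in ".pdf".

```shell
# Hypothetical helper: derive the .stext output name from a PDF path.
# "${1%.pdf}" strips a trailing ".pdf" from the first argument.
stext_name() {
    printf '%s\n' "${1%.pdf}.stext"
}

# Hypothetical helper: convert one PDF to structured text.
# The .stext extension is what makes mutool pick the "stext" format
# instead of inferring "trace" from a .xml filename.
pdf_to_stext() {
    mutool draw -o "$(stext_name "$1")" "$1"
}
```

    For example, "pdf_to_stext testfile.pdf" would write testfile.stext next to the input. If you must keep a .xml filename, replace the mutool line with the -F stext form shown above.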