python-3.x, bash, subprocess, windows-subsystem-for-linux, pdf-to-html

Write output of pdftohtml to stdout


I'd like to run pdftohtml on a PDF file and have it write its output to /dev/stdout, or to anything else that lets me capture the output directly from subprocess.

My code:

import subprocess
from subprocess import PIPE, STDOUT

cmd = ['pdftohtml', '-c', '-s', '-i', '-fontfullname', filename, '-stdout', '/dev/stdout']

result = subprocess.run(cmd, stdout=PIPE, stderr=STDOUT, text=True)

The code above exits with code -11.

I'm running it with Ubuntu 18.04 inside WSL 2.

I've tried to execute the same command in bash:

[1]    14041 segmentation fault (core dumped)  pdftohtml -c -s -i -fontfullname  -stdout /dev/stdout

Passing "-" as the value for -stdout doesn't work either.

What can I do to get the HTML output directly from subprocess.run?

I know it's possible to write to an output file and pipe it through cat, but that's not what I'm looking for.

The solution must be compatible with WSL 2 and the Python stretch Docker image. However, any clarification would be helpful : )


Solution

  • "Complex output mode", -c, specifies output using frames. This only works when writing to files.

    If you want to write to stdout, use -s without -c, and leave out the /dev/stdout argument. "stdout" is a pre-opened file descriptor, so there is no reason to open it by name; -stdout is therefore a flag-type option rather than an option that takes an option-argument.
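
A minimal Python sketch of the advice above, assuming pdftohtml is on the PATH and using the flags from the question minus -c. The helper names build_cmd and pdf_to_html and the injectable run parameter are hypothetical, added only for illustration. Unlike the question's code, stderr is captured separately here so that diagnostics do not end up mixed into the HTML.

```python
import subprocess


def build_cmd(filename):
    """Build a pdftohtml command line that writes HTML to stdout."""
    # -s: single HTML document, -i: ignore images, -fontfullname: as in the question.
    # -stdout is a bare flag: no -c and no /dev/stdout argument.
    return ["pdftohtml", "-s", "-i", "-fontfullname", "-stdout", filename]


def pdf_to_html(filename, run=subprocess.run):
    """Return the HTML that pdftohtml prints for `filename`.

    `run` is injectable only so the function can be exercised without
    pdftohtml installed; it defaults to subprocess.run.
    """
    result = run(
        build_cmd(filename),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # keep diagnostics out of the HTML
        text=True,
    )
    if result.returncode != 0:
        # A negative return code means the process was killed by a signal,
        # e.g. -11 for SIGSEGV as seen in the question.
        raise RuntimeError(
            f"pdftohtml exited with {result.returncode}: {result.stderr}"
        )
    return result.stdout
```

Calling pdf_to_html("document.pdf") should then return the HTML as a string, or raise RuntimeError if pdftohtml fails.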