javapdfsejda

Merging PDFs with Sejda fails with stream output


Using Sejda 1.0.0.RELEASE, I basically followed the tutorial for splitting a PDF but tried merging instead (org.sejda.impl.itext5.MergeTask, MergeParameters, ...). All works great with the FileTaskOutput:

parameters.setOutput(new FileTaskOutput(new File("/some/path/merged.pdf")));

However I am unable to change this to StreamTaskOutput correctly:

OutputStream os = new FileOutputStream("/some/path/merged.pdf");
parameters.setOutput(new StreamTaskOutput(os));
parameters.setOutputName("merged.pdf");

No error is reported, but the resulting file cannot be read by Preview.app and is approximately 31 kB smaller (out of the ~1.2 MB total result) than the file saved above.

My first idea was: stream is not being closed properly! So I added os.close(); to the end of CompletionListener, still the same problem.

Remarks:


Edit

Turns out, the reason is that StreamTaskOutput zips the result into a ZIP file! OutputWriterHelper.copyToStream() is the culprit. If I rename merged.pdf to merged.zip, it's a valid ZIP file containing a perfectly valid merged.pdf file!

Could anyone (dear authors of the library) comment on why this is happening?


Solution

  • The idea is that when a task consumes a MultipleOutputTaskParameters producing multiple output documents, the StreamTaskOutput has to group them to be able to write all of them to a stream output. Unfortunately Sejda currently applies the same logic to SingleOutputTaskParameters, hence your issue. We can fix this in Sejda 2.0 because it makes more sense to directly stream the out document in case of SingleOutputTaskParameters. For Sejda 1.x I'm not sure how to address this remaining compatible with the existing behaviour.