
How to preserve header content when converting MHTML files to PDF files using Java?


Below, I will briefly describe the problem:

First, I convert a DOC file to an MHTML file (single-file web page). Second, I want to convert this MHTML file to a PDF file. However, the header and footer content is missing from the resulting PDF.

My code is as follows:

import com.aspose.words.Document;
import com.aspose.words.SaveFormat;
Document doc = new Document("XXX.mht");
doc.save("XXX1.pdf", SaveFormat.PDF);

The library I am using is Aspose.Words for Java (aspose.jar).

Does anyone know the reason? Thank you very much for your answer.

I found that the header in the MHT file is written like this:

<div style='mso-element:header' id=h1>
<div style='mso-element:para-border-div;border:none;border-bottom:solid windowtext 1.0pt;
mso-border-bottom-alt:solid windowtext .75pt;padding:0cm 0cm 1.0pt 0cm'>
<p class=MsoHeader><span lang=EN-US>This is the header</span></p>
</div>
</div>

But when I convert it to a PDF file, this content is gone.

When I open the Word file in Microsoft Office, I can see the header content, but after converting it to PDF with Aspose, the header no longer appears.
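To narrow down whether the header is dropped when Aspose.Words loads the MHTML or only later when it renders the PDF, a quick check like the one below may help. It is a minimal sketch, assuming Aspose.Words for Java is on the classpath; the class `CheckMhtHeader` and the helper `primaryHeaderText` are hypothetical names, and only the first section's primary header is inspected.

```java
import com.aspose.words.Document;
import com.aspose.words.HeaderFooter;
import com.aspose.words.HeaderFooterType;

public class CheckMhtHeader {
    // Returns the trimmed text of the first section's primary header,
    // or null if the loaded document has no primary header at all.
    static String primaryHeaderText(Document doc) throws Exception {
        HeaderFooter header = doc.getFirstSection().getHeadersFooters()
                .getByHeaderFooterType(HeaderFooterType.HEADER_PRIMARY);
        return header == null ? null : header.getText().trim();
    }

    public static void main(String[] args) throws Exception {
        // "XXX.mht" is the placeholder file name used in the question.
        Document doc = new Document("XXX.mht");
        String text = primaryHeaderText(doc);
        if (text == null || text.isEmpty()) {
            System.out.println("No primary header after loading the MHTML.");
        } else {
            System.out.println("Header present after loading: " + text);
        }
    }
}
```

If the header is already missing at this point, the loss happens while importing the MHTML, not while saving the PDF.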


Solution

  • MS Word documents and HTML/MHTML documents differ substantially in structure, so 100% fidelity cannot be guaranteed when converting one to the other. Headers and footers are hard to represent meaningfully in HTML/MHTML because HTML is not paginated. If you use Aspose.Words to convert DOC to MHTML, by default it exports only the primary header and footer of each section when saving to HTML/MHTML, so first-page and even-page variants are not exported. Therefore, the only way to preserve the original DOC formatting is to convert the DOC file directly to PDF.
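A minimal sketch of the direct DOC-to-PDF route, assuming Aspose.Words for Java is on the classpath. The class `DocToPdf` and its `toPdf` helper are hypothetical names, `XXX.doc` is only a placeholder, and `main` builds a small document with a primary header in memory so the sketch runs without an input file.

```java
import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;
import com.aspose.words.HeaderFooterType;
import com.aspose.words.SaveFormat;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class DocToPdf {
    // Saves an already-loaded Word document straight to PDF bytes,
    // skipping the intermediate MHTML step entirely.
    static byte[] toPdf(Document doc) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        doc.save(out, SaveFormat.PDF);
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // With a real file this would be: Document doc = new Document("XXX.doc");
        // Here a small document with a primary header is built in memory instead.
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);
        builder.moveToHeaderFooter(HeaderFooterType.HEADER_PRIMARY);
        builder.write("This is the header");
        builder.moveToDocumentEnd();
        builder.writeln("Body text");

        byte[] pdf = toPdf(doc);
        // PDF files start with the "%PDF" signature.
        System.out.println(new String(pdf, 0, 4, StandardCharsets.US_ASCII));
    }
}
```

For a file on disk, `doc.save("XXX.pdf", SaveFormat.PDF)` does the same thing. Because the PDF is rendered from the Word document itself rather than from non-paginated HTML, the header and footer definitions of each section are still available at render time.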