Here is a brief description of the problem:
First, I convert a DOC file to an MHTML file (single-file web page). Then I want to convert this MHTML file to a PDF file. However, the contents of the header and the footer are missing from the resulting PDF.
My code is as follows:
Document doc = new Document("XXX.mht");
doc.save("XXX1.pdf", SaveFormat.PDF);
The library used is aspose.jar.
Does anyone know why this happens? Thank you very much for your help.
I found that the header in the mht file is written like this:
<div style=3D'mso-element:header' id=3Dh1>
<div style=3D'mso-element:para-border-div;border:none;border-bottom:solid w=
indowtext 1.0pt;
mso-border-bottom-alt:solid windowtext .75pt;padding:0cm 0cm 1.0pt 0cm'>
<p class=3DMsoHeader><span lang=3DEN-US>This is the header</span></p>
</div>
</div>
But when I convert it to a PDF file, this content is gone.
When I open the Word file in Office, I can see the header content, but after I convert it to PDF using Aspose, the header is no longer visible.
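A note on reading the MHT excerpt quoted earlier: it is quoted-printable encoded, as text parts in MHTML files commonly are. `=3D` stands for `=`, and a `=` at the end of a line is a soft line break, which is why `windowtext` appears split across two lines. Below is a minimal sketch, using only the Java standard library, for decoding such text and checking whether the Word header `div` is present. The class and method names are hypothetical, UTF-8 is assumed as the charset of the part, and this is not a full MIME parser.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Sketch: decode quoted-printable text copied from an MHT file (names are hypothetical). */
final class MhtHeaderCheck {

    /** Decodes "=XX" escapes into bytes and removes soft line breaks ("=" at end of line). */
    static String decodeQuotedPrintable(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (c != '=') {
                out.write(c);          // literal character (quoted-printable input is ASCII)
                i++;
            } else if (encoded.startsWith("\r\n", i + 1)) {
                i += 3;                // soft line break with CRLF: drop it
            } else if (encoded.startsWith("\n", i + 1)) {
                i += 2;                // soft line break with LF: drop it
            } else if (i + 2 < encoded.length()
                    && isHex(encoded.charAt(i + 1))
                    && isHex(encoded.charAt(i + 2))) {
                out.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 3;                // "=XX" escape: one decoded byte
            } else {
                out.write('=');        // stray "=": keep it unchanged
                i++;
            }
        }
        // Assumption: the part is UTF-8; the real charset is declared in the MHT part headers.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    private static boolean isHex(char ch) {
        return Character.digit(ch, 16) >= 0;
    }

    /** True if the decoded HTML contains the Word-specific header marker seen in the excerpt. */
    static boolean hasWordHeader(String decodedHtml) {
        return decodedHtml.contains("mso-element:header");
    }

    public static void main(String[] args) {
        String excerpt =
                "<div style=3D'mso-element:header' id=3Dh1>\n"
              + "<div style=3D'mso-element:para-border-div;border:none;border-bottom:solid w=\n"
              + "indowtext 1.0pt;\n";
        String decoded = decodeQuotedPrintable(excerpt);
        System.out.println(decoded);
        System.out.println("header div present: " + hasWordHeader(decoded));
    }
}
```

This only confirms that the header text was written into the MHT file as an ordinary `div`. It does not show how the converter treats that `div` when the file is loaded again.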
MS Word documents and HTML/MHTML documents differ considerably in structure, so it is impossible to provide 100% fidelity when converting from one to the other. Headers and footers are hard to output meaningfully to HTML/MHTML because HTML is not paginated. If you use Aspose.Words to convert DOC to MHTML, by default it exports only the primary headers/footers of each section when saving to HTML/MHTML. So the only way to preserve the original DOC formatting is to convert the DOC directly to PDF.
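A minimal sketch of the direct conversion, assuming the original file is still available as `XXX.doc` (a placeholder name, like the ones in the question) and that Aspose.Words for Java is on the classpath. It uses the same `Document` constructor and `save` call as the code in the question, but loads the DOC instead of the MHT. The class name and the `pdfPathFor` helper are hypothetical and only there for illustration.

```java
import com.aspose.words.Document;
import com.aspose.words.SaveFormat;

/** Sketch: convert DOC straight to PDF, skipping the MHTML step (class name is hypothetical). */
final class DirectDocToPdf {

    /** Hypothetical helper: replaces the file extension (if any) with ".pdf". */
    static String pdfPathFor(String docPath) {
        int sep = Math.max(docPath.lastIndexOf('/'), docPath.lastIndexOf('\\'));
        int dot = docPath.lastIndexOf('.');
        // Only treat the dot as an extension separator if it is inside the file name itself.
        String base = dot > sep + 1 ? docPath.substring(0, dot) : docPath;
        return base + ".pdf";
    }

    public static void main(String[] args) throws Exception {
        // "XXX.doc" is a placeholder; pass the real path as the first argument.
        String docPath = args.length > 0 ? args[0] : "XXX.doc";

        // Load the original Word document, not the MHT file generated from it.
        Document doc = new Document(docPath);

        // Save directly as PDF.
        doc.save(pdfPathFor(docPath), SaveFormat.PDF);
    }
}
```

If the MHTML file is also needed, it can be saved from the same DOC as a separate output rather than used as the input for the PDF.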