I'm building an application that uses the Node.js library docx to "patch" an MS Word document and send it to the client. In a prior revision, I had the docx library output a buffer, and then I used LibreOffice via the command line to convert the document to a PDF. This seemed to work flawlessly every time.
My client decided they would rather have the application output the MS Word document (.docx) so they could make minor modifications as needed. I modified the code to download the Word document instead, but Word reports the document as corrupt every time I try to open it.
To figure out why, I opened the .docx with 7-Zip and began examining the document.xml file inside. Everything looked fine, so I started commenting out parts of the XML to isolate the issue.
The document contains tables, and what I'm noticing is that Word doesn't like it when I have a paragraph (w:p) inside a table cell (w:tc). The document opens fine when the code below is commented out, but when I uncomment it, I get the standard "Word experienced an error trying to open the file. Try these suggestions..." message.
<w:tc>
  <w:tcPr>
    <w:tcW w:w="3116" w:type="dxa"/>
  </w:tcPr>
  <!-- commented code here
  <w:p>
    <w:r>
      <w:t>AAA</w:t>
    </w:r>
  </w:p>
  -->
</w:tc>
Can anyone explain what might be happening here? According to this documentation, it should work. Could I be looking in the wrong place?
EDIT: I should note that the document opens fine in Google Docs. It does not open in MS Office 365 (either the desktop app or MS Teams).
I figured out what my issue was after exploring some OOXML validation tools.
What I learned was that my table (<w:tbl>) was a child of a paragraph (<w:p>), which the schema does not allow: a table is a block-level element that can sit alongside paragraphs in the body or inside a table cell, but not inside a paragraph. Removing the parent <w:p>...</w:p> tags resolved my issue.
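If you want a quick sanity check for this particular mistake before reaching for a full validator, the rule can be approximated in a few lines. The sketch below is plain Node.js with no dependencies. It scans document.xml with a simple tag stack and reports any w:tbl whose nearest paragraph or text-box ancestor is a w:p. The function name findTablesInsideParagraphs is hypothetical. The scanner is regex-based rather than a real XML parser or schema validator, so it only catches this one structural error. It treats w:txbxContent as a fresh block context because text-box content may legitimately contain tables even though it lives inside a paragraph's run.

```javascript
// Hypothetical helper (not part of the docx library): reports every <w:tbl>
// whose nearest enclosing paragraph/text-box context is a <w:p>.
// This is a lightweight regex tag scanner, not a full XML parser or schema validator.
function findTablesInsideParagraphs(xml) {
  // Strip comments so commented-out markup (like in the question) is ignored.
  const source = xml.replace(/<!--[\s\S]*?-->/g, "");
  // Captures: 1 = "/" for end tags, 2 = qualified name, 3 = "/" for self-closing tags.
  // Declarations such as <?xml ...?> never match: the name must start with a letter or "_".
  const tagPattern = /<(\/?)([A-Za-z_][\w.-]*:?[\w.-]*)[^>]*?(\/?)>/g;
  // w:txbxContent (text-box content) may legitimately hold tables even though
  // it sits inside a run of a paragraph, so it starts a new block context.
  const contexts = new Set(["w:p", "w:txbxContent"]);
  const stack = [];
  const problems = [];
  let match;
  while ((match = tagPattern.exec(source)) !== null) {
    const [, closing, name, selfClosing] = match;
    if (closing) {
      // Pop back to (and including) the matching open element.
      const idx = stack.lastIndexOf(name);
      if (idx !== -1) stack.length = idx;
      continue;
    }
    if (name === "w:tbl") {
      // Walk up to the nearest paragraph or text-box ancestor.
      for (let i = stack.length - 1; i >= 0; i--) {
        if (contexts.has(stack[i])) {
          if (stack[i] === "w:p") {
            // offset is relative to the comment-stripped text.
            problems.push({ offset: match.index, path: [...stack, name].join(" > ") });
          }
          break;
        }
      }
    }
    // Self-closing elements never contain children, so they are not pushed.
    if (!selfClosing) stack.push(name);
  }
  return problems;
}
```

To try it, extract word/document.xml from the .docx (7-Zip or any zip tool works) and pass its contents as a string, for example via fs.readFileSync(path, "utf8"). An empty array means no table was found nested in a paragraph. Anything else gives the element path to the offending table.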