I have code that replaces placeholders like ${NAME} with plain text. I use docx4j and docx4j-search-and-replace-util to replace the placeholders.
It works fine, but now for one of the fields, "APP_ADDITIONAL_INFO", I need to replace the placeholder with simple formatted HTML from the Quill editor, like:
<p><strong>Header</strong></p><p><strong>Text string</strong></p>
The resulting .docx document contains this HTML as literal text instead of formatted text for this field.
I looked into this and found that I need docx4j-ImportXHTML and JTidy for this purpose. With JTidy I can convert my HTML to XHTML, and ImportXHTML then converts the XHTML into WordML.
But now the resulting .docx shows the full WordML markup instead of formatted text, starting from
<w:document xmlns:dsp="http://schemas.microsoft.com/office/drawing/2008/diagram"
and so on, like
<w:r><w:rPr><w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman"/><w:b/><w:i w:val="false"/><w:color w:val="000000"/><w:sz w:val="22"/></w:rPr><w:t>Formatted text string</w:t></w:r>
So where am I going wrong?
My code is:
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
BufferedReader br = new BufferedReader(new StringReader(doc.getDescription()));
StringWriter sw = new StringWriter();
Tidy t = new Tidy();
t.setDropEmptyParas(true);
t.setShowWarnings(false); // hide warnings
t.setQuiet(true); // suppress non-essential output
t.setUpperCaseAttrs(false);
t.setXmlOut(true);
t.setUpperCaseTags(false);
t.setInputEncoding("UTF-8");
t.setOutputEncoding("UTF-8");
t.parse(br,sw);
StringBuffer sb = sw.getBuffer();
String strClean = sb.toString();
br.close();
sw.close();
wordMLPackage.getMainDocumentPart().getContent().addAll(XHTMLImporter.convert( strClean, null) );
// the variable that should contain WordML markup
String description = XmlUtils.marshaltoString(wordMLPackage.getMainDocumentPart().getJaxbElement(), true, true);
// map with placeholders and replacing data
Map<String, String> replaceMap = new HashMap<String, String>() {{
put("${APP_EMPLOYEE}", doc.getEmployeeName());
put("${APP_JOB_TITLE}", doc.getJobtitle());
put("${APP_ADDITIONAL_INFO}", description);
}};
byte[] cos = gt.generateDocXDocument(filePath, replaceMap, masterId);
return ResponseEntity.ok()
.header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"test.docx\"").body(cos);
In the generateDocXDocument method:
byte[] generateDocXDocument(String filePath, Map<String, String> replaceMap) throws Exception {
byte[] decryptedBytesOfFile = storageService.loadFile(filePath);
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new ByteArrayInputStream(decryptedBytesOfFile));
Docx4JSRUtil.searchAndReplace(wordMLPackage, replaceMap);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Save saver = new Save(wordMLPackage);
saver.save(outputStream);
return outputStream.toByteArray();
}
I even tried skipping docx4j-ImportXHTML and JTidy and simply replacing the placeholder with WordML markup, like:
put("${APP_ADDITIONAL_INFO}", "<w:r><w:rPr><w:rFonts w:ascii=\"Times New Roman\" w:hAnsi=\"Times New Roman\"/><w:b/><w:i w:val=\"false\"/><w:color w:val=\"000000\"/><w:sz w:val=\"22\"/></w:rPr><w:t>Formatted text</w:t></w:r>");
but the result is the same: the resulting .docx file contains this markup as text.
As you've found, that's not going to work. You can't replace text with markup that way: the replacement value is inserted as literal text, so the markup is never parsed as WordML.
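To illustrate why the markup shows up verbatim, here is a minimal sketch that uses only the JDK's XML APIs, not docx4j. It assumes that a text-level search and replace ends up storing the replacement string as the text content of a w:t element; the class and method names (TextEscapingDemo, serializeAsRunText) are made up for this illustration. When such a string is serialized, its angle brackets are escaped as entities, so Word displays them as characters instead of interpreting them as runs.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: shows what happens to markup stored as element text.
public class TextEscapingDemo {

    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Puts the value into a w:t element as plain text and serializes that element. */
    static String serializeAsRunText(String value) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document dom = factory.newDocumentBuilder().newDocument();

        // The value is stored as character data, not parsed as child elements.
        Element text = dom.createElementNS(W_NS, "w:t");
        text.setTextContent(value);
        dom.appendChild(text);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The "<" characters of the markup are printed as "&lt;" entities.
        System.out.println(serializeAsRunText("<w:r><w:t>Formatted text</w:t></w:r>"));
    }
}
```

This is why the formatted content has to be added to the document as real objects (for example the list returned by XHTMLImporter.convert) rather than passed through the replacement map as a string.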
A better approach is to use content control data binding; see "Replace a content control with an HTML value while generate document using docx4J".
FromVariableReplacement (https://github.com/plutext/docx4j/blob/VERSION_11_5_0/docx4j-core/src/main/java/org/docx4j/model/datastorage/migration/FromVariableReplacement.java) can be used to convert placeholders/variables to content controls.