How to fix a custom inline font-size being ignored in a docx file generated with docx4j when embedding HTML content as an altChunk?

I'm using the docx4j Java library to generate a docx file with an HTML string embedded as an altChunk, but the inline font-size formatting does not work as expected. When font-size is set to 24pt, the docx file shows it as 14.

When I change the font-size to either 23pt or 25pt, it works as expected. The issue also does not occur for other tags such as p or the other heading levels. In the example below, both h1 and h2 carry a custom font-size as an inline style, but it takes effect only for h2.

Example HTML String:

"<html><body><h1 style="font-weight: normal; line-height: 1.1; margin-top: 0.2em; margin-bottom: 0.2em; background-color: transparent; color: #404040; font-family: Calibri; font-size: 24pt;">H1</h1><h2 style="font-weight: normal; line-height: 1.1; margin-top: 0.2em; margin-bottom: 0.2em; background-color: transparent; color: #404040; font-family: Calibri; font-size: 26pt;"> H2 </h2></body></html>"

As seen in MS Word: [image: Heading 1 styling as seen in MS Word]

Code:

String html = "<html><body><h1 style=\"font-weight: normal; line-height: 1.1; margin-top: 0.2em; margin-bottom: 0.2em; background-color: transparent; color: #404040; font-family: Calibri; font-size: 24pt;\">H1</h1><h2 style=\"font-weight: normal; line-height: 1.1; margin-top: 0.2em; margin-bottom: 0.2em; background-color: transparent; color: #404040; font-family: Calibri; font-size: 26pt;\"> H2 </h2></body></html>";

WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();

byte[] bytes = html.getBytes(StandardCharsets.UTF_8);
ByteArrayInputStream bais = new ByteArrayInputStream(bytes);

CTAltChunk ac = new ObjectFactory().createCTAltChunk();
ac.setId("htmlChunk");

wordMLPackage.getMainDocumentPart().addAltChunk(AltChunkType.Html, bais);

ByteArrayOutputStream baos = new ByteArrayOutputStream();

wordMLPackage.save(baos);
            
byte[] docxFile = baos.toByteArray();

Solution

I'm not an expert on the docx4j library and can't say much about its inner logic, whether some default styling is taking precedence over yours or whether it's simply a bug.

The docx4j documentation mentions a separate library for handling XHTML import (docx4j-ImportXHTML).

Adding that library as a dependency and adapting your code as follows, based on the docx4j-ImportXHTML sample, seems to produce the expected result:

    var html = "<html><body><h1 style=\"font-weight: normal; line-height: 1.1; margin-top: 0.2em; margin-bottom: 0.2em; background-color: transparent; color: #404040; font-family: Calibri; font-size: 24pt;\">H1</h1><h2 style=\"font-weight: normal; line-height: 1.1; margin-top: 0.2em; margin-bottom: 0.2em; background-color: transparent; color: #404040; font-family: Calibri; font-size: 26pt;\"> H2 </h2></body></html>";

    var wordMLPackage = WordprocessingMLPackage.createPackage();
    var mdp = wordMLPackage.getMainDocumentPart();

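    // Embed the markup as an XHTML altChunk (the string must be well-formed XML).
    mdp.addAltChunk(AltChunkType.Xhtml, new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));

    // Convert the altChunk into regular document content (needs docx4j-ImportXHTML on the classpath).
    mdp.convertAltChunks();

    wordMLPackage.save(new FileOutputStream("myFile.docx"));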
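If you want to check the result without opening Word, here is a minimal sketch that reads the font sizes straight out of the saved file. It is not part of docx4j: the class `DocxFontSizeCheck` and the method `fontSizesInPoints` are hypothetical names, it uses only the JDK (Java 9+ for `readAllBytes`), and it assumes the sizes are written as direct run formatting with the usual `w:` prefix. WordprocessingML stores `w:sz` in half-points, so 24pt should appear as `w:val="48"`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not a docx4j class.
final class DocxFontSizeCheck {

    // Matches <w:sz w:val="48"/> but not <w:szCs .../>; assumes the "w:" prefix.
    private static final Pattern SZ = Pattern.compile("<w:sz\\s+w:val=\"(\\d+)\"");

    /** Returns the run font sizes (in points) found in word/document.xml, in document order. */
    static List<Double> fontSizesInPoints(InputStream docx) throws IOException {
        List<Double> sizes = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue; // only the main document part is inspected
                }
                // readAllBytes() reads up to the end of the current zip entry
                String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                Matcher m = SZ.matcher(xml);
                while (m.find()) {
                    // w:sz is expressed in half-points
                    sizes.add(Integer.parseInt(m.group(1)) / 2.0);
                }
            }
        }
        return sizes;
    }
}
```

You would call it as `DocxFontSizeCheck.fontSizesInPoints(new FileInputStream("myFile.docx"))`. An empty list can mean the altChunk was never converted (the HTML then sits in its own part and Word imports it when the file is opened) or that the sizes come from styles rather than from direct formatting.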