docx4j html-to-docx

Problem converting HTML with MathML to docx with docx4j


When I try to convert any HTML containing MathML to docx using the docx4j library, for example:

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8"></meta>
  <title>Personal list export template</title>
  <style>
    body {
      font-family: Arial, sans-serif;
      color: #666;
      padding: 20px;
    }
    .document {
      display: block;
      padding-bottom: 10px;
      margin-bottom: 20px;
    }
    .header, .note {
      align-items: start;
      font-size: 1rem;
    }
    .header strong, .note strong {
      font-size: 1.2em;
      color: #333;
      font-weight: bold;
      margin-bottom: 5px;
    }
    .list-item {
      border: none;
      padding: 0;
    }
    .bullet-point {
      vertical-align: baseline;
      padding-right: 10px;
      width: 15px;
    }
    a {
      color: #0066cc;
      text-decoration: none;
      transition: color 0.3s;
    }
    a:hover {
      color: #0057a3;
    }
    table {
      border-collapse: collapse;
      width: 100%;
    }
    th, td {
      border: 1px solid black;
      padding: 8px;
      text-align: left;
    }
    img {
      width: 400px;
      height: auto;
    }
    p:empty {
      margin: 0;
    }
    li {
      position: relative;
      margin-bottom: 5px;
    }
    ul {
      list-style-type:none;
    }
  </style>
</head>
<body>
<div class="document-details">
  <div>
    <div class="document">
      <div class="header">
        <p class="header-from"><strong>From: </strong><a href=""></a></p>
        <div><div><math id="mml-m6"><mrow><mrow><msubsup><mo>∫</mo><mn>0</mn><msub><mi>t</mi><mi>test</mi></msub></msubsup><msubsup><mi>i</mi><mi>test</mi><mn>2</mn></msubsup></mrow><mi>dt</mi><mo>≥</mo><msup><mi>I</mi><mn>2</mn></msup><mo>·</mo><msub><mi>t</mi><mi>CW</mi></msub></mrow></math></div></div>
      </div>
      
    </div>
  </div>
</div>
</body>
</html>

I get the exception java.io.IOException: mml2omml.xslZ not found via classloader.

It is thrown by this code in the library's XHTMLImporterImpl class:

  public static Templates getMathXSLT() throws IOException, TransformerConfigurationException {
    if (mathXSLT == null) {
      // Stylesheet path comes from the "docx4j-ImportXHTML.mml2omml" property, falling back to "mml2omml.xslZ"
      Source xsltSource = new StreamSource(ResourceUtils.getResourceViaProperty("docx4j-ImportXHTML.mml2omml", "mml2omml.xslZ"));
      mathXSLT = XmlUtils.getTransformerTemplate(xsltSource);
    }
    return mathXSLT;
  }

I'm using 'org.docx4j:docx4j-ImportXHTML-core:11.5.0' for the conversion, which is the latest version.

So I downloaded the file mml2omml.xsl, added it to my project's resources, and renamed it to mml2omml.xslZ, because that is the name the library looks for.
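A quick way to confirm that the renamed file really is visible on the classpath is a small check like the one below. This is only a sketch: the class name ClasspathCheck and the method isVisible are hypothetical, and it assumes the library resolves the resource through the thread context classloader, which may not match exactly what docx4j does internally.

```java
import java.net.URL;

// Hypothetical helper, not part of docx4j.
final class ClasspathCheck {

    /** Returns true if the named resource can be located via the classloader. */
    static boolean isVisible(String resourceName) {
        // Assumption: the context classloader is the one used for the lookup.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathCheck.class.getClassLoader();
        }
        URL url = (loader != null)
                ? loader.getResource(resourceName)
                : ClassLoader.getSystemResource(resourceName);
        return url != null;
    }

    public static void main(String[] args) {
        // Run from the application classpath (e.g. with src/main/resources included).
        System.out.println("mml2omml.xslZ visible: " + isVisible("mml2omml.xslZ"));
    }
}
```

If this reports false, the file is probably in the wrong resources folder or was not copied into the build output.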

After this the exception is gone, but the HTML with MathML is still not converted: I just get an empty docx file.

I expect to be able to convert HTML with MathML to docx using the docx4j library.

Could anybody help?


Solution

  • In my case the problem was in the HTML. For the conversion, docx4j needed every MathML tag to carry the MathML namespace (here via the mml: prefix). The working MathML looks like this:

    <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="mml-m6">
      <mml:mrow>
        <mml:mrow>
          <mml:msubsup>
            <mml:mo>∫</mml:mo>
            <mml:mn>0</mml:mn>
            <mml:msub>
              <mml:mi>t</mml:mi>
              <mml:mi>test</mml:mi>
            </mml:msub>
          </mml:msubsup>
          <mml:msubsup>
            <mml:mi>i</mml:mi>
            <mml:mi>test</mml:mi>
            <mml:mn>2</mml:mn>
          </mml:msubsup>
        </mml:mrow>
        <mml:mi>dt</mml:mi>
        <mml:mo>≥</mml:mo>
        <mml:msup>
          <mml:mi>I</mml:mi>
          <mml:mn>2</mml:mn>
        </mml:msup>
        <mml:mo>·</mml:mo>
        <mml:msub>
          <mml:mi>t</mml:mi>
          <mml:mi>CW</mml:mi>
        </mml:msub>
      </mml:mrow>
    </mml:math>
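If the HTML is generated and the MathML cannot be prefixed by hand, the rewrite can probably be automated before the markup is handed to docx4j. The following is a minimal sketch using only the JDK's DOM and transformer APIs; the class MathMlNamespacer and the method prefixMathMl are hypothetical names, not part of docx4j. It assumes the input is well-formed XML (no named HTML entities such as &nbsp;), and the serializer does not re-emit the DOCTYPE.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j.
final class MathMlNamespacer {
    static final String MATHML_NS = "http://www.w3.org/1998/Math/MathML";
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    /** Returns the XHTML with every <math> subtree rewritten to mml:-prefixed elements. */
    static String prefixMathMl(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // Collect first, then rename, so the tree is not modified while walking it.
            List<Element> roots = new ArrayList<>();
            collectMathRoots(doc, roots);
            for (Element root : roots) {
                Element renamed = renameSubtree(doc, root);
                // Declare the prefix explicitly on each math root.
                renamed.setAttributeNS(XMLNS_NS, "xmlns:mml", MATHML_NS);
            }

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            // Force XML output; a root <html> could otherwise select HTML serialization.
            serializer.setOutputProperty(OutputKeys.METHOD, "xml");
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not rewrite MathML in the given XHTML", e);
        }
    }

    // Finds the outermost elements whose local name is "math".
    private static void collectMathRoots(Node node, List<Element> out) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element && "math".equals(child.getLocalName())) {
                out.add((Element) child);
            } else {
                collectMathRoots(child, out);
            }
        }
    }

    // Renames children before the element itself, since renameNode may return a new node.
    private static Element renameSubtree(Document doc, Element element) {
        List<Element> children = new ArrayList<>();
        for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                children.add((Element) c);
            }
        }
        for (Element child : children) {
            renameSubtree(doc, child);
        }
        return (Element) doc.renameNode(element, MATHML_NS, "mml:" + element.getLocalName());
    }

    public static void main(String[] args) {
        String input = "<div><math id=\"mml-m6\"><msub><mi>t</mi><mi>CW</mi></msub></math></div>";
        System.out.println(prefixMathMl(input));
    }
}
```

The returned string would then be passed to the docx4j XHTML importer in place of the original markup. Elements outside the math subtree are left untouched.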