javascript node.js xslt saxon-js

How to transform HTML string using XSLT in Node.js


I have an HTML string in a Node.js app that I need to transform with XSLT on the server side. The main transformation I need is removing specific HTML tags, and I can't use regular expressions because of security and performance concerns. I will then use the result of the transformation to make POST requests to an API.

A simple example may look something like:

"<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Aliquam porttitor gravida velit, et facilisis est viverra a. Suspendisse potenti.</p>\n<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Suspendisse potenti.</p>"

And I need to transform it to the following (basically just remove <p> tags in this case):

"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Aliquam porttitor gravida velit, et facilisis est viverra a. Suspendisse potenti.\nLorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Suspendisse potenti."

Here are the main questions I have:


Solution

  • Well, I took the bait; here is how you can do that with SaxonJS:

    const SaxonJS = require("saxon-js");

    const input = "<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Aliquam porttitor gravida velit, et facilisis est viverra a. Suspendisse potenti.</p>\n<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Suspendisse potenti.</p>";

    // No templates are needed: the built-in template rules pass on text nodes
    // only, and the text output method serializes them without any markup.
    const xslt = `<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
      <xsl:output method="text"/>
    </xsl:stylesheet>`;

    // Call the XPath 3.1 function transform(). parse-xml-fragment() accepts
    // input with several top-level elements, as long as it is well-formed XML.
    const result = SaxonJS.XPath.evaluate(`transform(
      map {
        'source-node' : parse-xml-fragment($xml),
        'stylesheet-text' : $xslt,
        'delivery-format' : 'serialized'
      }
    )?output`,
    [],
    {
      params : {
        xml : input,
        xslt : xslt
      }
    });

    console.log(result);
    

    With the built-in template rules and the text output method, all element markup is dropped and only the text remains. If you don't want that, use the default output method, keep the default built-in rules (on-no-match="text-only-copy", which passes text through; note that shallow-skip would drop the text nodes as well), and add templates for the elements you want to preserve, e.g. <xsl:template match="h1"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>. Or approach it the other way around: use <xsl:mode on-no-match="shallow-copy"/> and block what you don't want with matching templates, e.g. <xsl:template match="p"><xsl:apply-templates/></xsl:template>.

    Finally, once your stylesheet is finished and works, you should "compile" it to SEF/JSON, e.g. with xslt3 -nogo -xsl:sheet.xsl -export:sheet.sef.json, and then use the direct transformation API from JavaScript, e.g. SaxonJS.transform.