regex xslt xslt-3.0

XSLT 3.0: regex replace in all the XML text content


I have a simple XSLT 3.0 stylesheet that should replace every character that is not ASCII (other than the € sign) with a dot (.).

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output method="xml" indent="no"/>

    <!-- Template to match all nodes except text nodes -->
    <xsl:template match="node()[not(self::text())]">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Template to match text nodes and apply regex -->
    <xsl:template match="text()">
        <xsl:value-of select="replace(., '[^\\x00-\\xFF€]', '.')"/>
    </xsl:template>
</xsl:stylesheet>

But for some reason it matches almost everything. My test case is: <fieldName>generic currency ¤ euro € other characters देवनागरी German ä, ö, ü, and ß russian a, y, o, ы, э, я, ю, ё, и</fieldName>

It becomes: <fieldName>........................€...........................G......................................................</fieldName>

The expected output is: <fieldName>generic currency ¤ euro € other characters ........ German ä, ö, ü, and ß russian a, y, o, ., ., ., ., ., .</fieldName>


Solution

  • Several problems here:

    <xsl:value-of select="replace(., '[^\\x00-\\xFF€]', '.')"/>
    

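    Firstly, there is no need to double your backslashes in XPath regular expressions. Because you have doubled them, \\ is just an escaped backslash, and the other characters are read literally: x, F and € represent themselves, while 0-\\ becomes a range from 0 to backslash. That range covers the digits and the uppercase letters, which is why only € and G survive in your output.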

    Secondly, the construct \xHH is not recognized in the XPath regex dialect. Use XML character references instead and write &#9;-&#255; (TAB, codepoint 9, is the first codepoint allowed in XML 1.0; 255 is hex FF).

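    Thirdly, you mentioned ASCII, but ASCII stops at 127, not at 255. Your expected output keeps ¤, ä, ö, ü and ß, so the range up to 255 (Latin-1) appears to be what you actually want.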
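
    Putting the three points together, a minimal sketch of a corrected stylesheet might look like the following. It assumes you want to keep everything from TAB up to codepoint 255 plus the € sign, as your expected output suggests. The xsl:mode declaration is my substitution for your hand-written copy template, not something your original used.

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
        <xsl:output method="xml" indent="no"/>

        <!-- Assumption: let the XSLT 3.0 built-in shallow copy handle every node without its own template -->
        <xsl:mode on-no-match="shallow-copy"/>

        <!-- Replace each character outside TAB..U+00FF, other than the euro sign, with a dot -->
        <xsl:template match="text()">
            <xsl:value-of select="replace(., '[^&#9;-&#255;€]', '.')"/>
        </xsl:template>
    </xsl:stylesheet>

    If you really do want strict ASCII plus €, change &#255; to &#127;. In that case ¤, ä, ö, ü and ß would also be replaced by dots.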