There are some user-defined entities in the XML data. To unescape those entities, we are using the code below:
<xsl:stylesheet version='3.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:output method="xml" omit-xml-declaration="no" use-character-maps="mdash" />
<xsl:character-map name="mdash">
<xsl:output-character character="—" string="&amp;mdash;"/>
<xsl:output-character character="&amp;" string="&amp;amp;" />
<xsl:output-character character="&quot;" string="&amp;quot;" />
<xsl:output-character character="'" string="&amp;apos;" />
<xsl:output-character character="§" string="&amp;sect;"/>
<xsl:output-character character="$" string="&amp;dollar;" />
<xsl:output-character character="/" string="&amp;sol;" />
<xsl:output-character character="-" string="&amp;hyphen;" />
</xsl:character-map>
<!--=================================================================-->
<xsl:template match="@* | node()">
<!--=================================================================-->
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
But there is a special case where § appears twice in the data, for example:
Input: The number §§ 1234
The above example should be converted to a special user-defined entity, i.e.
Output: The number &multisect; 1234
That is, §§ should be converted to &multisect;
If you want to use a character map, you first need to process the text nodes where you expect the two § characters and replace them with a single character that you don't expect to occur anywhere else; the character map can then convert that character to the string &multisect;
For example, the stylesheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
exclude-result-prefixes="#all"
expand-text="yes"
version="3.0">
<xsl:param name="multisect-sub" static="yes" as="xs:string" select="'«'"/>
<xsl:character-map name="sub">
<xsl:output-character _character="{$multisect-sub}" string="&amp;multisect;"/>
</xsl:character-map>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:output method="xml" indent="yes" use-character-maps="sub"/>
<xsl:template match="text()">
<xsl:apply-templates mode="analyze" select="analyze-string(., '§§')"/>
</xsl:template>
<xsl:template mode="analyze" match="fn:match">
<xsl:text>{$multisect-sub}</xsl:text>
</xsl:template>
</xsl:stylesheet>
transforms the input
<!DOCTYPE text [
<!ENTITY sect "§">
]>
<text>§§ 1234</text>
into the output
<?xml version="1.0" encoding="UTF-8"?>
<text>&multisect; 1234</text>
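As a possible variation, the text() template could call the XPath replace() function instead of analyze-string(). This is only a sketch, assuming the same static parameter multisect-sub and the same character map as in the stylesheet above; it would replace the two templates for text() and fn:match:

```xml
<xsl:template match="text()">
  <!-- swap each pair of § for the placeholder character;
       the character map then serializes it as the multisect entity reference -->
  <xsl:value-of select="replace(., '§§', $multisect-sub)"/>
</xsl:template>
```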
Note that I used '«' primarily as an example; you might want to use a private-use character or some other character that you are sure doesn't occur in your input or output data.
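A minimal sketch of that choice, assuming U+E000 from the Unicode Private Use Area never appears in your data (that assumption is mine, so check it against your own input):

```xml
<!-- Assumption: U+E000 (Private Use Area) does not occur in the input or output data -->
<xsl:param name="multisect-sub" static="yes" as="xs:string" select="'&#xE000;'"/>
```

The rest of the stylesheet can stay unchanged, since both the character map and the fn:match template refer to the parameter rather than to a literal character.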
If you want the result to be well-formed, you would also need to add a doctype to the output, e.g. with xsl:output doctype-system="some.dtd", where you ensure that some.dtd declares e.g. <!ENTITY multisect "§§">
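For instance, the xsl:output declaration of the stylesheet might be extended as sketched below; some.dtd is only a placeholder name, and the DTD has to exist and declare the entity wherever the result is later parsed:

```xml
<!-- doctype-system makes the serializer emit a DOCTYPE declaration
     with the system identifier some.dtd for the root element -->
<xsl:output method="xml" indent="yes"
            use-character-maps="sub"
            doctype-system="some.dtd"/>
```

Any other entity references written by a character map, such as the ones in the question, would need declarations in the same DTD.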