php xslt xml-entities

Using text() to match custom entity names in XSLT


I am using <xsl:template match="m:*/text()"> to match text in my XML document. This works fine for plain text and for references the parser already knows, i.e. predefined entities like &amp; and numeric character references like &#x003C0;.

However, matching custom entity names does not work. For example, my XML document contains the entity &pi;, which I expected text() to match. For some reason the entity is not treated as text, so nothing is matched.

Please note that I declared the entity in the DOCTYPE declaration of both the XML document and the XSLT document:

<!DOCTYPE xsl:stylesheet [<!ENTITY pi "&#x003C0;">]>

Is text() the right approach to matching custom entity names, or do I need to use another function? (Maybe I also did something wrong declaring the entity name?)

Thanks

Edit

XML

<!DOCTYPE mathml [<!ENTITY pi "&#x003C0;">]>
<math xmlns="http://www.w3.org/1998/Math/MathML" display="inline">    
    <mi>&pi;</mi>
    <mi>test</mi>
    <mi>&#x003C0;</mi>
</math>

XSLT

<?xml version='1.0' encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [<!ENTITY pi "&#x003C0;">]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:m="http://www.w3.org/1998/Math/MathML"
                version='1.0'>

    <xsl:template match="m:*/text()">
        <xsl:call-template name="replaceEntities">
            <xsl:with-param name="content" select="normalize-space()"/>
        </xsl:call-template>
    </xsl:template>

    <xsl:template name="replaceEntities">
        <xsl:param name="content"/>
        <xsl:value-of select="$content"/>
    </xsl:template>
</xsl:stylesheet>

The variable $content should be printed three times, but only test and &#x003C0; are printed.

Processing using PHP

$xslDoc = new DOMDocument();
$xslDoc->load("doc.xsl");
$xslProcessor = new \XSLTProcessor();
$xslProcessor->importStylesheet($xslDoc);
$mathMLDoc = new DOMDocument();
$mathMLDoc->loadXML('<!DOCTYPE mathml [<!ENTITY pi "&#x003C0;">]><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><mi>&pi;</mi><mi>test</mi><mi>&#x003C0;</mi></math>');
echo $xslProcessor->transformToXML($mathMLDoc);

Solution

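  • As far as I can see, the problem is that the entity declared in the DTD is not expanded when the XML document is loaded: by default, DOMDocument keeps &pi; as an entity reference instead of turning it into text, so text() has nothing to match. Set the following before loading the document to substitute entities with their textual value: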

    $mathMLDoc->substituteEntities = true;
    

    as in

    $xslDoc = new DOMDocument();
    $xslDoc->load("doc.xsl");
    $xslProcessor = new \XSLTProcessor();
    $xslProcessor->importStylesheet($xslDoc);
    $mathMLDoc = new DOMDocument();
    // Must be set before loadXML() so that &pi; is expanded while parsing.
    $mathMLDoc->substituteEntities = true;
    $mathMLDoc->loadXML('<!DOCTYPE math [<!ENTITY pi "&#x003C0;">]><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><mi>&pi;</mi><mi>test</mi><mi>&#x003C0;</mi></math>');
    echo $xslProcessor->transformToXML($mathMLDoc);
    

    which will produce

    <?xml version="1.0"?>
    πtestπ
    

    Some background: http://php.net/manual/en/xsltprocessor.transformtoxml.php#99932 and http://hublog.hubmed.org/archives/001854.html.
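To see the effect without running a stylesheet, here is a minimal sketch that counts the nodes the XPath m:mi/text() selects, once with a default parse and once with entities expanded. It passes the LIBXML_NOENT flag to loadXML(), which as far as I know has the same effect as setting substituteEntities. The helper name countMiTextNodes and the constant MATHML_NS are made up for this illustration, and the XML is the single-line sample from the question with the display attribute left out.

```php
<?php
// Sketch only: compares what text() can see with and without entity expansion.
// countMiTextNodes() and MATHML_NS are illustrative names, not part of any API.

const MATHML_NS = 'http://www.w3.org/1998/Math/MathML';

/** Count the text nodes selected by the XPath //m:mi/text(). */
function countMiTextNodes(DOMDocument $doc): int
{
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('m', MATHML_NS);
    return $xpath->query('//m:mi/text()')->length;
}

$xml = '<!DOCTYPE math [<!ENTITY pi "&#x003C0;">]>'
     . '<math xmlns="' . MATHML_NS . '">'
     . '<mi>&pi;</mi><mi>test</mi><mi>&#x003C0;</mi></math>';

// Default parse: &pi; is expected to remain an entity reference node.
$plain = new DOMDocument();
$plain->loadXML($xml);

// LIBXML_NOENT asks libxml2 to replace entity references with their text.
$expanded = new DOMDocument();
$expanded->loadXML($xml, LIBXML_NOENT);

printf("default:      %d text nodes\n", countMiTextNodes($plain));
printf("LIBXML_NOENT: %d text nodes\n", countMiTextNodes($expanded));
```

If the explanation above is right, the default parse should report fewer text nodes than the expanded one, which matches the missing output in the question. Since entity substitution can also pull in external entities, I would only enable it for XML you trust.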