
XSL entity definitions ignored - why?


I'm writing an XSL file that transforms XML to markdown text. I want to transform HTML entities to themselves, e.g. "&reg;" in a piece of text in the XML will yield "&reg;" in the output file. I do not want to translate "&reg;" to its hexadecimal equivalent, which is liable to upset processes downstream.

It seems to me that the following declaration should do what I need, when placed immediately after the <?xml...?> declaration:

<!DOCTYPE stylesheet [
  <!ENTITY reg    "&amp;reg;" >
  <!ENTITY trade  "&amp;trade;" >
]>

When I process an XML file, though, the XSLT processor (Saxon-HE) issues a message like this one at each use of an entity:

Error on line 6 column 12 of test.xml:
  SXXP0003: Error reported by XML parser: The entity "reg" was referenced, but not declared.

What have I done wrong?


Solution

  • It's complaining that the file test.xml isn't well-formed. Nothing you add to the stylesheet is going to change that. If test.xml includes entity references, then it must have a DTD that defines those entities.

    What you're trying to achieve is difficult because XSLT works on the XDM data model, which has no way of representing entity references in unexpanded form. The XML parser will always expand entity references before the XSLT transformer kicks in.

    One workaround is the LexEv tool from Andrew Welch, which preprocesses the input XML to convert entity references to something else (processing instructions, IIRC), and then converts them back to entity references during serialization.

    Another approach (probably better) is to replace all occurrences of ® (whether they originated as &reg; or not) by &reg; during serialization, which you can achieve using XSLT 2.0 character maps.
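
    A minimal sketch of the character-map approach, assuming an XSLT 2.0 processor and text output. The map name html-entities and the template matching a p element are made up for illustration and are not taken from the question:

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- Text output; the named character map is applied during serialization -->
      <xsl:output method="text" encoding="UTF-8"
                  use-character-maps="html-entities"/>

      <!-- Map each character to the literal string wanted in the output.
           "&amp;reg;" is read by the XML parser as the five characters &reg; -->
      <xsl:character-map name="html-entities">
        <xsl:output-character character="&#xAE;" string="&amp;reg;"/>
        <xsl:output-character character="&#x2122;" string="&amp;trade;"/>
      </xsl:character-map>

      <!-- Hypothetical template: writes each p element's text, then a blank line -->
      <xsl:template match="p">
        <xsl:value-of select="."/>
        <xsl:text>&#10;&#10;</xsl:text>
      </xsl:template>

    </xsl:stylesheet>
    ```

    The input still has to be well-formed, so if it uses &reg; or &trade; it needs to declare them, for example in an internal DTD subset that maps them to the real characters (the element names doc and p are again placeholders):

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE doc [
      <!ENTITY reg   "&#xAE;">
      <!ENTITY trade "&#x2122;">
    ]>
    <doc>
      <p>Acme&reg; Widget&trade;</p>
    </doc>
    ```

    With that pair, the paragraph text should be written out as Acme&reg; Widget&trade;. Character maps only take effect when the processor itself serializes the result (for instance when it writes to a file), not when the result is handed on as a tree.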