I have the following HTML file, input.html, from which I want to extract XML fragments:
<!DOCTYPE html>
<div>Text with ó</div>
I apply the following XSL stylesheet, stylesheet.xsl:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" />
<xsl:template match="div">
<tag attribute="{child::text()}"></tag>
</xsl:template>
</xsl:stylesheet>
When I run xsltproc stylesheet.xsl input.html, I expect to get the following result:
<?xml version="1.0"?>
<tag attribute="Text with ó"/>
but instead I get unwanted hexadecimal entities in the attribute:
<?xml version="1.0"?>
<tag attribute="Text with &#xF3;"/>
How can I prevent these hexadecimal entities from being introduced, without having to translate every possible entity back as explained in XSL: how do I keep xsltproc from tampering with an escaped HTML string in an attribute value?
Add an encoding="UTF-8" attribute to your xsl:output instruction.
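For reference, here is the stylesheet from the question with only that change applied; the template is left exactly as it was. This is a sketch that assumes input.html is itself saved as UTF-8. Running the same xsltproc stylesheet.xsl input.html command should then write ó literally in the attribute, and the XML declaration will likely also state the encoding.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- encoding="UTF-8" declares the output encoding, so the serializer can
       write non-ASCII characters such as ó directly instead of emitting
       character references like &#xF3; -->
  <xsl:output method="xml" indent="yes" encoding="UTF-8" />
  <xsl:template match="div">
    <tag attribute="{child::text()}"></tag>
  </xsl:template>
</xsl:stylesheet>
```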