xml parsing relaxng

Relax NG parser error


I am trying to validate my .xml file against an .rng schema, but I keep getting this error:

 parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xEA 0x63 0x68 0xE9
            <name>Ev▒ch▒ of Seeet Di▒</name>   //here the original word is Evéchç of seeet diè
                    ^
myfile.xml:33: parser error :      Entity 'nbsp' not defined
            <name>SCIEF&nbsp; Toto</name>

My .rng file starts with:

<?xml version="1.0" encoding="UTF-8"?>

Solution

  • The byte sequence 0xEA 0x63 0x68 0xE9 is "êché" in ISO-8859-1 (and in related charsets such as Windows-1252), so the first word in the quoted source is probably "Evêché", not "Evéchç".

    In UTF-8 the bytes for êché would be 0xC3 0xAA 0x63 0x68 0xC3 0xA9.

    So the source is apparently not encoded in UTF-8 but in ISO-8859-1 or a similar single-byte encoding.

    If so, either change the XML declaration to <?xml version="1.0" encoding="ISO-8859-1"?> or convert the source to UTF-8 (e.g., using iconv).

    As for the error about &nbsp;: it is a named entity defined by HTML, not by XML. XML predefines only &lt;, &gt;, &amp;, &apos; and &quot;, so any other entity must be declared in a DTD. Replace it with the numeric character reference &#160; or &#xA0; and that error will go away.
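
If iconv is not at hand, both fixes can be sketched in a few lines of Python. This is only an illustration: it assumes the file really is ISO-8859-1 (the byte dump suggests so but does not prove it), and the function name latin1_xml_to_utf8 is made up for this example. It does not touch the XML declaration, which should say UTF-8 after the conversion.

```python
def latin1_xml_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 and replace &nbsp;.

    Assumption: the input is ISO-8859-1. Every byte value decodes
    under that charset, so a wrong guess produces wrong characters
    rather than an exception.
    """
    text = raw.decode("iso-8859-1")
    # &nbsp; is not predefined in XML; use the numeric reference instead.
    text = text.replace("&nbsp;", "&#160;")
    return text.encode("utf-8")


# The bytes from the error message, checked against the analysis above.
reported = bytes([0xEA, 0x63, 0x68, 0xE9])
print(reported.decode("iso-8859-1"))           # êché
print(latin1_xml_to_utf8(reported).hex(" "))   # c3 aa 63 68 c3 a9
```

To convert a real file, read it in binary mode, pass the bytes through the function, and write the result back in binary mode, ideally to a new file so the original stays available for comparison.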