I am trying to validate my .xml file against an .rng schema, but I keep getting this error:
parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xEA 0x63 0x68 0xE9
<name>Ev▒ch▒ of Seeet Di▒</name> //here the original word is Evéchç of seeet diè
^
myfile.xml:33: parser error : Entity 'nbsp' not defined
<name>SCIEF Toto</name>
in my rng file
<?xml version="1.0" encoding="UTF-8"?>
The byte sequence 0xEA 0x63 0x68 0xE9 is "êché" in ISO-8859-1 (and other charsets), so it seems the first word in the part of the source cited there is actually "Evêché"? (not "Evéchç"…)
In UTF-8 the bytes for êché would be 0xC3 0xAA 0x63 0x68 0xC3 0xA9.
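To check that reading of the bytes, here is a minimal Python sketch. It only uses the four bytes from the error message; the variable names are made up for illustration.

```python
# The four bytes reported by the parser.
raw = bytes([0xEA, 0x63, 0x68, 0xE9])

# Decoded as ISO-8859-1 they give readable text.
text = raw.decode("iso-8859-1")
print(text)  # êché

# The same text encoded as UTF-8 takes six bytes, not four.
utf8 = text.encode("utf-8")
print(utf8.hex(" "))  # c3 aa 63 68 c3 a9

# The original bytes are not valid UTF-8, which is what the parser complains about.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False
```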
So it seems the source isn't actually encoded in UTF-8 but in ISO-8859-1 or something similar?
If so, either the XML declaration must be changed to <?xml version="1.0" encoding="ISO-8859-1"?> or the source needs to be converted to UTF-8 (e.g., using iconv).
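If iconv isn't at hand, the conversion can also be sketched in Python. This assumes the file really is ISO-8859-1, which is only a guess from the bytes in the error message; the helper name and the sample bytes are hypothetical.

```python
import xml.etree.ElementTree as ET

def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (hypothetical helper).

    Assumes the input really is ISO-8859-1. If the file declares some
    other encoding in its XML declaration, update that declaration too.
    """
    return data.decode("iso-8859-1").encode("utf-8")

# Made-up sample: declares UTF-8 but contains ISO-8859-1 bytes.
sample = b'<?xml version="1.0" encoding="UTF-8"?>\n<name>Ev\xeach\xe9</name>'
converted = latin1_to_utf8(sample)

# After conversion the declaration and the bytes agree, so parsing works.
name = ET.fromstring(converted).text
print(name)  # Evêché
```

For a real file, read it with open(path, "rb") and write the result back the same way. With iconv the equivalent would be roughly iconv -f ISO-8859-1 -t UTF-8 myfile.xml > myfile-utf8.xml.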
As for the error about &nbsp;, that's because it's an HTML named character reference and isn't defined for arbitrary XML documents. Just replace it with &#160; or &#xA0; and that error will go away.
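A small Python sketch of that replacement, using the standard library parser rather than the validator from the question. The sample line is hypothetical: the question only shows the rendered text, so where the &nbsp; sits is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a named HTML entity in a document with no DTD.
broken = "<name>SCIEF&nbsp;Toto</name>"

try:
    ET.fromstring(broken)
    parsed_broken = True
except ET.ParseError:
    # The parser rejects the undefined entity.
    parsed_broken = False

# A numeric character reference needs no entity definition.
fixed = broken.replace("&nbsp;", "&#160;")
name = ET.fromstring(fixed).text

print(parsed_broken)               # False
print(name == "SCIEF\u00a0Toto")   # True
```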