Currently, I'm trying to use the MSXML loadXML method in ASP to load an XML string that may contain Unicode Chinese characters such as
𠮢 (U+20BA2) 4bytes
and the xml string looks like
<City>City</City><Name>𠮢</Name>
So, in my code, I can see that the XML string comes in correctly, but loadXML returns an error message like
Invalid unicode characters, &#55362;&#57250;
Can someone please tell me what I can do to resolve this issue?
Thanks,
Edited
The code looks like this
Set objDoc = CreateObject("MSXML2.DOMDocument")
objDoc.async = false
objDoc.setProperty "SelectionLanguage", "XPath"
objDoc.validateOnParse = false
objDoc.loadXML(strXml)
I suggest posting the exact code, XML source and error message you are getting. I cannot reproduce an error by parsing <element>𠮢</element> in MSXML 4.0 SP3; this works fine.
I certainly do get a parseError with the reason "Invalid unicode character" when trying to parse <element>&#55362;&#57250;</element>, because that is not well-formed XML. If you do have this in your markup then you need to fix the serialiser that produced it, because neither MSXML nor any other standards-compliant XML parser will load it.
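If 𠮢 is turned into a character reference it must be &#134050; (or &#x20BA2; in hex). Code units 55362 and 57250 are 'surrogates', reserved for encoding astral plane characters in UTF-16. They are not characters in their own right, so they can't be included in an XML document, whether literally or as character references.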
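If the serialiser really cannot be changed, one possible workaround is to repair the string before handing it to loadXML. A surrogate pair maps back to its code point as 65536 + (high - 55296) * 1024 + (low - 56320), so for the pair 55362 and 57250 that is 65536 + 66 * 1024 + 930 = 134050. The following is only a sketch under assumptions: FixSurrogateRefs is a hypothetical helper name (not part of MSXML), and it only handles decimal references without leading zeros that appear as an adjacent high/low pair.

```vbscript
' Hypothetical helper: merges each decimal surrogate-pair reference
' (e.g. &#55362;&#57250;) into a single code point reference (&#134050;).
Function FixSurrogateRefs(strXml)
    Dim re, matches, m, hi, lo, cp, result, pos

    Set re = New RegExp
    re.Global = True
    ' Group 1 matches a high surrogate (55296-56319),
    ' group 2 matches a low surrogate (56320-57343).
    re.Pattern = "&#(55(?:29[6-9]|[3-9]\d\d)|56(?:[0-2]\d\d|3[01]\d));" & _
                 "&#(56(?:3[2-9]\d|[4-9]\d\d)|57(?:[0-2]\d\d|3[0-3]\d|34[0-3]));"

    Set matches = re.Execute(strXml)
    result = ""
    pos = 1   ' 1-based position of the next character still to be copied

    For Each m In matches
        hi = CLng(m.SubMatches(0))
        lo = CLng(m.SubMatches(1))
        ' Combine the pair into the real code point.
        cp = 65536 + (hi - 55296) * 1024 + (lo - 56320)
        ' Copy the text before the match, then the merged reference.
        ' FirstIndex is 0-based, Mid is 1-based.
        result = result & Mid(strXml, pos, m.FirstIndex + 1 - pos) & "&#" & cp & ";"
        pos = m.FirstIndex + m.Length + 1
    Next

    ' Append whatever follows the last match (or the whole string if none).
    FixSurrogateRefs = result & Mid(strXml, pos)
End Function
```

With the objDoc from the question, the call would then be objDoc.loadXML FixSurrogateRefs(strXml). For example, FixSurrogateRefs("<Name>&#55362;&#57250;</Name>") returns "<Name>&#134050;</Name>". This only papers over the problem; fixing the code that produces the surrogate references remains the proper solution.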