I am trying to preserve some XML entities when parsing XML files in javascript. The following code snippet illustrates the problem. Is there a way for me to make round-trip parse and retain the XML entities ( is nbsp; html)? This happens in Chrome FF and IE10.
var aaa='<root><div> one two</div></root>'
var doc=new DOMParser().parseFromString(aaa,'application/xml')
new XMLSerializer().serializeToString(doc)
"<root><div> one two</div></root>"
The issue is I am taking some chunks out of html and storing them in xml, and then I want to get the spaces back in XML when I'm done. Edit: As Dan and others have pointed out, the parser replaces it with the ascii code 160, which to my eyes looks like an ordinary space but:
var str1=new XMLSerializer().serializeToString(doc)
str1.charCodeAt(15)
160
So where ever my application is losing the spaces, it is not here.
You can use a ranged RegExp to turn the special chars back into xml representations. as a nice re-usable function:
function escapeExtended(s){
return s.replace(/([\x80-\xff])/g, function (a, b) {
var c = b.charCodeAt();
return "&#" + b.charCodeAt()+";"
});
}
var aaa='<root><div> one two</div></root>'
var doc=new DOMParser().parseFromString(aaa,'application/xml')
var str= new XMLSerializer().serializeToString(doc);
alert(escapeExtended(str)); // shows: "<root><div> one two</div></root>"
Note that HTML entities (ex quot;) will lose their symbol name, and be converted to XML entities (the &#number; kind). you can't get back the names without a huge conversion table.