Tags: javascript, xml, xml-entities

JavaScript DOMParser and XMLSerializer remove XML entities


I am trying to preserve some XML entities when parsing XML files in JavaScript. The following code snippet illustrates the problem. Is there a way to make a round trip through the parser and retain the XML entities (&#160; is &nbsp; in HTML)? This happens in Chrome, Firefox, and IE10.

var aaa='<root><div>&#160;one&#160;two</div></root>'
var doc=new DOMParser().parseFromString(aaa,'application/xml')
new XMLSerializer().serializeToString(doc)
"<root><div> one two</div></root>"

The issue is that I am taking some chunks out of HTML and storing them in XML, and I want to get the spaces back in the XML when I'm done. Edit: As Dan and others have pointed out, the parser replaces the entity with the character whose code is 160 (U+00A0, the no-break space), which to my eyes looks like an ordinary space, but:

var str1=new XMLSerializer().serializeToString(doc)
str1.charCodeAt(15)
160

So wherever my application is losing the spaces, it is not here.
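
Because U+00A0 renders exactly like a normal space, it can help to list the non-ASCII characters in a string explicitly when tracking down where they get lost. The following is only a minimal sketch: `findNonAscii` is a hypothetical helper name, and the sample string stands in for what the serializer returns above.

```javascript
// Hypothetical helper: list every non-ASCII UTF-16 code unit in a string
// together with its index and its value in U+XXXX notation.
// (Characters outside the BMP would show up as two surrogate entries.)
function findNonAscii(s) {
  var hits = [];
  for (var i = 0; i < s.length; i++) {
    var code = s.charCodeAt(i);
    if (code > 127) {
      var hex = ("0000" + code.toString(16).toUpperCase()).slice(-4);
      hits.push({ index: i, code: code, hex: "U+" + hex });
    }
  }
  return hits;
}

// "\u00A0" stands in for what the parser produces from &#160;
var serialized = "<root><div>\u00A0one\u00A0two</div></root>";
console.log(findNonAscii(serialized));
// logs two entries, at index 11 and index 15, both with code 160 (U+00A0)
```

Running the same check at each stage of the application (after serializing, after storing, after reading back) should show the step at which the 160s turn into 32s or disappear.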


Solution

  • You can use a RegExp with a character range to turn the special characters back into their XML representations, wrapped in a reusable function:

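    // Replace each character in the range U+0080 to U+00FF with its
    // numeric character reference, e.g. U+00A0 becomes "&#160;".
    function escapeExtended(s){
     return s.replace(/[\x80-\xff]/g, function (ch) {
       return "&#" + ch.charCodeAt(0) + ";";
     });
    }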
    
    
    var aaa='<root><div>&#160;one&#160;two</div></root>'
    var doc=new DOMParser().parseFromString(aaa,'application/xml')
    var str= new XMLSerializer().serializeToString(doc);
    alert(escapeExtended(str)); // shows: "<root><div>&#160;one&#160;two</div></root>"
    

    Note that named entities (e.g. &quot;) lose their symbolic names during parsing. This function only emits numeric character references (the &#number; kind), and only for characters in the range U+0080 to U+00FF. You can't get the names back without a large conversion table.
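
The range in escapeExtended stops at U+00FF, so characters beyond Latin-1 (the euro sign, for example) pass through unescaped. A broader variant might look like the sketch below. It assumes an ES2015+ environment (for the `u` regex flag and `codePointAt`), so it would not run in IE10. `escapeNonAscii` and `ENTITY_NAMES` are hypothetical names, and the table holds only three sample entries, not the full list of HTML entity names.

```javascript
// Hypothetical, deliberately tiny name table: code point -> HTML entity name.
// A complete table would need every named entity that HTML defines.
var ENTITY_NAMES = { 160: "nbsp", 169: "copy", 233: "eacute" };

// Escape every non-ASCII character. With useNames set, characters found in
// ENTITY_NAMES become &name; (only suitable for HTML output, since XML does
// not predefine these names); everything else becomes &#number;.
function escapeNonAscii(s, useNames) {
  // The "u" flag makes the regex match whole code points, so a character
  // outside the BMP is one match instead of two separate surrogates.
  return s.replace(/[^\x00-\x7F]/gu, function (ch) {
    var code = ch.codePointAt(0);
    if (useNames && Object.prototype.hasOwnProperty.call(ENTITY_NAMES, code)) {
      return "&" + ENTITY_NAMES[code] + ";";
    }
    return "&#" + code + ";";
  });
}

var serialized = "<root><div>\u00A0one\u00A0two \u20AC</div></root>";
console.log(escapeNonAscii(serialized));
// <root><div>&#160;one&#160;two &#8364;</div></root>
console.log(escapeNonAscii(serialized, true));
// <root><div>&nbsp;one&nbsp;two &#8364;</div></root>
```

The numeric form is the safer default when the result goes back into XML; the named form is only worth the table lookup when the text is headed back into HTML.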