javascripthtmlescapingxml-rpc

Unescape HTML entities in JavaScript?


I have some JavaScript code that communicates with an XML-RPC backend. The XML-RPC returns strings of the form:

<img src='myimage.jpg'>

However, when I use JavaScript to insert the strings into HTML, they render literally. I don't see an image, I see the string:

<img src='myimage.jpg'>

I guess that the HTML is being escaped over the XML-RPC channel.

How can I unescape the string in JavaScript? I tried the techniques on this page, unsuccessfully: http://paulschreiber.com/blog/2008/09/20/javascript-how-to-unescape-html-entities/

What are other ways to diagnose the issue?


Solution

  • EDIT: You should use the DOMParser API as Wladimir suggests, I edited my previous answer since the function posted introduced a security vulnerability.

    The following snippet is the old answer's code with a small modification: using a textarea instead of a div reduces the XSS vulnerability, but it is still problematic in IE9 and Firefox.

    function htmlDecode(input){
      var e = document.createElement('textarea');
      e.innerHTML = input;
      // handle case of empty input
      return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
    }
    
    htmlDecode("&lt;img src='myimage.jpg'&gt;"); 
    // returns "<img src='myimage.jpg'>"
    

    Basically I create a DOM element programmatically, assign the encoded HTML to its innerHTML and retrieve the nodeValue from the text node created on the innerHTML insertion. Since it just creates an element but never adds it, no site HTML is modified.

    It will work cross-browser (including older browsers) and accept all the HTML Character Entities.

    EDIT: The old version of this code did not work on IE with blank inputs, as evidenced here on jsFiddle (view in IE). The version above works with all inputs.

    UPDATE: appears this doesn't work with large string, and it also introduces a security vulnerability, see comments.