How can I strip the HTML from a string in JavaScript?
Using the browser's parser is the probably the best bet in current browsers. The following will work, with the following caveats:
<div>
element. HTML contained within <body>
or <html>
or <head>
tags is not valid within a <div>
and may therefore not be parsed correctly.textContent
(the DOM standard property) and innerText
(non-standard) properties are not identical. For example, textContent
will include text within a <script>
element while innerText
will not (in most browsers). This only affects IE <=8, which is the only major browser not to support textContent
.<script>
elements.null
<img onerror='alert(\"could run arbitrary JS here\")' src=bogus>
Code:
var html = "<p>Some HTML</p>";
var div = document.createElement("div");
div.innerHTML = html;
var text = div.textContent || div.innerText || "";