
Why are these tags removed with MGans's HtmlSanitizer?


I am thinking of using the HtmlSanitizer NuGet package by MGans to sanitize input and output in our application. When I apply the Sanitize() method to the input below, I get the output that follows it:

Input:

this  is my data
<p> here</p>
<script type="text/javascript"/>
<b>and here</b>
alert("something");
done here
<script type="text/javascript">alert("again");</script>
done

Output:

this  is my data
<p> here</p>

done

Why are <b>and here</b>, alert("something"); and the text done here removed when the first <script/> tag is self-closing and has no content?


Solution

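  • In HTML4 and HTML5, <script> tags can't be self-closing. The self-closing syntax is only allowed on the void elements defined in the spec (such as <br> and <img>), and <script> is not one of them.

    When the input is parsed, the first <script> tag is therefore treated as an opening tag, and the first </script> that follows (the one at the end of the second script line) is its closing tag. Everything in between, including <b>and here</b>, alert("something");, done here and the second <script> opening tag, becomes the content of that single script element, which the sanitizer removes as a whole.

    A browser would do the same: it treats the slash as malformed input and ignores it, then tries to execute everything up to the closing </script> as JavaScript code.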
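
To make the parsing rule above concrete, here is a minimal Python sketch. It is not HtmlSanitizer and not a real HTML parser: it only assumes the rule described in the answer (the trailing slash on <script .../> is ignored and the element runs to the next </script>), and models it with a regular expression. The names SCRIPT_ELEMENT and strip_script_elements are made up for this illustration.

```python
import re

# Simplified model of the rule described above: the "/" in <script .../> is
# ignored, and everything up to the next </script> is raw script content.
# Illustration only; attribute values containing ">" are not handled.
SCRIPT_ELEMENT = re.compile(
    r"<script\b[^>]*>"   # start tag; a trailing "/" is swallowed by [^>]*
    r".*?"               # raw text content, not parsed as markup
    r"</script\s*>",     # the first end tag closes the element
    re.IGNORECASE | re.DOTALL,
)


def strip_script_elements(html: str) -> str:
    """Remove every script element together with its raw text content."""
    return SCRIPT_ELEMENT.sub("", html)


# The input from the question.
sample = (
    'this  is my data\n'
    '<p> here</p>\n'
    '<script type="text/javascript"/>\n'
    '<b>and here</b>\n'
    'alert("something");\n'
    'done here\n'
    '<script type="text/javascript">alert("again");</script>\n'
    'done\n'
)

result = strip_script_elements(sample)
print(result)
```

Under this model the output has the same shape as the one in the question: the first two lines, an empty line, then done. Writing the first tag as <script type="text/javascript"></script> instead would close the element immediately, so the lines after it would no longer be swallowed as script content.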