I'm using the AntiSamy library to sanitize input to my application against XSS. I have a problem with nested tags like:
<<b>script>alert('xss');<</b>/script>
My sanitize method looks like:
public String clean(String input) {
    if (input == null) {
        return null;
    }
    // Decode any HTML entities before scanning
    input = StringEscapeUtils.unescapeHtml(input);
    try {
        Policy policy = Policy.getInstance(getClass().getResourceAsStream("/antisamy-textonly-policy.xml"));
        AntiSamy antiSamy = new AntiSamy();
        CleanResults cleanResults = antiSamy.scan(input, policy);
        String cleaned = cleanResults.getCleanHTML();
        // Decode any entities the scan may have introduced in its output
        return StringEscapeUtils.unescapeHtml(cleaned);
    } catch (PolicyException e) {
        ....
    } catch (ScanException e) {
        ....
    }
}
My test against this type of input is failing:
public void doubleTagTest() {
    def cleaned = xss.clean("<<b>script>alert('xss');<</b>/script>");
    assert cleaned.isEmpty();
}
With:
Assertion failed:

assert cleaned.isEmpty()
       |       |
       |       false
       alert('xss');

at org.codehaus.groovy.runtime.InvokerHelper.assertFailed(InvokerHelper.java:386)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.assertFailed(ScriptBytecodeAdapter.java:658)
Do you have any idea how to handle it without a recursive call to xss.clean()?
AntiSamy is producing the correct result - the badly formed tag(s) are removed, leaving only the plain text alert('xss');.
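You can see this directly (a minimal sketch, reusing the xss helper from your test; the expected value is taken from your assertion output):

    def cleaned = xss.clean("<<b>script>alert('xss');<</b>/script>")
    // Only inert plain text survives; the malformed tags themselves are stripped
    assert cleaned == "alert('xss');"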
Consider the following:
<b<i>>Hello World!</b</i>>
A bold and an italic tag have somehow become muddled - AntiSamy correctly strips the broken tags, leaving the text Hello World!, which is correct. The fact that plain text resembling JavaScript remains in your original test is of no concern - the harmful <script> tag has been removed.
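If the goal of the test is simply to prove that no executable markup survives, one option (a sketch, not a prescribed fix) is to assert on the absence of tags rather than on an empty result:

    public void doubleTagTest() {
        def cleaned = xss.clean("<<b>script>alert('xss');<</b>/script>")
        // The harmful markup is gone; any remaining plain text is harmless
        assert !cleaned.contains("<")
        assert !cleaned.contains(">")
    }

That keeps the single call to xss.clean() and still fails if any tag, nested or otherwise, slips through.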