jsoup xss owasp html-sanitizing

JSoup vs OWASP AntiSamy for JSON Sanitization


I am looking for a library to perform JSON sanitization and came across JSoup and OWASP AntiSamy. It looks like AntiSamy only does HTML sanitization, and there is a separate project for JSON sanitization. JSoup's documentation doesn't seem to mention JSON sanitization either.

Do JSoup and OWASP AntiSamy perform JSON sanitization?


Solution

  • Neither JSoup nor AntiSamy sanitizes JSON; both work on HTML. OWASP has a JSON sanitizer project, separate from AntiSamy, that converts JSON-like content to syntactically correct and embeddable JSON.

    The output is well-formed JSON as defined by RFC 4627. The output satisfies four additional properties:

    • The output will not contain the substring (case-insensitively) "</script" so can be embedded inside an HTML script element without further encoding.
    • The output will not contain the substring "]]>" so can be embedded inside an XML CDATA section without further encoding.
    • The output is a valid JavaScript expression, so can be parsed by JavaScript's eval builtin (after being wrapped in parentheses) or by JSON.parse. Specifically, the output will not contain any string literals with embedded JS newlines (U+2028 Line separator or U+2029 Paragraph separator).
    • The output contains only valid Unicode scalar values (no isolated UTF-16 surrogates) that are allowed in XML unescaped.
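
To make the four properties above concrete, here is a minimal sketch of a check that a string avoids the sequences they rule out. The class `EmbeddableJsonCheck` and its method `isEmbeddable` are hypothetical names for illustration, not part of the OWASP JSON Sanitizer. It only checks; it does not sanitize and does not verify that the text is well-formed JSON. The XML character ranges are taken from the XML 1.0 `Char` production, on the assumption that this is what "allowed in XML unescaped" refers to.

```java
// Hypothetical helper for illustration only; not the OWASP JSON Sanitizer API.
final class EmbeddableJsonCheck {

    /** Returns true if the text avoids every sequence the four properties rule out. */
    static boolean isEmbeddable(String json) {
        // Property 1: no "</script", compared case-insensitively.
        if (json.toLowerCase(java.util.Locale.ROOT).contains("</script")) {
            return false;
        }
        // Property 2: no "]]>", which would terminate an XML CDATA section.
        if (json.contains("]]>")) {
            return false;
        }
        int i = 0;
        while (i < json.length()) {
            // codePointAt returns the surrogate itself when it is unpaired.
            int cp = json.codePointAt(i);
            // Property 3: no raw JS line terminators U+2028 / U+2029.
            if (cp == 0x2028 || cp == 0x2029) {
                return false;
            }
            // Property 4: only code points XML 1.0 allows unescaped.
            // Surrogates (U+D800..U+DFFF) fall outside every range below,
            // so an isolated surrogate is rejected here as well.
            boolean xmlChar = cp == 0x9 || cp == 0xA || cp == 0xD
                    || (cp >= 0x20 && cp <= 0xD7FF)
                    || (cp >= 0xE000 && cp <= 0xFFFD)
                    || (cp >= 0x10000 && cp <= 0x10FFFF);
            if (!xmlChar) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }
}
```

With the sanitizer on the classpath, the string to check would be the result of its documented entry point, `JsonSanitizer.sanitize(String)` in package `com.google.json`. A check like this is mainly useful as a test assertion on that output.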