I have written a micro-templating utility that uses innerHTML to inject HTML fragments into a web page, based on user input (either plain-text strings or HTML strings).
My main concern is the risk of malicious script injection. The script could be injected via a script tag or via an inline event handler attribute (onload on an img, or onmouseover on a div, for example).
Is there a way to sanitize the HTML string to prevent such injections? Also, are there other script injection methods I should be aware of?
If you want to be safe, you'll sanitize your templates both on the client and the server. Don't write your own anti-XSS library as malicious users are bound to know an exploit that you haven't accounted for; there are just too many nuances and unless you're an XSS expert, you're bound to miss one.
On the client side, Google Caja has a pretty nice HTML sanitization utility that performs robust sanitization on HTML strings, cleaning up malicious attributes and other places where nasty users can do nasty things, like XSS attacks via injected `script` tags. It also scrubs all kinds of other XSS injection points (`object` and `applet` tags, for instance), so you can feel fairly safe.
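As a rough sketch of how a sanitizer could sit in front of innerHTML, the helper below takes the sanitizer as a parameter. The name `renderUntrusted` and its policy callbacks are hypothetical, made up for illustration. It assumes a sanitizer with the three-argument shape that Caja's standalone sanitizer is documented to expose as the global `html_sanitize` (HTML string, URL policy, id/class token policy); check that against the build you actually load.

```javascript
// Hypothetical helper, not part of Caja. `sanitize` is whatever vetted
// sanitizer you load; it is assumed to accept
// (html, urlPolicy, tokenPolicy) and return a cleaned HTML string.
function renderUntrusted(target, untrustedHtml, sanitize) {
  // Strictest possible policies: returning undefined is assumed to tell
  // the sanitizer to drop the URL or the id/class/name token entirely.
  const rejectUrl = (url) => undefined;
  const rejectToken = (token) => undefined;

  const clean = sanitize(String(untrustedHtml), rejectUrl, rejectToken);

  // Only the sanitized string ever reaches innerHTML.
  target.innerHTML = clean;
  return clean;
}
```

In a browser with Caja's sanitizer script loaded, a call might look like `renderUntrusted(document.getElementById("out"), userInput, html_sanitize)`. Loosen the two policies only as far as your templates need, for example by letting through `http:` and `https:` URLs.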
You should still sanitize on the server, because a malicious user can simply disable JavaScript or overwrite Caja's sanitizer in the browser. On the client, you can use Caja to sanitize both input and output to try to catch as much as you can.
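For the plain-text half of the question, a sanitizer is arguably not needed at all: if the input is never meant to contain markup, escaping the HTML-significant characters before templating turns any tags into inert text. A minimal sketch follows; `escapeHtml` is a made-up name, and the result is meant for element content or quoted attribute values only, not for script, style, or URL contexts.

```javascript
// Map of HTML-significant characters to their entity equivalents.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Hypothetical helper: escapes plain text so it renders as literal text.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// The markup is neutralized rather than removed:
console.log(escapeHtml('<img src=x onerror="alert(1)">'));
// prints: &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
```

Input that is meant to carry real HTML still needs a vetted sanitizer, since escaping would destroy the markup you want to keep.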