There are many Stack Overflow questions (e.g. Whitelisting, preventing XSS with WMD control in C# and WMD Markdown and server-side) about how to do server-side scrubbing of Markdown produced by the WMD editor to ensure the HTML generated doesn't contain malicious script, like this:
<img onload="alert('haha');"
src="http://www.google.com/intl/en_ALL/images/srpr/logo1w.png" />
But I didn't find a good way to plug the hole on the client side too. Client validation isn't a replacement for scrubbing validation on the server of course, since anyone can pretend to be a client and POST you nasty Markdown. And if you're scrubbing the HTML on the server, an attacker can't save the bad HTML so no one else will be able to see it later and have their cookies stolen or sessions hijacked by the bad script. So there's a valid case to be made that it may not be worth enforcing no-script rules in the WMD preview pane too.
But imagine an attacker found a way to get malicious Markdown onto the server (e.g. a compromised feed from another site, or content added before an XSS bug was fixed). Your server-side whitelist applied when translating markdown to HTML would normally prevent that bad Markdown from being shown to users. But if the attacker could get someone to edit the page (e.g. by posting another entry saying the malicious entry had a broken link and asking someone to fix it), then anyone who edits the page gets their cookies hijacked. This is admittedly a corner case, but it still may be worth defending against.
Also, it's probably a bad idea to allow the client preview window to allow different HTML than your server will allow.
The Stack Overflow team has plugged this hole by making changes to WMD. How did they do it?
[NOTE: I already figured this out, but it required some tricky JavaScript debugging, so I'm answering my own question here to help others who may want to do ths same thing].
One possible fix is in wmd.js, in the pushPreviewHtml()
method. Here's the original code from the Stack Overflow version of WMD on GitHub:
if (wmd.panels.preview) {
wmd.panels.preview.innerHTML = text;
}
You can replace it with some scrubbing code. Here's an adaptation of the code that Stack Overflow uses in response to this post, which restricts to a whitelist of tags, and for IMG and A elements, restricts to a whitelist of attributes (and in a specific order too!). See the Meta Stack Overflow post What HTML tags are allowed on Stack Overflow, Server Fault, and Super User? for more info on the whitelist.
Note: this code can certainly be improved, e.g. to allow whitelisted attributes in any order. It also disallows mailto: URLs which is probably a good thing on Internet sites but on your own intranet site it may not be the best approach.
if (wmd.panels.preview) {
// Original WMD code allowed JavaScript injection, like this:
// <img src="http://www.google.com/intl/en_ALL/images/srpr/logo1w.png" onload="alert('haha');"/>
// Now, we first ensure elements (and attributes of IMG and A elements) are in a whitelist,
// and if not in whitelist, replace with blanks in preview to prevent XSS attacks
// when editing malicious Markdown.
var okTags = /^(<\/?(b|blockquote|code|del|dd|dl|dt|em|h1|h2|h3|i|kbd|li|ol|p|pre|s|sup|sub|strong|strike|ul)>|<(br|hr)\s?\/?>)$/i;
var okLinks = /^(<a\shref="(\#\d+|(https?|ftp):\/\/[-A-Za-z0-9+&@#\/%?=~_|!:,.;\(\)]+)"(\stitle="[^"<>]+")?\s?>|<\/a>)$/i;
var okImg = /^(<img\ssrc="https?:(\/\/[-A-Za-z0-9+&@#\/%?=~_|!:,.;\(\)]+)"(\swidth="\d{1,3}")?(\sheight="\d{1,3}")?(\salt="[^"<>]*")?(\stitle="[^"<>]*")?\s?\/?>)$/i;
text = text.replace(/<[^<>]*>?/gi, function (tag) {
return (tag.match(okTags) || tag.match(okLinks) || tag.match(okImg)) ? tag : ""
})
wmd.panels.preview.innerHTML = text; // Original code
}
Also note that this fix is not in the Stack Overflow version of WMD on GitHub-- clearly the change was made later and not checked back into GitHub.
UPDATE: in order to avoid breaking the feature where hyperlinks are auto-created when you type in a URL, you also will need to make changes to showdown.js, like below:
Original code:
var _DoAutoLinks = function(text) {
text = text.replace(/<((https?|ftp|dict):[^'">\s]+)>/gi,"<a href=\"$1\">$1</a>");
// Email addresses: <address@domain.foo>
/*
text = text.replace(/
<
(?:mailto:)?
(
[-.\w]+
\@
[-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]+
)
>
/gi, _DoAutoLinks_callback());
*/
text = text.replace(/<(?:mailto:)?([-.\w]+\@[-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]+)>/gi,
function(wholeMatch,m1) {
return _EncodeEmailAddress( _UnescapeSpecialChars(m1) );
}
);
return text;
}
Fixed code:
var _DoAutoLinks = function(text) {
// use simplified format for links, to enable whitelisting link attributes
text = text.replace(/(^|\s)(https?|ftp)(:\/\/[-A-Z0-9+&@#\/%?=~_|\[\]\(\)!:,\.;]*[-A-Z0-9+&@#\/%=~_|\[\]])($|\W)/gi, "$1<$2$3>$4");
text = text.replace(/<((https?|ftp):[^'">\s]+)>/gi, '<a href="$1">$1</a>');
return text;
}