In our MVC3 project, the HttpUtility.HtmlEncode method seems to be escaping too many characters. Our web pages are served as UTF-8, but the method still escapes characters like ü or the Yen character ¥, even though these characters can be represented in UTF-8.
So when my MVC view contains the following piece of code:
Then I would expect the encoder to escape the HTML tags, but not the ümlaut.
But instead it is giving me the following piece of HTML:
For completeness, I should also mention that the responseEncoding in the web.config is explicitly set to utf-8, so I would expect the HtmlEncode method to respect this setting.
<globalization requestEncoding="utf-8" responseEncoding="utf-8" />
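To reproduce the behaviour outside of a view, here is a minimal console sketch. The input string is made up for illustration (it is not the snippet from our view), and the class name EncodingRepro is hypothetical. As far as I can tell, HtmlEncode works purely on the string it is given and never consults responseEncoding, which only controls how the response is turned into bytes.

```csharp
using System;
using System.Web; // HttpUtility; on .NET Framework this needs a reference to System.Web.dll

public static class EncodingRepro
{
    public static void Main()
    {
        // Hypothetical input; \u00FC is ü and \u00A5 is ¥ (written as escapes
        // so the encoding of the source file cannot affect the result).
        string input = "<strong>\u00FCmlaut \u00A5</strong>";

        string encoded = HttpUtility.HtmlEncode(input);

        // Characters in the range U+00A0..U+00FF come back as numeric
        // character references:
        // &lt;strong&gt;&#252;mlaut &#165;&lt;/strong&gt;
        Console.WriteLine(encoded);
    }
}
```

Both ü (U+00FC) and ¥ (U+00A5) fall into that range, which would explain why they are escaped regardless of the page encoding.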
As Aristos suggested, we could use the AntiXSS library from Microsoft. It contains a UnicodeCharacterEncoder that behaves as you would expect.
But because we
We chose to implement our own very basic HTML encoder. You can find the code below. Please feel free to adapt/comment/improve if you see any issues.
using System.Collections.Generic;
using System.Text;
using System.Web;

public static class HtmlEncoder
{
    // Characters that must be escaped, mapped to the entity that replaces them
    private static readonly IDictionary<char, string> toEscape = new Dictionary<char, string>()
    {
        { '<', "lt" },
        { '>', "gt" },
        { '"', "quot" },
        { '&', "amp" },
        { '\'', "#39" },
    };

    /// <summary>
    /// HTML-Encodes the provided value
    /// </summary>
    /// <param name="value">object to encode</param>
    /// <returns>An HTML-encoded string representing the provided value.</returns>
    public static string Encode(object value)
    {
        if (value == null)
            return string.Empty;

        // If value is bare HTML, we expect it to be encoded already
        if (value is IHtmlString)
            return value.ToString();

        string toEncode = value.ToString();

        // Init capacity to length of string to encode
        var builder = new StringBuilder(toEncode.Length);
        foreach (char c in toEncode)
        {
            string result;
            bool success = toEscape.TryGetValue(c, out result);
            string character = success
                ? "&" + result + ";"
                : c.ToString();

            // Append either the entity or the untouched character
            builder.Append(character);
        }

        return builder.ToString();
    }
}
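To show what this approach produces, here is a small self-contained sketch. MinimalHtmlEncoder is a hypothetical, trimmed copy of the encoder above: it uses the same escape table but only accepts strings and drops the IHtmlString check, so it builds as a plain console program without System.Web. The input string is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, string-only variant of the encoder above (no IHtmlString check).
public static class MinimalHtmlEncoder
{
    private static readonly IDictionary<char, string> toEscape = new Dictionary<char, string>
    {
        { '<', "lt" },
        { '>', "gt" },
        { '"', "quot" },
        { '&', "amp" },
        { '\'', "#39" },
    };

    public static string Encode(string value)
    {
        if (value == null)
            return string.Empty;

        var builder = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            string entity;
            if (toEscape.TryGetValue(c, out entity))
                builder.Append('&').Append(entity).Append(';'); // escape markup characters
            else
                builder.Append(c);                              // everything else passes through
        }
        return builder.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        // \u00FC is ü, \u00A5 is ¥
        string input = "<a href=\"x\">\u00FC & \u00A5's</a>";
        string encoded = MinimalHtmlEncoder.Encode(input);

        // The encoded string is:
        // &lt;a href=&quot;x&quot;&gt;ü &amp; ¥&#39;s&lt;/a&gt;
        Console.WriteLine(encoded);
    }
}
```

Only the five markup-sensitive characters are replaced, so this is only safe as long as the page really is served in an encoding that can represent everything else, which is UTF-8 in our case.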