javascript urlencode url-encoding encodeuricomponent

JS encodeURIComponent result different from the one created by FORM


I thought values entered in forms are properly encoded by browsers.

But this simple test file "test_get_vs_encodeuri.html" shows it's not true:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><head>
   <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
   <title></title>
</head><body>

<form id="test" action="test_get_vs_encodeuri.html" method="GET" onsubmit="alert(encodeURIComponent(this.one.value));">
   <input name="one" type="text" value="Euro-€">
   <input type="submit" value="SUBMIT">
</form>

</body></html>

When I hit the submit button:

encodeURIComponent encodes the input value as "Euro-%E2%82%AC"

while the browser writes only "Euro-%80" into the GET query string.

  1. Could someone explain?

  2. How do I encode everything the same way the browser's FORM does (Windows-1252) using JavaScript? (The escape function does not work, and neither does encodeURIComponent.)

Or is encodeURIComponent doing unnecessary conversions?


Solution

  • This is a character encoding issue. Your document uses the Windows-1252 charset, where € is at position 128 and is therefore encoded as the single byte 0x80, which the browser sends as %80. encodeURIComponent, on the other hand, always encodes as UTF-8, regardless of the page's charset: in Unicode, € is code point 8364 (U+20AC), which UTF-8 encodes as the three bytes 0xE2 0x82 0xAC, hence %E2%82%AC.

    One solution is to use UTF-8 for your document as well, so the browser and encodeURIComponent agree. Alternatively, write a mapping that converts each character to its Windows-1252 byte and percent-encode that byte yourself.
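A minimal sketch of the "write a mapping" option follows. `encodeURIComponentCp1252` is a hypothetical helper name, not a built-in. It assumes the page really is Windows-1252, and it leaves the same characters unescaped that encodeURIComponent does. Windows-1252 matches Latin-1 for U+0000 to U+007F and U+00A0 to U+00FF, so only the 27 characters in the 0x80 to 0x9F block need a lookup table. For characters Windows-1252 cannot represent, it throws instead of guessing what a browser would substitute.

```javascript
// Windows-1252 bytes 0x80-0x9F that differ from Latin-1: Unicode code point -> byte.
// (0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned and have no entry.)
const CP1252_HIGH = new Map([
  [0x20ac, 0x80], [0x201a, 0x82], [0x0192, 0x83], [0x201e, 0x84],
  [0x2026, 0x85], [0x2020, 0x86], [0x2021, 0x87], [0x02c6, 0x88],
  [0x2030, 0x89], [0x0160, 0x8a], [0x2039, 0x8b], [0x0152, 0x8c],
  [0x017d, 0x8e], [0x2018, 0x91], [0x2019, 0x92], [0x201c, 0x93],
  [0x201d, 0x94], [0x2022, 0x95], [0x2013, 0x96], [0x2014, 0x97],
  [0x02dc, 0x98], [0x2122, 0x99], [0x0161, 0x9a], [0x203a, 0x9b],
  [0x0153, 0x9c], [0x017e, 0x9e], [0x0178, 0x9f],
]);

// The characters encodeURIComponent leaves unescaped.
const UNRESERVED = /^[A-Za-z0-9\-_.!~*'()]$/;

// Returns the single Windows-1252 byte for a code point, or throws.
function toCp1252Byte(codePoint) {
  if (codePoint < 0x80 || (codePoint >= 0xa0 && codePoint <= 0xff)) {
    return codePoint; // ASCII and the upper Latin-1 range map 1:1
  }
  const byte = CP1252_HIGH.get(codePoint);
  if (byte === undefined) {
    const hex = codePoint.toString(16).toUpperCase().padStart(4, "0");
    throw new RangeError(`U+${hex} has no Windows-1252 byte`);
  }
  return byte;
}

// Hypothetical helper: like encodeURIComponent, but percent-encodes
// Windows-1252 bytes instead of UTF-8 bytes.
function encodeURIComponentCp1252(str) {
  let out = "";
  for (const ch of str) { // for...of iterates by code point, not UTF-16 unit
    if (UNRESERVED.test(ch)) {
      out += ch;
    } else {
      const byte = toCp1252Byte(ch.codePointAt(0));
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(encodeURIComponent("Euro-€"));       // "Euro-%E2%82%AC" (UTF-8: E2 82 AC)
console.log(encodeURIComponentCp1252("Euro-€")); // "Euro-%80" (Windows-1252: 80)
```

This only reproduces the charset part of what the form does. A real form submission (application/x-www-form-urlencoded) also differs from encodeURIComponent in a few details, for example spaces become `+`, so adjust the unescaped set if you need an exact match.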