Why is %e9 or %fd an invalid string to decode with JavaScript's decodeURIComponent?
These characters appear in the middle of a string and I can't understand where the problem is. They are valid hexadecimal characters.
Full string (this is part of a string sent by the client app to the server that was being blocked by modsec):
%61%e9%3d%36%7f%00%00%01%00%00%43%fd%a1%5a%00%00%00%43
Sample to decode:
decodeURIComponent("%61%e9%3d%36%7f%00%00%01%00%00%43%fd%a1%5a%00%00%00%43")
Error:
VM222:1 Uncaught URIError: URI malformed
at decodeURIComponent (<anonymous>)
at <anonymous>:1:1
I am using these two functions to encode to base64 and decode from base64 (taken from Mozilla):
// Encode a string to base64 via its UTF-8 bytes.
function c64(t) {
  return btoa(encodeURIComponent(t).replace(/%([0-9A-F]{2})/g,
    (match, p1) => {
      return String.fromCharCode('0x' + p1);
    }));
}

// Decode base64 to a string; assumes the decoded bytes are valid UTF-8.
function d64(t) {
  return decodeURIComponent(atob(t).split('').map(function (c) {
    return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
  }).join(''));
}
The original string is in base64:
d64("Yek9Nn8AAAEAAEP9oVoAAABDYek9Nn8AAAEAAEP9oVoAAABD")
returns:
...js:1 Uncaught URIError: URI malformed
at decodeURIComponent (<anonymous>)
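This is because the UTF-8 encoding of that character, written as percent-escaped hexadecimal, is not "%e9" or "%E9". decodeURIComponent expects the escaped bytes to form valid UTF-8: 0xE9 announces a three-byte sequence, but the next byte in your string (0x3D) is not a continuation byte, and 0xFD is never a valid UTF-8 byte.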
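To see where the string from the question stops being valid UTF-8, each escaped byte can be labelled by its role. This is only a rough sketch: classifyByte and its label names are made up for illustration, and it looks at byte roles only, not at every UTF-8 rule (overlong forms and surrogates are ignored).

```javascript
// Hypothetical helper: label a byte value by the role it can play in UTF-8.
function classifyByte(b) {
  if (b <= 0x7f) return 'ascii';         // single-byte character
  if (b <= 0xbf) return 'continuation';  // 10xxxxxx
  if (b <= 0xc1) return 'invalid';       // 0xC0 and 0xC1 never appear in UTF-8
  if (b <= 0xdf) return 'lead-2';        // starts a 2-byte sequence
  if (b <= 0xef) return 'lead-3';        // starts a 3-byte sequence
  if (b <= 0xf4) return 'lead-4';        // starts a 4-byte sequence
  return 'invalid';                      // 0xF5..0xFF never appear in UTF-8
}

// The percent-encoded string from the question.
const input = '%61%e9%3d%36%7f%00%00%01%00%00%43%fd%a1%5a%00%00%00%43';

// Split on '%' and drop the empty first element to get the byte values.
const bytes = input.split('%').slice(1).map((hex) => parseInt(hex, 16));
const roles = bytes.map(classifyByte);

bytes.forEach((b, i) => {
  console.log(b.toString(16).padStart(2, '0'), roles[i]);
});
```

In the output, e9 is a lead-3 byte followed by an ascii byte instead of two continuation bytes, and fd is invalid wherever it appears, so the decoder rejects the string.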
Start by typing "\u00e9" or "\u00E9" in the console, which is your example with % replaced by "\u00". You will get 'é'.
You can verify this by running escape('é'), which returns "%E9".
Now run encodeURIComponent('é') and you will get "%C3%A9", not "%E9". This is because encodeURIComponent returns the percent-encoded UTF-8 bytes of the character. If the character takes 2 bytes you get %xx%yy, if 3 bytes you get %xx%yy%zz.
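The difference between the two forms can be checked directly. The snippet below is a small sketch; tryDecode is a hypothetical wrapper introduced here only so that the failure does not abort the script.

```javascript
// Hypothetical wrapper: return the decoded string, or null if the input
// is not valid percent-encoded UTF-8.
function tryDecode(s) {
  try {
    return decodeURIComponent(s);
  } catch (e) {
    if (e instanceof URIError) return null;
    throw e; // anything else is unexpected, so re-throw it
  }
}

const encoded = encodeURIComponent('é');
console.log(encoded);            // "%C3%A9", the two UTF-8 bytes of é
console.log(tryDecode(encoded)); // "é"
console.log(tryDecode('%E9'));   // null, a lone 0xE9 is not valid UTF-8
```

Returning null is just one way to signal the failure; depending on the application, logging and rejecting the input may be more appropriate.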
Try this with "€". First run escape("€") and you will get '%u20AC', which is the same as "\u20AC".
To get the hex dump of its UTF-8 bytes, run encodeURIComponent("€") and you will get '%E2%82%AC'.
This example from the Wikipedia 'UTF-8' article explains in detail how '%E2%82%AC' is calculated. It is the hex dump of 11100010 10000010 10101100.
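The calculation can also be reproduced by hand. The following is a minimal sketch that covers only the three-byte case (code points U+0800 to U+FFFF, ignoring the surrogate range); utf8ThreeBytes is a name invented for this example and not a built-in.

```javascript
// Hypothetical helper: UTF-8 encode a code point in the 3-byte range.
// Layout: 1110xxxx 10xxxxxx 10xxxxxx (16 payload bits in total).
function utf8ThreeBytes(codePoint) {
  if (codePoint < 0x0800 || codePoint > 0xffff) {
    throw new RangeError('this sketch only handles U+0800..U+FFFF');
  }
  return [
    0xe0 | (codePoint >> 12),         // top 4 bits
    0x80 | ((codePoint >> 6) & 0x3f), // middle 6 bits
    0x80 | (codePoint & 0x3f),        // low 6 bits
  ];
}

const euroBytes = utf8ThreeBytes(0x20ac); // code point of "€"

const bits = euroBytes
  .map((b) => b.toString(2).padStart(8, '0'))
  .join(' ');
const percent = euroBytes
  .map((b) => '%' + b.toString(16).toUpperCase().padStart(2, '0'))
  .join('');

console.log(bits);    // 11100010 10000010 10101100
console.log(percent); // %E2%82%AC
console.log(percent === encodeURIComponent('€')); // true
```

The same bit layout explains why a byte such as 0xE9 (1110 1001) cannot stand alone: it promises two continuation bytes that must follow it.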