javascript decodeuricomponent

Why is %fd an invalid string for URL decoding?


Why is %e9 or %fd an invalid string to decode with JavaScript's decodeURIComponent?

These characters appear in the middle of a string and I can't understand where the problem is. They are valid hexadecimal characters.

Full string (this is part of a string that a client app sent to the server and that modsec was blocking):

%61%e9%3d%36%7f%00%00%01%00%00%43%fd%a1%5a%00%00%00%43

Sample to decode:

decodeURIComponent("%61%e9%3d%36%7f%00%00%01%00%00%43%fd%a1%5a%00%00%00%43")

Error:

VM222:1 Uncaught URIError: URI malformed
    at decodeURIComponent (<anonymous>)
    at <anonymous>:1:1

I am using these two functions to encode to base64 and decode from base64 (from here: Mozilla):

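// Encode a string to base64 via its UTF-8 bytes.
function c64(t) {
    return btoa(encodeURIComponent(t).replace(/%([0-9A-F]{2})/g,
        // Turn each %XX escape back into the single byte it stands for.
        (match, p1) => String.fromCharCode(parseInt(p1, 16))));
}

// Decode base64 to a string; this assumes the decoded bytes are valid UTF-8.
function d64(t) {
    return decodeURIComponent(atob(t).split('').map(function (c) {
        // Re-express each byte as a zero-padded %xx escape.
        return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
    }).join(''));
}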

The original string is in base64:

d64("Yek9Nn8AAAEAAEP9oVoAAABDYek9Nn8AAAEAAEP9oVoAAABD")

returns:

...js:1 Uncaught URIError: URI malformed
    at decodeURIComponent (<anonymous>)

Solution

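  • This is because decodeURIComponent expects the percent-escapes to spell out the UTF-8 bytes of a character, and "%e9" (or "%E9") is not that. It is the Unicode code point of the character (U+00E9) written in hexadecimal.

    Start by typing "\u00e9" or "\u00E9" in the console,

    which is your example with % replaced by "\u00". You will get: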

    'é'

    You can verify this with the legacy escape function, which emits the code point rather than the UTF-8 bytes:

    escape('é') // "%E9"

    Now run

    encodeURIComponent('é')

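    and you will get "%C3%A9", not "%E9". This is because encodeURIComponent returns a hex dump of the character's UTF-8 bytes. If the character takes 2 bytes in UTF-8 you get %xx%yy, if 3 bytes you get %xx%yy%zz. decodeURIComponent reads the escapes the same way in reverse. In your string, %e9 is the lead byte of a 3-byte sequence, so it must be followed by two continuation bytes (%80 to %bf), but the next escape is %3d. %fd is not a valid byte anywhere in UTF-8. Either one is enough to raise the URIError.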

    Try this with "€". First do:

    escape("€")

    and you will get '%u20AC', which is the same code point as "\u20AC".

    To get the hex dump of its UTF-8 bytes, run:

    encodeURIComponent("€")

    and you will get '%E2%82%AC'.

    The example in Wikipedia's 'UTF-8' article explains in detail how '%E2%82%AC' is calculated. It is the hex dump of the bits 11100010 10000010 10101100.
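The calculation described above can be sketched in a few lines. This is only an illustration, not how any engine implements it: utf8Bytes, percentEncodeUtf8 and canDecode are hypothetical helper names introduced here, and utf8Bytes assumes it is given a valid code point (it does not reject lone surrogates). The sketch builds the UTF-8 bytes by hand, compares them with what encodeURIComponent produces, and then checks which escapes decodeURIComponent accepts.

```javascript
// Hypothetical helper: UTF-8 bytes of one code point, built with bit operations.
// Assumes a valid code point; lone surrogates are not rejected.
function utf8Bytes(codePoint) {
  if (codePoint < 0x80) {
    return [codePoint];                                // 0xxxxxxx
  }
  if (codePoint < 0x800) {
    return [0xc0 | (codePoint >> 6),                   // 110xxxxx
            0x80 | (codePoint & 0x3f)];                // 10xxxxxx
  }
  if (codePoint < 0x10000) {
    return [0xe0 | (codePoint >> 12),                  // 1110xxxx
            0x80 | ((codePoint >> 6) & 0x3f),          // 10xxxxxx
            0x80 | (codePoint & 0x3f)];                // 10xxxxxx
  }
  return [0xf0 | (codePoint >> 18),                    // 11110xxx
          0x80 | ((codePoint >> 12) & 0x3f),
          0x80 | ((codePoint >> 6) & 0x3f),
          0x80 | (codePoint & 0x3f)];
}

// Hypothetical helper: percent-encode every UTF-8 byte of a string.
// Unlike encodeURIComponent, this also escapes plain ASCII characters.
function percentEncodeUtf8(text) {
  let out = '';
  for (const ch of text) {                             // iterates by code point
    for (const b of utf8Bytes(ch.codePointAt(0))) {
      out += '%' + b.toString(16).toUpperCase().padStart(2, '0');
    }
  }
  return out;
}

// Hypothetical helper: does decodeURIComponent accept this input?
function canDecode(escaped) {
  try {
    decodeURIComponent(escaped);
    return true;
  } catch (err) {
    if (err instanceof URIError) return false;
    throw err;                                         // unrelated failure
  }
}

console.log(percentEncodeUtf8('é') === encodeURIComponent('é')); // true
console.log(percentEncodeUtf8('€'));                             // %E2%82%AC
console.log(canDecode('%C3%A9'));                                // true
console.log(canDecode('%E9%3D'));                                // false
console.log(canDecode('%FD'));                                   // false
```

If the payload is arbitrary binary data rather than UTF-8 text, as the run of %00 bytes in the question suggests, then no amount of re-escaping will make decodeURIComponent accept it; in that case it is probably better to keep the output of atob as bytes instead of decoding it as text.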