My Express server cannot handle request paths encoded in the Windows-1252 character set, because the Express router passes the path to decodeURIComponent(), which throws an error. My idea now is to call decodeURIComponent() myself in a middleware, catch the error, and then work with a replacement. I have already found a corresponding table in the section "ASCII Encoding Reference" at https://www.w3schools.com/tags/ref_urlencode.ASP
My first question is, is this possible?
My second question is, is there a package for this? I have to rely on require statements; I can't include packages with import.
An example path, /pictures/M%F6bel.png, should be converted to /pictures/M%C3%B6bel.png so that the rest of the Express logic can remain untouched.
What type of client would send an invalid URL like /pictures/M%F6bel.png? Anyway:
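var path = "/pictures/M%F6bel.png"
  .replace(/%u([0-9A-Fa-f]{4})/g, (m, p) => String.fromCharCode(parseInt(p, 16)))
  .replace(/%([0-9A-Fa-f]{2})/g, (m, p) => String.fromCharCode(parseInt(p, 16)));

decodes each escape into the character with that code, and encodeURI(path) then gives the UTF-8 percent-encoded version, /pictures/M%C3%B6bel.png.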
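As a rough sketch, the two steps can be wrapped in one function. The name toUtf8Escapes is hypothetical, and the regular expressions are restricted to hex digits here, which is an assumption rather than part of the snippet above.

```javascript
// Hypothetical helper: decode every %XX / %uXXXX escape as a single
// character, then let encodeURI() re-encode the result as UTF-8.
function toUtf8Escapes(path) {
  const decoded = path
    .replace(/%u([0-9A-Fa-f]{4})/g, (m, p) => String.fromCharCode(parseInt(p, 16)))
    .replace(/%([0-9A-Fa-f]{2})/g, (m, p) => String.fromCharCode(parseInt(p, 16)));
  return encodeURI(decoded);
}

console.log(toUtf8Escapes("/pictures/M%F6bel.png")); // /pictures/M%C3%B6bel.png
console.log(toUtf8Escapes("/a%2Fb/M%F6bel.png"));    // /a/b/M%C3%B6bel.png
```

The second call shows a side effect worth knowing about: every escape is decoded, and encodeURI() does not re-escape reserved characters such as /, so an escaped %2F comes back as a literal slash.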
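This allows you to write a middleware like the following, registered before any routes:

app.use(function(req, res, next) {
  try {
    // Throws a URIError if req.url contains escapes that are not valid UTF-8
    decodeURI(req.url);
  } catch (e) {
    if (e instanceof URIError) {
      // Decode each escape as a single character, then re-encode as UTF-8
      req.url = encodeURI(req.url
        .replace(/%u([0-9A-Fa-f]{4})/g, (m, p) => String.fromCharCode(parseInt(p, 16)))
        .replace(/%([0-9A-Fa-f]{2})/g, (m, p) => String.fromCharCode(parseInt(p, 16))));
    }
  }
  next();
});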
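To try the idea without starting a server, a variant of the middleware can be called with a stand-in request object. This is only a sketch: fixLegacyEscapes and run are hypothetical names, the stub carries nothing but a url property, and the error check uses instanceof URIError with hex-only regular expressions.

```javascript
// Variant of the middleware as a named function (name is hypothetical).
function fixLegacyEscapes(req, res, next) {
  try {
    decodeURI(req.url); // throws URIError on escapes that are not valid UTF-8
  } catch (e) {
    if (e instanceof URIError) {
      req.url = encodeURI(req.url
        .replace(/%u([0-9A-Fa-f]{4})/g, (m, p) => String.fromCharCode(parseInt(p, 16)))
        .replace(/%([0-9A-Fa-f]{2})/g, (m, p) => String.fromCharCode(parseInt(p, 16))));
    }
  }
  next();
}

// Minimal stand-in for an Express request; only the url property is used.
function run(url) {
  const req = { url };
  fixLegacyEscapes(req, {}, () => {});
  return req.url;
}

console.log(run("/pictures/M%F6bel.png"));    // /pictures/M%C3%B6bel.png
console.log(run("/pictures/M%C3%B6bel.png")); // /pictures/M%C3%B6bel.png (already valid, left as-is)
```

In a real application the function would be passed to app.use() in place of the anonymous one.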
Note that changing req.url in this way automatically changes req.path as well. By contrast, req.originalUrl remains unchanged (as its name might suggest).
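This does not work for bytes in the range 0x80-0x9F, where Windows-1252 differs from ISO-8859-1. String.fromCharCode() maps these bytes to the control characters that Unicode places at the same code points, not to the characters Windows-1252 defines there (for example, 0x80 is € in Windows-1252, but € is U+20AC in Unicode).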
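One possible way to cover the range 0x80-0x9F as well is an explicit lookup table for those 32 bytes. The sketch below is not part of the approach above; the names CP1252_HIGH, cp1252ByteToChar and reencodeCp1252Path are hypothetical. It assumes that the five bytes Windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D) should pass through as the control code with the same number, and that ASCII escapes such as %2F should stay untouched.

```javascript
// Unicode code points for Windows-1252 bytes 0x80..0x9F, in order.
// Unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) keep their own number.
const CP1252_HIGH = [
  0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
  0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
];

// Map one Windows-1252 byte to the corresponding one-character string.
function cp1252ByteToChar(byte) {
  const codePoint = (byte >= 0x80 && byte <= 0x9F) ? CP1252_HIGH[byte - 0x80] : byte;
  return String.fromCharCode(codePoint);
}

// Rewrite every non-ASCII %XX escape from Windows-1252 to UTF-8.
function reencodeCp1252Path(path) {
  return path.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) => {
    const byte = parseInt(hex, 16);
    if (byte < 0x80) return match; // leave ASCII escapes such as %2F or %20 alone
    return encodeURIComponent(cp1252ByteToChar(byte));
  });
}

console.log(reencodeCp1252Path("/pictures/M%F6bel.png")); // /pictures/M%C3%B6bel.png
console.log(reencodeCp1252Path("/prices/100%80.png"));    // /prices/100%E2%82%AC.png
```

Like the middleware, this should only be applied after decodeURI() has thrown: run on a path that is already valid UTF-8, it would treat each UTF-8 byte as a separate Windows-1252 character and double-encode it.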