When setting the pathname of a URL, when should you encode the value you are setting it to?
When I say URL, I mean this API: https://developer.mozilla.org/en-US/docs/Web/API/URL
When I say "setting the pathname" I mean to do this:
url.pathname = 'some/path/to/a/resource.html';
Based on the MDN documentation, I would think the answer is "you shouldn't need to", as there is an example covering this case:
URLs are encoded according to the rules found in RFC 3986. For instance:
url.pathname = 'démonstration.html';
console.log(url.href); // "http://www.example.com/d%C3%A9monstration.html"
However, I have run into a case where it seems I do need to encode the value I am setting pathname to:
url.pathname = 'atest/New Folder1234/!@#$%^&*().html';
console.log(url.href);
I would expect this to output:
http://example.com/atest/New%20Folder1234/!%40%23%24%25%5E%26*().html
But instead I am getting:
https://example.com/atest/New%20Folder1234/!@%23$%^&*().html
It seems that to get what I expect, I have to do:
url.pathname = 'atest/New Folder1234/!@#$%^&*().html'.split('/').map(encodeURIComponent).join('/');
What is going on here? I cannot find anything on the MDN doc page for either URL or pathname that explains this. I took a quick look through RFC 3986, but that just seems to describe the URI syntax. I have run some experiments to look for a pattern, but nothing stands out to me.
See the specification for path state, in particular...
UTF-8 percent-encode c using the path percent-encode set and append the result to buffer.
with the path percent-encode set being defined as...
the query percent-encode set and U+003F (?), U+0060 (`), U+007B ({), and U+007D (}).
and the query percent-encode set being...
the C0 control percent-encode set and U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>).
You can keep diving down the rabbit hole if you want, but I feel that's enough.
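To make the nesting of those sets concrete, here is a minimal sketch that models them exactly as quoted above. The function names are made up for illustration, the C0 control percent-encode set is assumed to be U+0000 to U+001F plus everything above U+007E, and the living standard may have added characters since these quotes were taken.

```javascript
// Sketch only: models the sets as quoted in this answer, not the live spec text.
// Assumption: the C0 control percent-encode set is U+0000..U+001F plus all code points above U+007E.
const inC0ControlSet = (cp) => cp <= 0x1f || cp > 0x7e;

// Query set = C0 control set plus SPACE, ", #, < and >.
const inQuerySet = (cp) =>
  inC0ControlSet(cp) || ' "#<>'.includes(String.fromCodePoint(cp));

// Path set = query set plus ?, `, { and }.
const inPathSet = (cp) =>
  inQuerySet(cp) || '?`{}'.includes(String.fromCodePoint(cp));

// Hypothetical convenience wrapper that takes a single character.
const inPathPercentEncodeSet = (ch) => inPathSet(ch.codePointAt(0));

for (const ch of [' ', '#', '?', 'é', '@', '$', '&']) {
  console.log(JSON.stringify(ch), inPathPercentEncodeSet(ch));
}
```

Only characters for which this returns true are percent-encoded when they reach the path state.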
Note that none of these sets include @$%^&, which are the characters you pointed out.
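You can check this directly against the URL API. The sketch below uses a hypothetical helper, viaPathnameSetter, and an example.com base URL as placeholders; it passes one character at a time through the pathname setter and reads back what was stored. I have left ^ out on purpose, since its handling may differ depending on which revision of the spec your runtime implements.

```javascript
// Hypothetical helper: shows how the pathname setter serializes a single character.
function viaPathnameSetter(ch) {
  const url = new URL('http://example.com/');
  url.pathname = 'a' + ch + 'b';     // stored as "/a" + (possibly encoded ch) + "b"
  return url.pathname.slice(2, -1);  // strip the leading "/a" and trailing "b"
}

const results = {};
for (const ch of [' ', '#', '?', '@', '$', '&']) {
  results[ch] = viaPathnameSetter(ch);
}
console.log(results);
```

Characters in the path percent-encode set come back escaped (for example the space and #), while @, $ and & come back unchanged.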
Compare these to the ECMAScript specification for Encode, the operation behind encodeURIComponent, which escapes far more characters.
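So if you want the stricter encoding, encode each segment yourself before assigning, as you already worked out. A minimal sketch, where encodePathSegments is a made-up helper name and example.com is a placeholder host:

```javascript
// Hypothetical helper: percent-encode each path segment with encodeURIComponent,
// keeping "/" as the segment separator.
function encodePathSegments(path) {
  return path.split('/').map(encodeURIComponent).join('/');
}

const url = new URL('http://example.com/');
url.pathname = encodePathSegments('atest/New Folder1234/!@#$%^&*().html');
console.log(url.href);
```

The setter leaves the existing %XX sequences alone because % is not in the path percent-encode set, so nothing gets encoded twice at that step. Only pass raw, unencoded segments to the helper, though: encodeURIComponent would turn an existing %20 into %2520.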