Users provide both properly escaped URLs and raw URLs to my website in a text input; for example I consider these two URLs equivalent:
https://www.cool.com/cool%20beans
https://www.cool.com/cool beans
Now I want to render these as <a> tags later, when viewing this data. I am stuck between encoding the given text and getting these links:
<a href="https://www.cool.com/cool%2520beans"> <!-- This one is broken! -->
<a href="https://www.cool.com/cool%20beans">
Or not encoding it and getting this:
<a href="https://www.cool.com/cool%20beans">
<a href="https://www.cool.com/cool beans"> <!-- This one is broken! -->
What's the best way out from a user experience standpoint with modern browsers? I'm torn between doing a decoding pass over their input, or the second option I listed above where we don't encode the href attribute.
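If you want to avoid double-encoding the links, you can run urldecode() on both links and then urlencode() afterwards. Decoding a raw URL such as "https://www.cool.com/cool beans" returns the same value, whereas decoding "https://www.cool.com/cool%20beans" returns it with a literal space. Either way you end up with the raw form, which can then be encoded exactly once. Note that urlencode() applied to the whole URL would also escape the ":" and "/" characters and would turn spaces into "+", so the encoding step should be applied to the path segments only, for example with rawurlencode().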
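The decode-then-encode idea could be sketched roughly as follows. This is a minimal sketch, not a complete URL normalizer: the function name normalize_url_path() is hypothetical, it assumes an absolute URL with a scheme and host, it leaves the query string and fragment untouched, and it drops any user:password part.

```php
<?php
// Hypothetical helper (not a built-in): makes sure the path of a URL
// is percent-encoded exactly once, whether the input was raw or escaped.
function normalize_url_path(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // Assumption: leave input we cannot parse untouched.
    }

    // Decode, then re-encode, each path segment separately so the "/"
    // separators survive. "cool beans" and "cool%20beans" both decode to
    // "cool beans" and both encode to "cool%20beans".
    $segments = array_map(
        fn(string $s): string => rawurlencode(rawurldecode($s)),
        explode('/', $parts['path'] ?? '')
    );

    $result = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= implode('/', $segments);
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query'];    // passed through unchanged
    }
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment']; // passed through unchanged
    }
    return $result;
}
```

When writing the result into the page, it would still need HTML escaping, e.g. htmlspecialchars(normalize_url_path($url), ENT_QUOTES). One caveat with this approach: a raw URL that really does contain the literal characters "%20" is indistinguishable from an escaped one and will be treated as a space.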
Alternatively, encoded characters could be scanned for using the strpos() function, e.g.
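// Compare against false explicitly: strpos() returns 0 for a match at the start of the string
if (($pos = strpos($url, "%20")) !== false) {
    // Encoded character found
}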
Ideally, an array of common encoded characters would be scanned for, in place of the single "%20".
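Scanning for an array of encoded characters might look something like the sketch below. The function name has_encoded_chars() and the default list of sequences are assumptions made for illustration; the list is not exhaustive and would need adjusting to the characters you expect in your users' URLs.

```php
<?php
// Hypothetical helper: reports whether $url contains any of the given
// percent-encoded sequences. The default list is only an example.
function has_encoded_chars(
    string $url,
    array $encoded = ['%20', '%22', '%23', '%25', '%3C', '%3E', '%7B', '%7D']
): bool {
    foreach ($encoded as $needle) {
        // stripos() is case-insensitive, so "%3c" and "%3C" both match.
        // Compare against false explicitly, since a match at offset 0 is falsy.
        if (stripos($url, $needle) !== false) {
            return true;
        }
    }
    return false;
}
```

A URL for which this returns true can be treated as already escaped and left alone, while one for which it returns false can be encoded before it is written into the href attribute.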