apache, encoding, url-rewriting, directory, apache2

URL Encoding with Underscores in a Directory Name?


We've run into an odd argument where I work, and I may be wrong about it, which is why I am asking.

Our software outputs a directory to an Apache server that replaces an underscore with a %5F in the name of the directory.

For instance, if the name of the directory is held as a string in our software, it is "andy_test", but when the software outputs the directory to the Apache server, it becomes "andy%5Ftest". Unfortunately, when you access the URL on the server, it ends up becoming "andy%255Ftest".

Somehow this seems wrong to me. Once again, the progression is:

  1. andy_test <- (as a string in the software)
  2. andy%5Ftest <- (listed as a directory on the server)
  3. andy%255Ftest <- (must be used when calling the same directory as a URL on the server from a web browser.)

I'm assuming that "%5F" is the encoding for an underscore, and that "%25" is the encoding for "%".
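That assumption can be checked by decoding the names one layer at a time. The following is only a minimal sketch using Python's standard `urllib.parse.unquote`; the variable names are illustrative and not part of our software.

```python
from urllib.parse import unquote

# The two escape sequences in question.
underscore = unquote("%5F")   # 0x5F is the ASCII code for "_"
percent = unquote("%25")      # 0x25 is the ASCII code for "%"

# Undo the progression one layer at a time.
url_segment = "andy%255Ftest"             # what the browser has to request
directory_name = unquote(url_segment)     # the name as listed on the server
original_name = unquote(directory_name)   # the string held in the software

print(url_segment, "->", directory_name, "->", original_name)
```

Each call to `unquote` removes exactly one layer of encoding, so two calls are needed to get from the URL back to the original string.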

It seems to me that the directory should be listed on the server as plain andy_test, and that "andy%5Ftest" would appear, if at all, only in an encoded URI used to access that directory on the Apache server.

I asked the guys on the backend about it, and they said that they were just "encoding anything that was not a letter or a number."

So I guess I'm a bit confused on this. Can you tell me who is right, and direct me to some information on why?


Solution

  • You should not encode the directory names as you create them; as you suggested, the directory should be stored as plain andy_test. Encoding should happen only at the last stage, when the URL is handed out to the browser. That is why you are ending up with 'double' encoding: in %255F, %25 is the encoded % and 5F is the leftover from the first encoding of the underscore.

    Also, note that according to RFC 1738, underscores do not need to be encoded at all:

    2.2. URL Character Encoding Issues

    ...

    Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.
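To make the double encoding concrete, here is a minimal sketch in Python. The helper `encode_non_alnum` is a hypothetical stand-in for the backend rule quoted in the question ("encoding anything that was not a letter or a number"); it is an assumption for illustration, not the actual backend code, and it only handles ASCII names. `urllib.parse.quote` stands in for the encoding applied when the URL is built.

```python
import re
from urllib.parse import quote


def encode_non_alnum(name: str) -> str:
    """Hypothetical backend rule: percent-encode every character that is
    not an ASCII letter or digit. Intended for ASCII input only."""
    return re.sub(
        r"[^A-Za-z0-9]",
        lambda m: "%{:02X}".format(ord(m.group())),
        name,
    )


name = "andy_test"

# Problem: the name is encoded when the directory is created, so the
# directory name on disk contains a literal "%" character...
dir_on_disk = encode_non_alnum(name)

# ...and that literal "%" must itself be escaped in the URL.
double_encoded = quote(dir_on_disk)

# Fix: keep the raw name on disk and encode only when building the URL.
# quote() leaves the underscore alone, so nothing changes here.
url_segment = quote(name)

print(dir_on_disk, double_encoded, url_segment)
```

With the raw name stored on disk, the directory name and the URL segment are identical, and the browser can request andy_test directly.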