Tags: .htaccess, mod-rewrite, url-rewriting, escaping

URL rewrite with an unencoded hash (#) character and an encoded space (%20)


I am looking to create a rewrite file (.htaccess) for the National Library of Medicine (NLM) Classification, which uses URLs of this form (https://classification.nlm.nih.gov/schedules/QS#QS%2023) to refer to portions of the classification.

I originally produced a file like this:

RewriteEngine On
RewriteBase /

# Redirect URLs.
RewriteRule ^([A-Z]{1,2})\ ([\d\.A-Z]+)$ https://classification.nlm.nih.gov/schedules/$1#$1%20$2 [R=301,NE,L]

for the pull request located here.

Unfortunately, this redirects URLs such as https://w3id.org/NLM/QV%20268.5 to https://classification.nlm.nih.gov/schedules/QV#QV0268.5 instead of https://classification.nlm.nih.gov/schedules/QV#QV%20268.5. I did some digging and it seemed the issue was the "NE" flag. But after trying some variations, it appears I need to keep the "#" unencoded while making sure the space stays encoded as %20, and the redirect fails with or without the "NE" flag.

It appears a similar question was asked here but there is currently no solution. Essentially: is there a way I can keep the "#" unencoded and encode the space (%20) or include a space in the substitution?


Solution

  • The issue is not the NE flag.

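    You need to backslash-escape the literal % in the substitution string. Otherwise %2 is treated as a backreference to the second captured group of the last matched RewriteCond pattern (mod_rewrite syntax). There is no RewriteCond here, so that backreference is always empty and is replaced with an empty string, which is why %20 collapses to 0 in the redirect target.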

    In other words, use \%2 instead of %2 in the substitution string.

    Note that 301s are cached persistently by the browser so you need to make sure all caches are cleared before testing. Test with 302s first to avoid potential caching issues.
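
    Applied to the rule from the question, the fix might look like the sketch below. It has not been tested against the live w3id.org setup, and R=302 is an assumption made here only to follow the testing advice above; switch back to R=301 once the redirect is confirmed.

    RewriteEngine On
    RewriteBase /

    # Redirect URLs.
    # \% emits a literal percent sign, so %2 is no longer read as a RewriteCond backreference.
    # NE is kept so that the # and the % in the target are not re-escaped (to %23 and %25).
    RewriteRule ^([A-Z]{1,2})\ ([\d\.A-Z]+)$ https://classification.nlm.nih.gov/schedules/$1#$1\%20$2 [R=302,NE,L]

    With this rule, a request for https://w3id.org/NLM/QV%20268.5 should redirect to https://classification.nlm.nih.gov/schedules/QV#QV%20268.5.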