apache, .htaccess, mod-rewrite, url-rewriting, url-masking

URL masking not working for URLs that omit the trailing slash


There are a lot of similar questions, but none seem to be the exact fit for me.

I am moving away from a WordPress site to a simple static site. However, I am currently prohibited from removing the WordPress site hosted in the public_html folder completely until everything is proven to be working with the static site.

I have the static site deployed to a sub-sub folder in my public_html folder e.g. /subfolderA/newSiteFolder.

I have updated the .htaccess file to rewrite requests to the sub-sub folder using the following:

RewriteEngine on
RewriteCond %{REQUEST_URI} !newSiteFolder/ 
RewriteCond %{REQUEST_URI} !subfolderA/newSiteFolder/ 
RewriteRule (.*)$ /subfolderA/newSiteFolder/$1 [L] 

This works fine, and the address bar shows the correct URL when navigating by following links within the site. However, when arriving at the site from an external link, the subfolders are shown in the address bar.

For example, if the about page is clicked from an external link, it shows as https://example.com/subfolderA/newSiteFolder/about, instead of https://example.com/about.

How can I mask the sub folder names in the address bar when clicked from an external link? Or how best to change my rewrite rules to accomplish this?


Solution

  • I'm assuming that about is actually a physical subdirectory at /subfolderA/newSiteFolder/about and you are intending to serve the DirectoryIndex document (eg. index.html) from that directory.

    The "problem" is that when you request a directory without a trailing slash, mod_dir attempts to "fix" this by appending a trailing slash via a 301 (permanent) redirect, and this exposes the file-path that the request has been internally rewritten to.

    In other words, when you request /about (no trailing slash), your mod_rewrite directives internally rewrite the request to /subfolderA/newSiteFolder/about, but then mod_dir kicks in and externally redirects the request to /subfolderA/newSiteFolder/about/ to append the trailing slash (which is required).

    The canonical URL contains the trailing slash and this is what you are linking to internally. So we need to make sure there is always a trailing slash on the rewritten URL when this maps to a directory. We can do this with a canonical redirect before we rewrite the URL.

    RewriteCond %{REQUEST_URI} !newSiteFolder/ 
    RewriteCond %{REQUEST_URI} !subfolderA/newSiteFolder/ 
    RewriteRule (.*)$ /subfolderA/newSiteFolder/$1 [L]
    

    A note on your existing directives (repeated above): the first condition would seem to be superfluous. Also, the regexes used here are not anchored, so they match the stated string anywhere in the requested URL-path, not only at the start.
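
    For illustration only, an anchored version of those existing directives might look like the following. This is a sketch that addresses just the anchoring and the duplicated condition; it does not address the trailing-slash redirect described above.

    ```apache
    RewriteEngine On

    # Skip requests whose URL-path already starts with the target directory.
    # The ^ anchor means the string is only matched at the start of the path,
    # not anywhere inside it.
    RewriteCond %{REQUEST_URI} !^/subfolderA/newSiteFolder/
    RewriteRule (.*) /subfolderA/newSiteFolder/$1 [L]
    ```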

    However, we can't just append the trailing slash to all URLs, since you likely have static resources like CSS, JS and images etc. For any static files we must not force a trailing slash, so we need to handle this with an additional rule. Try the following instead:

    # Store the base directory in an environment variable
    RewriteRule ^ - [E=BASEDIR:/subfolderA/newSiteFolder/]
    
    # Rewrite the root (homepage) only
    RewriteRule ^$ %{ENV:BASEDIR} [L]
    
    # Finish early if we are already in the required base directory
    RewriteCond %{ENV:BASEDIR}@%{REQUEST_URI} ^([^@]+)@\1
    RewriteRule ^ - [L]
    
    # If the request would map to a directory
    #     and it is missing a trailing slash
    #     then redirect to append the trailing slash
    RewriteCond %{REQUEST_URI} !\.\w{2,4}$
    RewriteCond %{DOCUMENT_ROOT}%{ENV:BASEDIR}$1 -d
    RewriteRule ^(.+[^/])$ /$1/ [R=301,L]
    
    # Rewrite everything to the base directory
    RewriteRule (.+) %{ENV:BASEDIR}$1 [L]
    

    Explanation of the above directives

    I have chosen to store the "base directory" (ie. /subfolderA/newSiteFolder/) in an environment variable BASEDIR using the first rule to save repetition of the base file-path throughout the file.

    RewriteCond %{ENV:BASEDIR}@%{REQUEST_URI} ^([^@]+)@\1
    

    This condition checks whether the requested URL (including the rewritten URL) is already inside the base directory being rewritten to. The @ character is just an arbitrary character that is not expected to appear in the URL-path. It carries no special meaning in the regex, other than delimiting the base directory (BASEDIR) from the requested URL (REQUEST_URI). \1 is an internal backreference that checks whether the requested URL starts with the base directory.
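
    To make the backreference concrete, here is how the TestString expands for two sample requests, assuming BASEDIR is /subfolderA/newSiteFolder/ as set by the first rule. The request paths in the comments are examples only.

    ```apache
    # Request already inside the base directory (eg. after the rewrite):
    #   TestString: /subfolderA/newSiteFolder/@/subfolderA/newSiteFolder/about/
    #   ([^@]+) captures "/subfolderA/newSiteFolder/" and \1 matches the same
    #   text immediately after the "@", so the condition succeeds.
    #
    # Request outside the base directory:
    #   TestString: /subfolderA/newSiteFolder/@/about/
    #   The text after the "@" does not start with the captured group,
    #   so the condition fails and processing continues with the next rule.
    RewriteCond %{ENV:BASEDIR}@%{REQUEST_URI} ^([^@]+)@\1
    RewriteRule ^ - [L]
    ```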

    RewriteCond %{REQUEST_URI} !\.\w{2,4}$
    RewriteCond %{DOCUMENT_ROOT}%{ENV:BASEDIR}$1 -d
    RewriteRule ^(.+[^/])$ /$1/ [R=301,L]
    

    The first condition excludes any request that ends in what looks like a file extension (ie. a dot followed by 2 to 4 word characters: letters, digits or underscore), so we can avoid the more expensive filesystem check for a directory (that follows). This does assume that you don't have physical directories whose names end in what looks like a "file extension".

    The second condition tests whether the requested URL (eg. /about) exists as a directory inside the directory being rewritten to.

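    The regex ^(.+[^/])$ matches (and captures) any URL-path that does not already end in a slash. Note that, as written, it requires at least two characters, so a single-character path (eg. /a) is not matched by this rule.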
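
    If you do have directories whose names look like they end in a file extension, one option (a sketch, not part of the directives above) is to drop the file-extension condition and rely on the directory check alone. Every request without a trailing slash then incurs the filesystem check, but the outcome should otherwise be the same, since the extension condition only serves as a shortcut. The names archive.old and css/site.css in the comments are hypothetical.

    ```apache
    # Variant without the file-extension shortcut (assumes BASEDIR has
    # been set by the earlier rule).
    # A directory such as /archive.old is still redirected to /archive.old/,
    # while a file such as /css/site.css fails the -d test and is left alone.
    RewriteCond %{DOCUMENT_ROOT}%{ENV:BASEDIR}$1 -d
    RewriteRule ^(.+[^/])$ /$1/ [R=301,L]
    ```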

    NB: You need to make sure you have cleared your browser cache before testing since the earlier erroneous redirect to append the trailing slash (that also exposed the file-path) was a 301 permanent redirect and will likely have been cached persistently by the browser.


    Prevent direct access to the "hidden" subdirectory

    "Is there a way to also fix the URL for a user who previously navigated to mydomain/subfolderA/newSiteFolder/about from the external link, saved the link with the subfolders, and is now using that link directly?"

    You can prevent direct access to this "hidden" subdirectory and redirect the user back to the "canonical" URL with something like the following. This should go as the 3rd rule in the above block, after the "Rewrite the root ..." rule.

    # Redirect direct requests to the subdirectory back to root
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteCond %{ENV:BASEDIR}@%{REQUEST_URI} ^([^@]+)@\1(.*)
    RewriteRule ^ /%2 [R=301,L]
    

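    Importantly, the first condition checks that the REDIRECT_STATUS environment variable is empty. It is empty on the initial request from the client and is set (to 200) once the request has been internally rewritten by the later rule, so this rule only affects direct requests from the client.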

    %2 is a backreference to the 2nd captured group in the preceding CondPattern, ie. everything in the URL-path after the BASEDIR.
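
    As a worked example of the backreferences, the comments below trace a direct request for the about page, assuming BASEDIR is /subfolderA/newSiteFolder/ as set by the first rule. The requested path is just an example.

    ```apache
    # Direct request from the client: /subfolderA/newSiteFolder/about/
    #   REDIRECT_STATUS is empty, so the first condition succeeds.
    #   TestString: /subfolderA/newSiteFolder/@/subfolderA/newSiteFolder/about/
    #   %1 = "/subfolderA/newSiteFolder/"   (the base directory)
    #   %2 = "about/"                       (everything after the base directory)
    #   Redirect target: /about/
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteCond %{ENV:BASEDIR}@%{REQUEST_URI} ^([^@]+)@\1(.*)
    RewriteRule ^ /%2 [R=301,L]
    ```

    A direct request for the base directory itself leaves %2 empty, so the redirect target is simply / (the homepage).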

    However, if the user has previously been erroneously redirected to the subdirectory, that redirect will likely have been cached by the browser. Unfortunately, for these users the above redirect to remove (undo) the subdirectory may then result in a redirect loop until they clear their browser cache. (The redirect loop might prompt them to try clearing their browser cache to resolve the issue, although maybe not.)

    You could perhaps redirect back to a URL that contains an innocuous query string. This might be enough to prevent a redirect loop for those users that have the erroneous redirect cached (since it's not a URL in their cache), but it does leave a superfluous query string hanging on the URL. For example, change the above RewriteRule directive:

    :
    RewriteRule ^ /%2?noredirect [R=301,L]
    

    noredirect is just any query string to differentiate from the cached URL/redirect.

    NB: Test first with a 302 (temporary) redirect to avoid further/potential caching issues.
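
    While testing, the two external redirects could be written with R=302 as in the sketch below, and then switched back to R=301 once everything is confirmed to work. Apart from the status code, the directives are the same as those shown earlier.

    ```apache
    # Testing only: temporary (302) redirects are not cached persistently
    # by the browser in the way that 301 redirects are.

    # Redirect direct requests to the subdirectory back to root
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteCond %{ENV:BASEDIR}@%{REQUEST_URI} ^([^@]+)@\1(.*)
    RewriteRule ^ /%2 [R=302,L]

    # Append the trailing slash to requests that map to a directory
    RewriteCond %{REQUEST_URI} !\.\w{2,4}$
    RewriteCond %{DOCUMENT_ROOT}%{ENV:BASEDIR}$1 -d
    RewriteRule ^(.+[^/])$ /$1/ [R=302,L]
    ```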

    Summary

    RewriteEngine On
    
    # Store the base directory in an environment variable
    RewriteRule ^ - [E=BASEDIR:/subfolderA/newSiteFolder/]
    
    # Rewrite the root (homepage) only
    RewriteRule ^$ %{ENV:BASEDIR} [L]
    
    # Redirect direct requests to the subdirectory back to root
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteCond %{ENV:BASEDIR}@%{REQUEST_URI} ^([^@]+)@\1(.*)
    RewriteRule ^ /%2 [R=301,L]
    
    # Finish early if we are already in the required base directory
    RewriteCond %{ENV:BASEDIR}@%{REQUEST_URI} ^([^@]+)@\1
    RewriteRule ^ - [L]
    
    # If the request would map to a directory
    #     and it is missing a trailing slash
    #     then redirect to append the trailing slash
    RewriteCond %{REQUEST_URI} !\.\w{2,4}$
    RewriteCond %{DOCUMENT_ROOT}%{ENV:BASEDIR}$1 -d
    RewriteRule ^(.+[^/])$ /$1/ [R=301,L]
    
    # Rewrite everything to the base directory
    RewriteRule (.+) %{ENV:BASEDIR}$1 [L]