apache, .htaccess, http-redirect, mod-rewrite

How to make a URL not case-sensitive (i.e. all case variations of a page URL link to the actual page)


I'm new to this site. If someone could please help I will be forever grateful.

I already have this code in my .htaccess file:

RewriteEngine On
RewriteCond %{THE_REQUEST} \s/([^.]+)\.html [NC]
RewriteRule ^ /%1 [R=301,L]
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^([0-9a-zA-Z_-]+)$ $1.html [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^.*$ / [R=301,L]
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ https://%1/$1 [L,R=301]

What this does is if you type in "website.com/page.html" it will change the URL to "website.com/page" and it also redirects any random page URL that is not a page to the homepage.

How would I be able to make it so that if you type any case variation it will point to the page file? For example: type in "website.com/pAgE" it will redirect to "website.com/page" and not redirect to the homepage. Plus if some of my code is silly to do the other stuff please tell me.

Thanks

I tried to add:

RewriteCond %{REQUEST_URI} \[A-Z\]
RewriteRule ^(.\*)$ /${tolower:$1} \[R=301,L\]

but it seems to break the site.


Solution

  • I tried to add:
    
    RewriteCond %{REQUEST_URI} \[A-Z\]
    RewriteRule ^(.\*)$ /${tolower:$1} \[R=301,L\]
    

    I assume those backslashes are typos (an attempt at formatting?) in your question and not part of your actual code?!

    The tolower rewritemap is only available if you have already configured this in the server config, which I assume you have not done?
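
    For reference, a minimal sketch of what that server-level declaration might look like. This assumes you have access to the main server or virtual host config (the RewriteMap directive is not permitted in .htaccess); the map name "tolower" is just a label, chosen here to match the ${tolower:$1} lookup in your rule:

    ```apache
    # In the server config or inside a <VirtualHost> block (NOT in .htaccess).
    # Declares a map named "tolower" backed by mod_rewrite's internal
    # int:tolower function, so that ${tolower:...} lookups can resolve.
    RewriteMap tolower int:tolower
    ```

    On shared hosting you typically cannot add this yourself, which is why the map approach often is not an option.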

    However, on Apache 2.4+ there is a tolower function in the Apache expression syntax that you can use directly in .htaccess (via RewriteCond expr), so the RewriteMap is not required (as it would be on earlier versions of Apache).

    RewriteCond expr "tolower(%{REQUEST_URI}) =~ /(.*)/"
    RewriteRule [A-Z] %1 [R=301,L]
    

    However, there are other issues with your existing code.

    If all your filenames are lowercase then you could simply redirect every URL that contains an uppercase letter to its lowercased equivalent, regardless of whether it is a "page file" or not. Note that this does not strictly make your URLs "case-insensitive" (serving the same content on every case variation is arguably bad for SEO); it is simply an uppercase to lowercase redirect.

    Try the following instead:

    RewriteEngine On
    
    ErrorDocument 404 /error-docs/my-custom-404.html
    
    # www to non-www canonical redirect
    RewriteCond %{HTTP_HOST} ^www\.(.+) [NC]
    RewriteRule ^(.*)$ https://%1/$1 [R=301,L]
    
    # Convert all URLs to lowercase
    RewriteCond expr "tolower(%{REQUEST_URI}) =~ /(.*)/"
    RewriteRule [A-Z] %1 [R=301,L]
    
    # Remove ".html" extension only if this maps to a real file (in root only)
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteCond %{REQUEST_FILENAME} -f
    RewriteRule ^([\w-]+)\.html$ /$1 [R=301,L]
    
    # Append ".html" if this matches a real file (in root only)
    RewriteCond %{DOCUMENT_ROOT}/$1.html -f
    RewriteRule ^([\w-]+)$ $1.html [L]
    

    And create a /error-docs/my-custom-404.html file with your friendly "404 Not Found" custom error page with links to the homepage and elsewhere. (Taking this a step further, you can analyse the URL, check for typos etc. and suggest pages that the user perhaps intended to visit, etc.)

    The REDIRECT_STATUS environment variable is empty on the initial request and is set (to 200) after the first successful internal rewrite, so checking it is a cleaner way (IMO) to avoid a redirect loop, and less prone to regex errors, than matching against THE_REQUEST. (Your existing regex was not correct: [^.]+ can run past the "?" and match into the query string, resulting in malformed redirects.)
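
    If you would rather keep a THE_REQUEST-based check, a tighter pattern is possible. The following is only a sketch under assumptions, not a drop-in replacement: the pattern is hypothetical, it stops the capture at the first "?" or space so it cannot run into the query string, and unlike the file-based rule it neither checks that the file exists nor limits itself to the document root.

    ```apache
    # Hypothetical alternative to the REDIRECT_STATUS check (use one or the other).
    # [^.?\s]+ stops at ".", "?" or whitespace, so only the URL-path is captured.
    RewriteCond %{THE_REQUEST} ^[A-Z]+\s/([^.?\s]+)\.html[?\s]
    RewriteRule ^ /%1 [R=301,L]
    ```

    For a request line such as "GET /page.html?x=1 HTTP/1.1" this captures "page"; for "GET /foo?bar=baz.html HTTP/1.1" the condition does not match at all.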

    As with your original rules, this only works for files in the document root, not subdirectories. I assume this is intentional.

    The regex character class [\w-] uses the \w shorthand character class and is the same as the more verbose [0-9a-zA-Z_-].

    Make sure you clear your browser cache before testing and test first with 302 (temporary) redirects to avoid potential caching issues.
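
    As a sketch of what that looks like in practice, here is the lowercase redirect with the status temporarily switched to 302. Only the R flag differs from the rule given earlier; nothing else is assumed:

    ```apache
    # Testing only: 302 (temporary) so the browser does not cache the redirect
    RewriteCond expr "tolower(%{REQUEST_URI}) =~ /(.*)/"
    RewriteRule [A-Z] %1 [R=302,L]
    ```

    Once you have confirmed the behaviour, change R=302 back to R=301 (and do the same for the other redirects).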