Within our virtual host config, I am trying to use a RewriteCond to check if a trailing slash request is requesting a .html page that exists on our website. If so, 302 redirect it to the proper .html page. If not, provide 404.
Note: All pages on our site end in .html
This works: https://example.com/content/go/first-level/
(302 redirects to https://example.com/content/go/first-level.html)
This works: https://example.com/content/go/first-level/second-level
(302 redirects to https://example.com/content/go/first-level/second-level.html)
This does not: https://example.com/content/go/first-level/second-level/third-level
(Provides 404 and remains at https://example.com/content/go/first-level/second-level/third-level)
This is because https://example.com/content/go/first-level/second-level/third-level.html page is not actually a directory, so when I do my directory test, it fails. However, I don't think I can do the -f test because my %{REQUEST_URI} is going to contain the slash which will cause the .html part to fail.
Note: our site uses .html extensions, so the goal of the code below is to 302 redirect (will update to 301 later) URLs with and without a trailing slash to the corresponding .html pages, and to 404 any requests for non-existent pages, with or without a trailing slash.
# Handle requests to trailing slash if directory exists, add .html
# Fails for the last page in directory structure
RewriteCond %{REQUEST_URI} /content/go/.*
RewriteCond %{REQUEST_URI} /$
RewriteCond %{REQUEST_URI} !.*.json$
RewriteCond %{REQUEST_URI} !.*.sjson$
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} -d
RewriteRule ^(.*)/$ $1.html [L,R=302]
# Handle requests to trailing slash if directory does not exist, 404
RewriteCond %{REQUEST_URI} /content/go/.*
RewriteCond %{REQUEST_URI} /$
RewriteCond %{REQUEST_URI} !.*.json$
RewriteCond %{REQUEST_URI} !.*.sjson$
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} !-d
RewriteRule ^(.*?)$ $1 [L,R=404]
# Handle non trailing slash if page exists, add .html
# Working
RewriteCond %{REQUEST_URI} /content/go/.*
RewriteCond %{REQUEST_URI} !.*.json$
RewriteCond %{REQUEST_URI} !.*.sjson$
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}\.html -f
RewriteRule ^(.*) $1.html [L,R=302]
This works:
https://example.com/content/go/first-level/
(302 redirects to https://example.com/content/go/first-level.html)
But why should this be dependent on whether /content/go/first-level/ exists as a directory and not whether the file first-level.html itself exists?
This is because https://example.com/content/go/first-level/second-level/third-level.html page is not actually a directory, so when I do my directory test, it fails.
Presumably you mean ../third-level is not a directory, not ../third-level.html (the intended file target).
(Aside: You should avoid having filesystem directories and files with the same basename when dealing with extension-less requests since there is an inherent conflict that can take additional steps to overcome.)
# Handle requests to trailing slash if directory exists, add .html
# Fails for the last page in directory structure
RewriteCond %{REQUEST_URI} /content/go/.*
RewriteCond %{REQUEST_URI} /$
RewriteCond %{REQUEST_URI} !.*.json$
RewriteCond %{REQUEST_URI} !.*.sjson$
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} -d
RewriteRule ^(.*)/$ $1.html [L,R=302]
I'm not sure why you are doing a "directory test" at all. Why should /content/go/first-level/second-level need to map to a directory in order to redirect to /content/go/first-level/second-level.html, without actually checking that the .html file exists? Checking whether the request maps to a directory does not appear to be part of your stated requirements:
check if a trailing slash request is requesting a .html page that exists on our website. If so, 302 redirect it to the proper .html page. If not, provide 404.
This would only seem to require 1 rule. Requests with or without a trailing slash can be handled by the same rule. You don't need a separate rule to trigger a 404, since that should happen by default.
If I understand your question correctly, a request for /content/go/path/to/file (no trailing slash) or /content/go/path/to/file/ (with a trailing slash) should be 302 redirected to /content/go/path/to/file.html if that file exists.
If /content/go/path/to/file maps to a directory then so be it. However, if /content/go/path/to/file.html exists then the redirect to that file takes priority.
I'm assuming this is to be used directly in the <VirtualHost> container and not in a <Directory> section inside that virtual host.
As mentioned this only requires 1 rule, for example:
# Redirect to ".html" file if it exists (handles optional trailing slash)
RewriteCond %{REQUEST_URI} !\.(html|json|sjson)$
RewriteCond %{DOCUMENT_ROOT}$1.html -f
RewriteRule ^(/content/go/.+?)/?$ $1.html [R=302,L]
Explanation:
In the RewriteRule pattern ^(/content/go/.+?)/?$, the capturing subpattern is non-greedy (as determined by the +? quantifier) so this never includes the optional trailing slash. So, the $1 backreference does not include a trailing slash. This also includes the check for the /content/go/ prefix, negating the need for an additional condition. (Remember the RewriteRule pattern is processed first, before all the preceding conditions.)
The first condition (RewriteCond directive) then excludes requests that already end with .html (or .json or .sjson).
The second condition then checks that the target .html file exists before redirecting to it.
Any request to /content/go/path/to/file that does not map to a .html file and does not map to a directory will naturally 404. If it does map to a directory then you'll get a 403 (assuming directory listings are disabled), unless you have a directory index document to handle the request.
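The rule above assumes a server (virtual host) context. As a rough sketch only, if the same rule were instead placed in a per-directory context, such as a hypothetical .htaccess file in the document root (not something the question mentions), the RewriteRule pattern would match against the URL-path without the leading slash, so the slash has to be added back wherever the captured path is used:

```apache
# Assumption: this lives in a .htaccess file in the document root,
# not in the <VirtualHost> container as in the question.
RewriteEngine On

# Skip requests that already end with a known extension
RewriteCond %{REQUEST_URI} !\.(html|json|sjson)$
# $1 has no leading slash in this context, so add one after DOCUMENT_ROOT
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
# Root-relative substitution so the redirect target is /content/go/...
RewriteRule ^(content/go/.+?)/?$ /$1.html [R=302,L]
```

The logic is otherwise unchanged; only the handling of the leading slash differs between the two contexts.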
# Handle requests to trailing slash if directory exists, add .html
# Fails for the last page in directory structure
RewriteCond %{REQUEST_URI} /content/go/.*
RewriteCond %{REQUEST_URI} /$
RewriteCond %{REQUEST_URI} !.*.json$
RewriteCond %{REQUEST_URI} !.*.sjson$
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} -d
RewriteRule ^(.*)/$ $1.html [L,R=302]
The check for /content/go/ should be handled by the RewriteRule pattern. The .* at the end of the condition's pattern (/content/go/.*) is superfluous, since a regex matches anywhere in the string by default. There is also no anchor at the start of the regex, so this matches /content/go/ anywhere in the URL-path.
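To illustrate the anchoring point, here is a small sketch comparing the two forms. The /archive/ path is a made-up example, not a URL from the question, and the two conditions are alternatives rather than something to use together:

```apache
# Unanchored, as in the question: matches /content/go/ anywhere,
# so a hypothetical /archive/content/go/page/ would also match.
RewriteCond %{REQUEST_URI} /content/go/

# Anchored alternative: only matches when the URL-path starts with /content/go/
RewriteCond %{REQUEST_URI} ^/content/go/
```

In practice neither condition is needed once the prefix is part of the RewriteRule pattern itself.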
The second condition that checks for the trailing slash is superfluous since you have already ascertained that a trailing slash is present in the RewriteRule pattern.
!.*.json$ - The literal dot should be backslash-escaped and the .* is superfluous (as mentioned above). However, these two checks are themselves superfluous since you have already ascertained that the requested URL ends with a slash, so these two negated conditions will always be successful.
The REQUEST_URI server variable includes the slash prefix, so the expression %{DOCUMENT_ROOT}/%{REQUEST_URI} will result in a double slash when these variables are expanded. This double slash will ultimately be resolved away when the filesystem check occurs, so it should still "work". Note that you have correctly omitted the slash separator in the last (3rd) rule.
However, as mentioned at the top of my answer, I don't see why you would "blindly" redirect to a .html file (which may or may not exist) just because the original request happens to map to a directory.
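Putting the points above together, the first rule could be reduced to something like the following sketch. It deliberately keeps the original directory test, so it still redirects without checking that the .html file exists; it is shown only to illustrate the tidy-up, not as a recommended rule:

```apache
# The prefix and the trailing slash are both asserted by the RewriteRule
# pattern, so no separate conditions are needed for them.
# $1 already starts with a slash, so no extra separator after DOCUMENT_ROOT.
RewriteCond %{DOCUMENT_ROOT}$1 -d
RewriteRule ^(/content/go/.+)/$ $1.html [R=302,L]
```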
# Handle requests to trailing slash if directory does not exist, 404
RewriteCond %{REQUEST_URI} /content/go/.*
RewriteCond %{REQUEST_URI} /$
RewriteCond %{REQUEST_URI} !.*.json$
RewriteCond %{REQUEST_URI} !.*.sjson$
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} !-d
RewriteRule ^(.*?)$ $1 [L,R=404]
When you trigger a 404 (i.e. R=404) then the substitution string (2nd argument to the RewriteRule directive) is ignored. In this case, you should simply use - (hyphen) as the substitution to explicitly indicate "no substitution".
(.*?) - the non-greedy capture is not serving any purpose here. .* would do the same (and is arguably more efficient).
However, I'm not sure why you need to trigger a 404 for any request with a trailing slash that does not map to a directory. This will happen by default. (But you presumably want to serve the corresponding .html file in this scenario?)
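If an explicit 404 were still wanted for some reason, a minimal sketch using the hyphen might look like this. It keeps the question's directory test, so it would still 404 trailing-slash URLs whose .html file exists unless the redirect to the .html file is handled by an earlier rule:

```apache
# "-" means no substitution; with a non-3xx status such as R=404 the
# substitution is dropped anyway and rewriting stops.
RewriteCond %{DOCUMENT_ROOT}$1 !-d
RewriteRule ^(/content/go/.+)/$ - [R=404]
```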
# Handle non trailing slash if page exists, add .html
# Working
RewriteCond %{REQUEST_URI} /content/go/.*
RewriteCond %{REQUEST_URI} !.*.json$
RewriteCond %{REQUEST_URI} !.*.sjson$
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}\.html -f
RewriteRule ^(.*) $1.html [L,R=302]
But URLs with a trailing slash that don't map to a directory but do map to a .html file will already have triggered a 404 (by the 2nd rule above).
No need to backslash-escape the literal dot in the TestString on that last condition since this carries no special meaning here.
All canonical requests for /content/go/<whatever>.html will also be unnecessarily processed by this rule, where the filesystem check will ultimately fail (unless you happen to have files with a double .html extension). You should exclude requests that already end with .html before the filesystem check. (Filesystem checks are relatively expensive so should be avoided where possible.)
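Applied to the third rule only, those changes (plus moving the prefix check into the pattern) might look like the sketch below. It keeps the rule's original scope, with no trailing-slash handling, so it is narrower than the single rule suggested earlier:

```apache
# Cheap string check first: skip anything already ending in .html (or .json/.sjson)
RewriteCond %{REQUEST_URI} !\.(html|json|sjson)$
# Plain dot in the TestString; no backslash needed here
RewriteCond %{DOCUMENT_ROOT}$1.html -f
RewriteRule ^(/content/go/.+)$ $1.html [R=302,L]
```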