
REQUEST_URI not matching explicit path and filename


Really stumped, because form and syntax seem fine.

RewriteCond for REQUEST_URI is not matching the explicit path and filename. When isolated, RewriteCond for REQUEST_FILENAME matches just fine. I have verified using phpinfo() that REQUEST_URI contains the leading slash, and have tested without the leading slash, also.

The goal here is to know that the request is for this file and, if it doesn't exist, then throw a 410.

RewriteCond %{REQUEST_URI} ^/dir1/dir2/dir3/v_9991_0726dd5b5e8dd67a214c0c243436d131_all\.css$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ - [R=410,L]

I don't want to omit the first Cond, because I only want to do this for a handful of files similar to this one.

UPDATE I

Trying to get a definitive test. Test set-up:

RewriteCond %{REQUEST_URI} ^/testmee\.txt$
#RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ - [R=410,L]

versus

#RewriteCond %{REQUEST_URI} ^/testmee\.txt$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ - [R=410,L]

UPDATE II

Response for MrWhite:

Ughh, same symptom. Might have to live with Googlebot hitting 404s instead of the desired 410 for outdated CSS/JS. No biggie in the long run, probably.

Thank you for that request_uri test redirect. Everything is working normally in those tests. Page names, etc. are returned as expected, in the var= rewrite URL.

At this point, I think it must be some internal handling of 404s related to the file type extensions. See clue below. I have Prestashop shopping cart software, and it must be forcing 404s on file types.

This will redirect to google (to affirm pattern match):

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^testmee\.txt$ http://www.google.com/ [L]
(The L flag is needed, or else other rules further down will interfere.)

This will continue to return 404 instead of 410:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^testmee\.txt$ - [NC,R=410]

And as a control test, this will return a 410:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.*$ - [NC,R=410]

If file type is css in the above failed test, then my custom 404 controller does not get invoked. I just get a plain 404 Response, w/o the custom 404 that is wrapped with all my site templating.

For example:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^testmee\.css$ - [NC,R=410]

I'm afraid I've wasted some of your time. My apologies. I never imagined that Prestashop's code would be forcing 404 based on file type, but I can't see any other explanation. I could dig into it and maybe find the spot in the Controllers that is doing it. Gotta take a break, though.


Solution

  • This isn't really a solid answer, more a list of things to try to help debug this and to quash some myths...

    "I have verified using phpinfo() that REQUEST_URI contains the leading slash"

    Yes, the REQUEST_URI Apache server variable does indeed contain the leading slash. It contains the full URL-path.

    However, the REQUEST_URI Apache server variable is not necessarily the same as the $_SERVER['REQUEST_URI'] PHP superglobal. In fact, they aren't really the same thing at all, and in some ways it's a bit unfortunate that they share the same name. Notably, the PHP superglobal contains the URL from the initial request, includes the query string (if any), and is not %-decoded. The Apache server variable of the same name contains the rewritten URL (not necessarily the requested URL), does not contain the query string, and is %-decoded.
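
    To make the difference concrete, here is a hypothetical trace. The URL and the rewrite to /index.php are assumptions for illustration only, not taken from your site:

    # Request line: GET /my%20page?x=1
    #   PHP    $_SERVER['REQUEST_URI']   -> /my%20page?x=1  (as requested, with query string, not decoded)
    #   Apache %{REQUEST_URI}, 1st pass  -> /my page        (decoded, no query string)
    RewriteRule "^my page$" /index.php [L]
    #   Apache %{REQUEST_URI}, 2nd pass  -> /index.php      (the rewritten URL)
    #   PHP    $_SERVER['REQUEST_URI']   -> still /my%20page?x=1

    So a RewriteCond that is evaluated after such a rewrite sees /index.php, while phpinfo() keeps reporting the original request.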

    So, that's why I was asking whether you have other mod_rewrite directives. You could very well have had a conflict. If another directive rewrites the URL, then the condition will never match (despite the PHP superglobal suggesting that it should).

    "It seemed that if I put this at the top, the Last flag would end processing for that trip through, return the 410"

    This directive should certainly go at the top of the .htaccess file, so that the URL cannot be rewritten before the rule is reached. The L flag is actually superfluous when used with R=410 (or any status other than a 3xx) - it is implied in this case.
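
    As a rough sketch of that ordering (the front-controller rule at the bottom is a hypothetical stand-in for whatever your shopping cart software adds, not copied from your file):

    RewriteEngine On

    # 410 check first, before anything else can rewrite the URL-path
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^testmee\.txt$ - [R=410]

    # Hypothetical front-controller rule (assumption); it must come after the 410 rule
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^ /index.php [L]

    If the two blocks were swapped, the request would already have been rewritten to /index.php by the time the 410 rule was evaluated, and its pattern would never match.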

    "Then I change the result to be "throw a 410" and it throws a 404."

    That can certainly be caused by a server-side override. But you are able to throw a 410 in other situations, so that would seem to rule that out. However, you can reset the error document in .htaccess if in doubt (unless you are already using a custom error document):

    ErrorDocument 410 default
    
    RewriteCond %{REQUEST_URI} ^/dir1/dir2/dir3/v_9991_0726dd5b5e8dd67a214c0c243436d131_all\.css$
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^(.*)$ - [R=410,L]
    

    Whilst this doesn't really make a difference to how the rule behaves, you don't need the first RewriteCond directive that checks against the REQUEST_URI. You should be doing this check in the RewriteRule pattern instead, which is more efficient since the RewriteRule pattern is processed before any of the conditions. Note that in .htaccess the pattern matches the URL-path without the leading slash. For example:

    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^dir1/dir2/dir3/v_9991_0726dd5b5e8dd67a214c0c243436d131_all\.css$ - [NC,R=410]
    

    The NC flag should be superfluous.
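
    If the "handful of files" all follow the same naming scheme, a single pattern could cover them. This is only a sketch: it assumes the names are always v_<digits>_<32 hex characters>_all with a .css or .js extension, which is my guess from the one example you posted, so adjust it to your real filenames:

    # Assumed naming scheme: v_<digits>_<32 lowercase hex chars>_all.css or .js
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^dir1/dir2/dir3/v_\d+_[0-9a-f]{32}_all\.(css|js)$ - [R=410]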

    Still, a conflict with existing directives is the most probable cause. Remove all other directives. Do you still see the same behaviour?


    You can test the value of the REQUEST_URI server variable. You could either issue a redirect and pass the REQUEST_URI as a URL parameter, or set environment variables (but you will need to look out for REDIRECT_<var> for each rewrite).

    For example, at the top of your .htaccess (or wherever you are trying this):

    RewriteCond %{QUERY_STRING} ^$
    RewriteRule ^ /test.php?var=%{REQUEST_URI} [NE,R,L]
    

    Create a dummy test.php file to avoid an internal subrequest to an error document.
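
    The environment-variable alternative might look something like this. SEEN_URI is a made-up name, and I'm assuming that environment variables show up in PHP's $_SERVER on your setup:

    # Record what mod_rewrite sees on each pass; "-" means no substitution
    RewriteRule ^ - [E=SEEN_URI:%{REQUEST_URI}]

    Then dump $_SERVER in whichever script ends up handling the request. SEEN_URI should hold the value from the current pass and REDIRECT_SEEN_URI (if present) the value from the pass before an internal rewrite.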