
.htaccess: block spam redirect URLs that use a query string


I'm working on a WordPress site that was apparently hacked some time ago. The intruder added a redirect plugin and has been happily redirecting traffic to spammy sites for a while. I removed the plugin and dropped the DB tables associated with it. The problem now is that all of the spam links are generating lots of WordPress requests, which is taxing the server's CPU and memory.

I did some research here about blocking URLs that contain query strings. The spam URLs look like this: example.com/?o=123456. I would like to reduce the load on the server, and it seems like a RewriteCond on the query string plus a RewriteRule with an R=404 flag (thanks, Mr. White) should do the trick, letting Apache do the heavy lifting instead of WordPress.

Here's what I have:

RewriteEngine On
RewriteBase /
RewriteCond %{QUERY_STRING} ^o=([0-9-]+)$ [NC]
RewriteRule ^404\.html$ https://example.com [R=404]

It occurs to me that maybe there are other letters in place of the o, so I have this also:

RewriteCond %{QUERY_STRING} ^([a-z])=([0-9-]+)$ [NC]
RewriteRule ^404\.html$ https://example.com [R=404]

Does this look like it would work properly? I have tested it on my development server but I'm just not sure what it should look like if it's working and if it's not.

More importantly, is this the best approach?


Solution

    > I have tested it on my development server but I'm just not sure what it should look like if it's working and if it's not.

    You should be getting an Apache-generated 404 "Not Found" response, unless you have an ErrorDocument 404 ... directive that sends 404s back to WordPress (which you don't want to happen). However, a 404 is not necessarily the best response in this case; a "410 Gone" would be better. A 410 tells user-agents (and, importantly, search engines) that the page is never coming back, so search engines should drop the URL more quickly if it has been indexed (although that is unlikely if this was an intermediary redirect).
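    As a minimal sketch of the 410 approach: mod_rewrite's G (gone) flag is shorthand for R=410. This assumes the spam URLs are root requests of the form example.com/?o=123456, as described in the question:

    ```apache
    # Sketch: return "410 Gone" instead of 404 for root requests like /?o=123456
    # The G (gone) flag is shorthand for R=410; "-" means no substitution
    RewriteCond %{QUERY_STRING} ^o=[0-9]+$ [NC]
    RewriteRule ^$ - [G]
    ```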

    > It occurs to me that maybe there are other letters in place of the o, so I have this also:

    You certainly don't need both, as the second rule (which matches any letter) is just a more generalised version of the first.
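    One caution when generalising to any letter: WordPress itself uses /?p=<post ID> for shortlinks and plain permalinks, so a condition that accepts any letter would also block those. If that matters on this site, a narrower sketch lists only the letter(s) actually seen in the spam URLs (just o here; extend it if you find others):

    ```apache
    # Sketch: match only the letter observed in the spam URLs ("o")
    # Deliberately does not match "p", which WordPress uses for /?p=<post ID>
    RewriteCond %{QUERY_STRING} ^o=[\d-]+$ [NC]
    RewriteRule ^$ - [R=410]
    ```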

    RewriteCond %{QUERY_STRING} ^([a-z])=([0-9-]+)$ [NC]
    RewriteRule ^404\.html$ https://example.com [R=404]
    

    > Does this look like it would work properly?

    No, not according to your description. The condition (the RewriteCond directive) looks OK (although there is no need for the capturing subpatterns), but the RewriteRule is checking for requests to /404.html (e.g. /404.html?o=123456), not example.com/?o=123456 as you stated in the question. The first argument to the RewriteRule directive is a regex that is matched against the requested URL-path (which may already have been rewritten by earlier rules). What was the reasoning behind using ^404\.html$ here? Is this the file you want to serve? (Although it is arguably better to serve the Apache default response in this case.)

    When you specify a non-3xx R code, the substitution string (the 2nd argument to the RewriteRule directive) is ignored, so https://example.com is not doing anything here. To formally indicate no substitution (as in this case), simply use a - (hyphen) as the substitution.
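    To illustrate both points, here is a sketch of the question's rule corrected only where it matters, still returning a 404 as originally intended. The comments show which URL-path the RewriteRule pattern sees in a root .htaccess file (the leading slash is stripped, and the query string is never part of it):

    ```apache
    # Request               URL-path seen by RewriteRule in root .htaccess
    #   /404.html?o=123456    "404.html"  -> matched by ^404\.html$
    #   /?o=123456            ""          -> matched by ^$
    RewriteCond %{QUERY_STRING} ^[a-z]=[0-9-]+$ [NC]
    # Non-3xx status: the substitution is ignored, so "-" makes that explicit
    RewriteRule ^$ - [R=404]
    ```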

    Note that the regex character class [0-9-] also matches literal hyphens. Is that intentional? (Again, the query string in your example is digits only.)
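    If hyphens are not needed, a stricter sketch (assuming the spam values are always digits only, as in the example) simply drops the hyphen from the character class:

    ```apache
    # Sketch: one letter, "=", then digits only (hyphens no longer match)
    RewriteCond %{QUERY_STRING} ^[a-z]=[0-9]+$ [NC]
    RewriteRule ^$ - [R=410]
    ```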

    This rule also needs to go at the very top of the root .htaccess file, before the WordPress directives. You do not need to repeat the RewriteEngine directive, since it already occurs later in the file (inside the WordPress code block) and it is the last occurrence of this directive that controls the entire file. Likewise, there is no need to repeat the RewriteBase directive; it is not being used here anyway.
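    For orientation, a sketch of how the root .htaccess might be laid out, using the 410 version of the rule. The WordPress section is the typical default block that WordPress writes (your copy may differ slightly); the blocking rule sits above it and relies on the RewriteEngine On inside that block:

    ```apache
    # Block spam query-string URLs before WordPress ever sees them
    ErrorDocument 410 default
    RewriteCond %{QUERY_STRING} ^[a-z]=[\d-]+$ [NC]
    RewriteRule ^$ - [R=410]

    # BEGIN WordPress
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>
    # END WordPress
    ```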

    Try the following instead:

    # Make sure 410 Gone responses serve the Apache default error page
    # OR set a "basic" custom response, e.g. "ErrorDocument 410 /410.html"
    ErrorDocument 410 default
    
    # Serve a 410 for any request of the form "/?<letter>=<digits/hyphen>"
    RewriteCond %{QUERY_STRING} ^[a-z]=[\d-]+$ [NC]
    RewriteRule ^$ - [R=410]
    

    \d (shorthand character class) is the same as [0-9].

    I've removed the parentheses ((..)) around the subpatterns in the regex since these values do not need to be captured here (as mentioned above).

    Note that the NC (nocase) flag on the condition means uppercase letters (A-Z) are matched as well.

    The regex ^$ (an empty string) checks for requests to the root directory only (as in your example).
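    If the spam URLs could also target paths other than the root (an assumption; the question only shows root requests), the pattern can be relaxed to match every URL-path. A sketch:

    ```apache
    # Sketch: "^" matches any URL-path, so /?o=123 and /some/page?o=123 both get a 410
    RewriteCond %{QUERY_STRING} ^[a-z]=[\d-]+$ [NC]
    RewriteRule ^ - [R=410]
    ```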

    > More importantly, is this the best approach?

    It stops these requests from being processed by WordPress, which I assume is what's causing the high CPU/memory usage. However, it does not prevent the requests from reaching your server, so they can still impact performance if there are many of them. Ideally, such requests would be blocked at the firewall level so they never reach your application server at all.