apache, security, mod-security

ModSecurity & Apache: How to limit access rate by header?


I have Apache and ModSecurity working together. I'm trying to limit the hit rate based on a request header value (for example, a User-Agent containing "facebookexternalhit"), and then return a friendly "429 Too Many Requests" response with a "Retry-After: 3" header.

I know I can match the header against a list of phrases read from a file, like this:

SecRule REQUEST_HEADERS:User-Agent "@pmFromFile ratelimit-bots.txt"

But I'm having trouble building the rule.

Any help would be really appreciated. Thank you.


Solution

  • After 2 days of researching and understanding how ModSecurity works, I finally got it working. FYI, I'm using Apache 2.4.37 and ModSecurity 2.9.2. This is what I did:

    In my custom rules file, /etc/modsecurity/modsecurity_custom.conf, I added the following rules:

    # Limit client hits by user agent
    SecRule REQUEST_HEADERS:User-Agent "@pm facebookexternalhit" \
        "id:400009,phase:2,nolog,pass,setvar:global.ratelimit_facebookexternalhit=+1,expirevar:global.ratelimit_facebookexternalhit=3"
    SecRule GLOBAL:RATELIMIT_FACEBOOKEXTERNALHIT "@gt 1" \
        "chain,id:4000010,phase:2,pause:300,deny,status:429,setenv:RATELIMITED,log,msg:'RATELIMITED BOT'"
        SecRule REQUEST_HEADERS:User-Agent "@pm facebookexternalhit"
    Header always set Retry-After "3" env=RATELIMITED
    ErrorDocument 429 "Too Many Requests"
    

    Explanation:

    Note: I want to limit to 1 request every 3 seconds.

    1. The first rule matches the User-Agent request header against "facebookexternalhit". If the match is successful, it creates the ratelimit_facebookexternalhit variable in the global collection with an initial value of 1, and increments it on every later hit that matches the user agent. It then sets the variable to expire in 3 seconds. If no hit matching "facebookexternalhit" arrives for 3 seconds, ratelimit_facebookexternalhit is removed and the count starts over.
    2. If global.ratelimit_facebookexternalhit > 1 (we received 2 or more hits within 3 seconds) AND the user agent matches "facebookexternalhit", we set RATELIMITED=1, deny the request with a 429 HTTP error (pause:300 also delays the response by 300 milliseconds), and log a custom message to the Apache error log: "RATELIMITED BOT". The AND condition is important: the counter lives in the global collection, so without it every request, from any client, would be denied while the counter is above 1.
    3. RATELIMITED=1 is set only so that Apache adds the custom header "Retry-After: 3". In this case, the header is interpreted by Facebook's crawler (facebookexternalhit), which will retry the operation after the specified time.
    4. We map a custom response message (in case we want one) to the 429 error.
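
To make the counting logic in steps 1 and 2 easier to follow, here is a minimal Python sketch that simulates it outside Apache. It is not ModSecurity code: the names ExpiringCounter and handle are hypothetical, the clock is injectable so the behaviour can be checked without waiting, and it assumes the expiry is refreshed on every matching hit (because the first rule re-runs expirevar each time).

```python
import time


class ExpiringCounter:
    """Counter that disappears `ttl` seconds after the most recent hit."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.value = 0
        self.expires_at = None

    def hit(self):
        now = self.clock()
        # An expired variable is gone, so counting restarts from zero.
        if self.expires_at is not None and now >= self.expires_at:
            self.value = 0
        self.value += 1                   # like setvar:...=+1
        self.expires_at = now + self.ttl  # like expirevar:...=3 (assumed refresh)
        return self.value


def handle(counter, user_agent, phrase="facebookexternalhit"):
    """Return (status, headers) for one request, mirroring the two rules."""
    if phrase not in user_agent.lower():  # @pm matches case-insensitively
        return 200, {}                    # other clients are never counted
    if counter.hit() > 1:                 # like "@gt 1"
        return 429, {"Retry-After": "3"}
    return 200, {}


if __name__ == "__main__":
    fake_now = [0.0]
    counter = ExpiringCounter(ttl=3, clock=lambda: fake_now[0])
    for t in (0.0, 1.0, 7.0):
        fake_now[0] = t
        print(t, handle(counter, "facebookexternalhit/1.1"))
```

In this simulation the hit at t=0 passes, the hit at t=1 is rejected with 429, and the hit at t=7 passes again because more than 3 seconds elapsed since the last matching hit. One consequence of the assumed refresh is that a bot that keeps retrying faster than every 3 seconds stays blocked until it backs off.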

    You could improve this rule by using @pmf with a .data file and initializing the global collection with something like initcol:global=%{MATCHED_VAR}, so that a single rule is not limited to a single match. I haven't tested this last step (the rule above is what I needed right now). I'll update my answer if I do.

    UPDATE:

    I've adapted the rule to read all the user agents I want to rate limit from a file, so a single rule can be used across multiple bots/crawlers:

    # Limit client hits by user agent
    SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data" \
        "id:100008,phase:2,nolog,pass,setuid:%{tx.ua_hash},setvar:user.ratelimit_client=+1,expirevar:user.ratelimit_client=3"
    
    SecRule USER:RATELIMIT_CLIENT "@gt 1" \
        "chain,id:1000009,phase:2,deny,status:429,setenv:RATELIMITED,log,msg:'RATELIMITED BOT'"
        SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data"
    
    Header always set Retry-After "3" env=RATELIMITED
    
    ErrorDocument 429 "Too Many Requests"
    

    The file with the user agents (one per line) is located in a subdirectory of the directory that contains this rule: /etc/modsecurity/data/ratelimit-clients.data. We use @pmf to read and parse the file (https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual-(v2.x)#pmfromfile). We initialize the USER collection from the user agent with setuid:%{tx.ua_hash} (tx.ua_hash is set in the global scope in /usr/share/modsecurity-crs/modsecurity_crs_10_setup.conf), and we simply use user as the collection instead of global. That's all!
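
As a rough illustration of what changes when the counter moves from the global collection to a per-user-agent collection, here is a small Python simulation. It is only a sketch under assumptions: PerClientLimiter and the "examplebot" phrase are hypothetical, the PHRASES list stands in for data/ratelimit-clients.data, and the key is assumed to be a hash of the full User-Agent string (SHA-1 is used here as a guess at what tx.ua_hash holds).

```python
import hashlib
import time

# Hypothetical stand-in for data/ratelimit-clients.data (one phrase per line).
PHRASES = ["facebookexternalhit", "examplebot"]


class PerClientLimiter:
    """Allow one hit per `ttl` seconds for each listed user agent."""

    def __init__(self, phrases, ttl=3, clock=time.monotonic):
        self.phrases = [p.lower() for p in phrases]
        self.ttl = ttl
        self.clock = clock
        self.counters = {}  # key -> (count, expires_at)

    def allow(self, user_agent):
        ua = user_agent.lower()
        # Like @pmf: case-insensitive match against any phrase in the list.
        if not any(p in ua for p in self.phrases):
            return True  # unlisted clients are not rate limited
        # Assumption: one counter per hash of the full User-Agent string.
        key = hashlib.sha1(user_agent.encode("utf-8")).hexdigest()
        now = self.clock()
        count, expires_at = self.counters.get(key, (0, float("-inf")))
        if now >= expires_at:
            count = 0  # the variable expired, start counting again
        count += 1
        self.counters[key] = (count, now + self.ttl)
        return count <= 1  # a second hit inside the window is rejected
```

Under these assumptions each listed bot gets its own counter, so a burst from one crawler does not cause another crawler to be denied. Note that keying on the full User-Agent string means two variants of the same bot (for example different version suffixes) would be counted separately.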