mod-rewrite apache2 mod-headers vary

mod_rewrite not sending Vary: Accept-Language when RewriteCond matches


I have a rewrite rule which redirects to / if no Accept-Language header is present and someone attempts to visit ?lang=en. It works fine, except for the headers returned. Vary: Accept-Language is missing from the response.

RewriteCond %{HTTP:Accept-Language} ^$  
RewriteCond %{QUERY_STRING}         ^lang=en  
RewriteRule ^$                      http://www.example.com/?     [R=301,L]

The Apache documentation specifies:

If a HTTP header is used in a condition this header is added to the Vary header of the response in case the condition evaluates to true for the request. It is not added if the condition evaluates to false for the request.

The conditions are definitely matching and redirecting, so I don't understand why Apache isn't adding the language vary. One can see why this would be a real problem if a proxy were to cache that ?lang=en and always redirect to / regardless of the Accept-Language header sent.


Solution

  • After peeking into the seedy underbelly of Apache's request handling system, it turns out that the documentation is somewhat misleading. Before I get into the explanation, though: from what I can tell, you're at the mercy of Apache on this one.

    The Client Problem

    First, the header name will not be added to the Vary response header if it is not sent by the client. This is due to how mod_rewrite constructs the value for that header internally.

    It looks up the header by name using apr_table_get(), the request's header table, and the name that you provided:

    const char *val = apr_table_get(ctx->r->headers_in, name);
    

    If name is not a key in the table, this function will return NULL. This is a problem, because immediately after this is a check against val:

    if (val) {
       // Set the structure member ctx->vary_this
    }
    

    ctx->vary_this is used on a per-RewriteCond basis to accumulate header names that should be assembled into the final Vary header*. Since no assignment or appending will occur if there is no value, a referenced (but not sent) header will never appear in Vary. The documentation doesn't explicitly state this, so it may or may not have been what you expected.

    *As an aside, the NV (no vary) flag and ignore-on-failure functionality is implemented by setting ctx->vary_this to NULL, preventing its addition to the response header.
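To make the lookup-and-check concrete, here is a minimal, self-contained sketch. It does not use APR: `table_get()` is a simplified stand-in for `apr_table_get()`, and `struct header` and `vary_candidate()` are hypothetical names invented for this illustration, not anything in mod_rewrite.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of r->headers_in. */
struct header {
    const char *name;
    const char *value;
};

/* Simplified stand-in for apr_table_get(): returns the value stored under
 * `name`, or NULL when the client did not send that header. This version
 * matches names exactly; the real APR function ignores case. */
const char *table_get(const struct header *h, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(h[i].name, name) == 0) {
            return h[i].value;
        }
    }
    return NULL;
}

/* Mirrors the `if (val)` check: returns the header name that would be
 * recorded for Vary, or NULL when the header is absent from the request. */
const char *vary_candidate(const struct header *h, size_t n, const char *name)
{
    const char *val = table_get(h, n, name);
    return val ? name : NULL;
}
```

In this model a request without Accept-Language yields NULL, while a request carrying `Accept-Language:` with an empty value yields the header name, because an empty string is still a non-NULL pointer.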

    However, it's possible that you sent Accept-Language with an empty value. In that case the empty string is non-NULL, so it passes the check above and mod_rewrite adds the header name to Vary. Keeping this in mind, I used the following request to diagnose what was going on:

    User-Agent: Fiddler
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: 
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 115
    Connection: keep-alive
    Host: 129.168.0.123
    

    This doesn't work either, but why? mod_rewrite definitely sets the headers when the rule and condition match (ctx->vary is an aggregate of ctx->vary_this across all checked conditions):

    if (ctx->vary) {
        apr_table_merge(r->headers_out, "Vary", ctx->vary);
    }
    

    This can be verified with a log statement, and r->headers_out is the table used when generating the response headers. Since something is definitely going wrong, though, the trouble must come after the rules are executed.
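For readers unfamiliar with the merge step, here is a rough sketch of what merging into Vary amounts to. It models a single header slot as a character buffer instead of an APR table; `vary_merge()` is a hypothetical helper, and the only behavior borrowed from `apr_table_merge()` is that an existing value is extended with a comma-separated addition.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for apr_table_merge() on one "Vary" slot.
 * An empty slot takes the value as-is; otherwise the new value is
 * appended after ", ". Returns 1 on success, 0 if the result would
 * not fit in `cap` bytes. `slot` must be NUL-terminated on entry. */
int vary_merge(char *slot, size_t cap, const char *value)
{
    size_t used = strlen(slot);
    int n = snprintf(slot + used, cap - used, "%s%s",
                     used ? ", " : "", value);
    return n >= 0 && (size_t)n < cap - used;
}
```

Starting from an empty buffer, merging "Accept-Encoding" and then "Accept-Language" leaves "Accept-Encoding, Accept-Language" in the slot, which is the shape you would expect if the value set by mod_rewrite survived alongside one set by another module.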

    The .htaccess Problem

    Currently, you appear to be defining your rules in .htaccess or a <Directory> section. This means that mod_rewrite is operating in Apache's fixup phase, and the mechanism it uses to actually perform rewrites here is very messy. Let's assume for a second that there's no external redirection, since you had a problem even without it (I'll get to the issue with the redirect later).

    After you perform a rewrite, it's far too late in the request processing for the module to actually map to a file. What it does instead is assign itself as the request's "content" handler and when the request reaches that point, it performs a call to ap_internal_redirect(). This leads to the creation of a new request object, one that does not contain the headers_out table from the original.

    Assuming that mod_rewrite causes no further redirects, the response is generated from the new request object, which will never have the appropriate (original) headers assigned to it. It is possible to get around this by working in a per-server context (in the main configuration or in a <VirtualHost>), but...
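The effect of the internal redirect can be pictured with a toy model. This is only a sketch of the behavior described above, not Apache's code: `struct request` and `internal_redirect()` are hypothetical, and a single `vary` pointer stands in for the whole headers_out table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for a request object. */
struct request {
    const char *uri;
    const char *vary;       /* the "Vary" entry of headers_out; NULL if unset */
    struct request *prev;   /* the request this one was redirected from */
};

/* Models the outcome of the internal redirect: the new request gets the
 * rewritten URI and remembers where it came from, but it starts with
 * fresh, empty output headers. */
struct request internal_redirect(const char *new_uri, struct request *orig)
{
    struct request rnew = { new_uri, NULL, orig };
    return rnew;
}
```

In this model the original request still holds `Vary: Accept-Language`, but the response is built from the new object, whose `vary` is NULL.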

    The Redirect Problem

    Unfortunately, it turns out that it's largely irrelevant anyway, since even if we do use mod_rewrite in a server context, the path the response takes in the event of a redirect still causes the headers that the module set to be tossed out.

    When the request is received by Apache, through a chain of function calls it makes its way to ap_process_request(). This in turn calls ap_process_request_internal(), where the bulk of the important request parsing steps occur (including the invocation of mod_rewrite). It returns an integer status code, which in the case of your redirect happens to be set to 301.

    Most requests return OK (which has a value of 0), leading immediately to ap_finalize_request_protocol(). However, that's not the case here:

    if (access_status == OK) {
        ap_finalize_request_protocol(r);
    }
    else {
        r->status = HTTP_OK;
        ap_die(access_status, r);
    }
    

    ap_die() does some additional manipulation (like returning the response code back to 301), and in this particular case ends with a call to ap_send_error_response().
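The control flow so far can be condensed into a small model. Everything here is a hypothetical simplification: `process_request()` and `die()` only imitate the branching and the status handling described above, and the two flags merely record which path was taken.

```c
#include <assert.h>

#define OK 0
#define HTTP_OK 200
#define HTTP_MOVED_PERMANENTLY 301

/* Hypothetical reduced request: just the status and which path ran. */
struct request {
    int status;
    int finalized;            /* set on the normal path */
    int sent_error_response;  /* set on the error/redirect path */
};

/* Imitates the part of ap_die() discussed above: restore the real
 * status, then generate the response through the error path. */
void die(int type, struct request *r)
{
    r->status = type;
    r->sent_error_response = 1;
}

/* Imitates the branch in the excerpt: only OK takes the normal path. */
void process_request(struct request *r, int access_status)
{
    if (access_status == OK) {
        r->finalized = 1;
    }
    else {
        r->status = HTTP_OK;
        die(access_status, r);
    }
}
```

Feeding `HTTP_MOVED_PERMANENTLY` through this model ends on the error path with the status back at 301, which is the situation the redirect puts the real request in.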

    Luckily, this is finally the root of the problem. Though it might seem like it, things are not "assbackwards", and this causes the destruction of the original headers. There's even a comment about it in the source:

    if (!r->assbackwards) {
        apr_table_t *tmp = r->headers_out;
    
        /* For all HTTP/1.x responses for which we generate the message,
         * we need to avoid inheriting the "normal status" header fields
         * that may have been set by the request handler before the
         * error or redirect, except for Location on external redirects.
         */
        r->headers_out = r->err_headers_out;
        r->err_headers_out = tmp;
        apr_table_clear(r->err_headers_out);
    
        if (ap_is_HTTP_REDIRECT(status) || (status == HTTP_CREATED)) {
            if ((location != NULL) && *location) {
                apr_table_setn(r->headers_out, "Location", location);
            }
            //...
        }
    //...
    }
    

    Take note that r->headers_out is replaced, and the original table is cleared. That table had all of the information that was expected to show up in the response, so now it is lost.
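The swap is easier to follow with a runnable toy version. This sketch replaces APR tables with a small fixed-size array; `struct table`, `struct response`, `table_set()`, `table_get()` and `prepare_error_headers()` are all hypothetical names. One assumption goes beyond the excerpt: the Location value is read from the original headers_out before the swap, since the excerpt does not show where `location` comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HEADERS 8

struct header { const char *name; const char *value; };
struct table { struct header items[MAX_HEADERS]; size_t count; };

/* Hypothetical reduced response: just the two header tables. */
struct response { struct table *headers_out; struct table *err_headers_out; };

/* Returns the value stored under `name`, or NULL if it is not present. */
const char *table_get(const struct table *t, const char *name)
{
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->items[i].name, name) == 0) {
            return t->items[i].value;
        }
    }
    return NULL;
}

/* Sets or overwrites `name`; silently ignores the entry if the table is full. */
void table_set(struct table *t, const char *name, const char *value)
{
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->items[i].name, name) == 0) {
            t->items[i].value = value;
            return;
        }
    }
    if (t->count < MAX_HEADERS) {
        t->items[t->count].name = name;
        t->items[t->count].value = value;
        t->count++;
    }
}

/* Mirrors the swap in the excerpt for a redirect status. */
void prepare_error_headers(struct response *r)
{
    /* Assumption: Location is looked up before the tables are swapped. */
    const char *location = table_get(r->headers_out, "Location");
    struct table *tmp = r->headers_out;

    r->headers_out = r->err_headers_out;
    r->err_headers_out = tmp;
    r->err_headers_out->count = 0;   /* stand-in for apr_table_clear() */

    if (location != NULL && *location) {
        table_set(r->headers_out, "Location", location);
    }
}
```

In this model, a Vary entry placed in headers_out before the call is gone afterwards, Location is carried over, and anything that was already in err_headers_out ends up in the response headers.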

    Conclusion

    If you don't redirect and you define the rules in a per-server context, everything does seem to work correctly. However, this is not what you want. I can see a potential workaround, but I'm not sure if it would be acceptable, not to mention the need to recompile the server.

    As for the Vary: Accept-Encoding, I can only assume it comes from a different module that behaves in a way that allows the header to sneak through. I'm also not sure why Gumbo didn't have an issue when trying it.

    For reference, I was looking at the 2.2.14 and 2.2 trunk source code, and I was modifying and running Apache 2.2.15. There don't appear to be any significant differences between the versions in the related code sections.