apache tomcat mod-proxy ajp mod-proxy-ajp

Apache mod_proxy_ajp module prematurely sending traffic to spare backend server


We've got a pair of Apache 2.4 web servers (web02, web03) running mod_proxy_ajp talking to a pair of Tomcat 7.0.59 servers (app02, app03).

The Tomcat server on app03 is a standby server that should not get traffic unless app02 is completely offline.

Apache config on web02 and web03:

<Proxy balancer://ajp_cluster>
  BalancerMember ajp://app02:8009 route=worker1 ping=3 retry=60
  BalancerMember ajp://app03:8009 status=+R route=worker2 ping=3 retry=60
  ProxySet stickysession=JSESSIONID|jsessionid lbmethod=byrequests
</Proxy>

Tomcat config for AJP on app02 and app03:

<Connector protocol="AJP/1.3" URIEncoding="UTF-8" port="8009" />

We are seeing issues where Apache starts sending traffic to app03, which is marked as the spare, even when app02 is still available but perhaps a bit busy.

Apache SSL error log:

[Thu Sep 12 14:23:28.028162 2019] [proxy_ajp:error] [pid 24234:tid 140543375898368] (70007)The timeout specified has expired: [client 207.xx.xxx.7:1077] AH00897: cping/cpong failed to 10.160.160.47:8009 (app02)
[Thu Sep 12 14:23:28.028196 2019] [proxy_ajp:error] [pid 24234:tid 140543375898368] [client 207.xx.xxx.7:1077] AH00896: failed to make connection to backend: app02
[Thu Sep 12 14:23:28.098869 2019] [proxy_ajp:error] [pid 24135:tid 140543501776640] [client 207.xx.xxx.7:57809] AH01012: ajp_handle_cping_cpong: ajp_ilink_receive failed, referer: https://site.example.com/cart
[Thu Sep 12 14:23:28.098885 2019] [proxy_ajp:error] [pid 24135:tid 140543501776640] (70007)The timeout specified has expired: [client 207.xx.xxx.7:57809] AH00897: cping/cpong failed to 10.160.160.47:8009 (app02), referer: https://site.example.com/cart

There are hundreds of these messages in our Apache logs.

Any suggestions on settings for making Apache stick to app02 unless it is completely offline?


Solution

  • You are experiencing thread exhaustion in the Tomcat connector causing httpd to think app02 is in a bad state - which, in a way, it is.

    The short answer is to switch your Tomcat AJP connector to use protocol="org.apache.coyote.ajp.AjpNioProtocol".

    The long answer is, well, rather longer.

    mod_jk uses persistent connections between httpd and Tomcat. The historical argument for this is performance: it saves the time of establishing a new TCP connection for each request. Generally, testing shows that this argument doesn't hold. Establishing a new TCP connection takes near enough the same time as performing a CPING/CPONG to confirm that an existing connection is still valid (which you need to do if you use persistent connections). Regardless, persistent connections are the default with mod_jk.

    When using persistent connections, mod_jk creates one connection per httpd worker thread and caches that connection in the worker thread.

    The default AJP connector in Tomcat 7.x is the BIO connector. This connector uses blocking I/O and requires one thread per connection.

    The issue occurs when httpd is configured with more workers than Tomcat has threads. Initially everything is OK. When an httpd worker encounters the first request that needs to be passed to Tomcat, mod_jk creates the persistent connection for that httpd worker and the request is served. Subsequent requests processed by that httpd worker that need to be passed to Tomcat will use that cached connection. Requests are allocated (effectively) randomly to httpd workers. As more httpd workers see their first request that needs to be passed to Tomcat, mod_jk creates the necessary persistent connection for each worker. It is likely that many of the connections to Tomcat will be mostly idle. How idle will depend on the load on httpd and the proportion of those requests that are passed to Tomcat.

    All is well until more httpd workers need to create a connection to Tomcat than Tomcat has threads. Remember that the Tomcat AJP BIO connector requires a thread per connection, so maxThreads is essentially the maximum number of AJP connections that Tomcat will allow. At that point mod_jk is unable to create the connection it needs, and therefore the failover process is initiated.
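
    To make the mismatch concrete, here is a sketch of threaded MPM settings on one web server. The values are illustrative assumptions, not taken from the question's configuration.

    ```apache
    # Hypothetical event/worker MPM sizing on web02 and web03.
    # MaxRequestWorkers = ServerLimit x ThreadsPerChild = 16 x 25 = 400
    ServerLimit          16
    ThreadsPerChild      25
    MaxRequestWorkers   400
    ```

    With two web servers sized like this, up to 2 x 400 = 800 worker threads could each hold a cached connection to app02. If the Tomcat AJP connector is left at a maxThreads of 200, the BIO connector runs out of threads long before httpd runs out of workers.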

    There are two solutions. The first - the one I described above - is to remove the one thread per connection limitation. By switching to the NIO AJP connector, Tomcat uses a Poller thread to maintain thousands of connections, only passing those with data to process to a thread for processing. maxThreads then limits the number of concurrent requests that Tomcat can process on that Connector, rather than the number of connections it can hold open.
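
    Applied to the connector from the question, the switch might look like the sketch below. Only the protocol attribute is the actual change; the maxThreads value is an illustrative assumption (200 is Tomcat's documented default) and not a tuning recommendation.

    ```xml
    <!-- AJP connector using the NIO implementation instead of the default BIO one. -->
    <!-- With NIO, maxThreads caps requests being processed, not open connections. -->
    <!-- The value 200 is shown for illustration only. -->
    <Connector protocol="org.apache.coyote.ajp.AjpNioProtocol"
               URIEncoding="UTF-8"
               port="8009"
               maxThreads="200" />
    ```

    The same change would go into server.xml on both app02 and app03, followed by a Tomcat restart.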

    The second solution is to disable persistent connections. mod_jk then creates a connection, uses it for a single request and then closes it. This reduces the number of connections that mod_jk requires at any one point between httpd and Tomcat.
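
    The question uses mod_proxy_ajp rather than mod_jk. For mod_proxy, the documented way to turn off connection reuse is the disablereuse parameter on each backend. A sketch against the balancer from the question, assuming the parameter behaves as documented on the httpd 2.4 build in use:

    ```apache
    <Proxy balancer://ajp_cluster>
      # disablereuse=On closes each backend connection after a single request
      # instead of returning it to the connection pool.
      BalancerMember ajp://app02:8009 route=worker1 ping=3 retry=60 disablereuse=On
      BalancerMember ajp://app03:8009 status=+R route=worker2 ping=3 retry=60 disablereuse=On
      ProxySet stickysession=JSESSIONID|jsessionid lbmethod=byrequests
    </Proxy>
    ```

    On a mod_jk setup, the corresponding switch should be JkOptions +DisableReuse.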

    Sorry the above is rather a large wall of text. I've also covered this in various presentations including this one.