I'm occasionally getting the following EJB exception across several different message driven beans:
javax.ejb.EJBException: Failed to acquire the pool semaphore, strictTimeout=10000
This behavior closely corresponds to periods when a particular database is having issues, which increases the amount of time spent in the MDB's onMessage method. The messages are being delivered by an ActiveMQ broker (version 5.4.2). The prefetch on the MDBs is 2000 (20 sessions x 100 messages per session).
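For reference, that prefetch comes out of the MDB's activation configuration; here is a minimal sketch of how the 20 x 100 setup might look against the ActiveMQ resource adapter (the bean and queue names are invented, and the maxSessions/maxMessagesPerSessions property names are my assumption about the ActiveMQ activation spec):

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;

// Hypothetical MDB showing where the 2000-message prefetch comes from:
// 20 sessions x 100 messages per session.
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination", propertyValue = "queue/ExampleQueue"),
    @ActivationConfigProperty(propertyName = "maxSessions", propertyValue = "20"),
    @ActivationConfigProperty(propertyName = "maxMessagesPerSessions", propertyValue = "100")
})
public class ExampleProcessorBean implements MessageListener {

    @Override
    public void onMessage(Message message) {
        // The database call that slows everything down lives here.
    }
}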
My question is a general one. What exactly is happening here? I know that a message which has been delivered to the server running the MDB will time out after 10 seconds if there is no instance in the bean pool to handle it, but how was that message delivered to the server in the first place? My assumption up to this point was that the MDB requests messages from the broker, in the quantity of the prefetch, only when it no longer has any messages to process. Are they simply waiting in that server-side "bucket" for too long?
Has anyone else run into this? Suggestions for tuning prefetch/semaphore timeout?
EDIT: Forgot to mention I'm using JBoss AS 5.1.0
After doing some research I've found a satisfactory explanation for this EJBException.
MessageDrivenBeans have an instance pool. When a batch of JMS messages is delivered to an MDB in the quantity of the prefetch, each message is assigned an instance from this pool and is delivered to that instance via its onMessage method.
A little about how the pool works: in JBoss 5.1.0, pooled beans such as MDBs and SessionBeans are configured by default through JBoss AOP, specifically by a file in the deploy directory titled "ejb3-interceptors-aop.xml". This file creates interceptor bindings and default annotations for any class matching its domain. In the case of the Message Driven Bean domain, it applies, among other things, an org.jboss.ejb3.annotation.Pool annotation:
<annotation expr="class(*) AND !class(@org.jboss.ejb3.annotation.Pool)">
@org.jboss.ejb3.annotation.Pool (value="StrictMaxPool", maxSize=15, timeout=10000)
</annotation>
The parameters of that annotation are described here.
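Note that the binding expression only matches classes that do not already carry a @Pool annotation, so one way to change these values for a single bean (rather than editing ejb3-interceptors-aop.xml globally) is to annotate the class directly. A rough sketch, with an invented bean and queue name:

import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;

import org.jboss.ejb3.annotation.Pool;

// Because this class now carries @Pool, the default AOP annotation above
// no longer applies; this bean gets its own StrictMaxPool settings.
@MessageDriven(mappedName = "queue/ExampleQueue")
@Pool(value = "StrictMaxPool", maxSize = 50, timeout = 10000)
public class ExamplePooledBean implements MessageListener {

    @Override
    public void onMessage(Message message) {
        // Message handling as usual.
    }
}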
Herein lies the rub. If the message prefetch exceeds the maxSize of this pool (which it usually will for high-throughput messaging applications), you will necessarily have messages waiting for an MDB instance. If the time from message delivery to the call to onMessage exceeds the pool timeout for any message, an EJBException is thrown. This may not be an issue for the first few iterations of the message distribution, but if you have a large prefetch and a long average onMessage time, the messages towards the end of the prefetched batch will begin to fail.
Some quick algebra reveals that this will occur, roughly speaking, when
timeout < (prefetch x onMessageTime) / maxSize
This assumes that messages are distributed instantaneously and that each onMessage call takes the same time, but it should give you a rough estimate of whether you're way out of bounds.
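Plugging in my own numbers from above (prefetch = 2000, maxSize = 15, timeout = 10000 ms), failures should start once the average onMessage time exceeds roughly

(timeout x maxSize) / prefetch = (10000 ms x 15) / 2000 = 75 ms

so even a modest database slowdown pushing onMessage past ~75 ms is enough to start tripping the semaphore timeout on the later messages in the batch.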
The solution to this problem is more subjective. Simply increasing the timeout is a naive option, because it masks the fact that messages are sitting on your application server instead of on your queue. Given that onMessage time is somewhat fixed, decreasing the prefetch is most likely a good option, as is increasing the pool size if resources allow. In tuning this I decreased the timeout in addition to decreasing the prefetch substantially and increasing maxSize, which keeps messages on the queue longer while preserving my alert indicator for when onMessage times are higher than normal.
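As a concrete illustration of that kind of tuning (the numbers here are invented for the example, not the exact values I used), the smaller prefetch, larger pool, and shorter timeout can all be expressed on one bean:

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;

import org.jboss.ejb3.annotation.Pool;

// Illustrative numbers: prefetch = 10 x 10 = 100, so with maxSize = 30 and
// timeout = 5000 ms the rough failure threshold becomes
// (5000 ms x 30) / 100 = 1500 ms per onMessage. Messages stay on the broker,
// and the semaphore timeout still fires as an alert when onMessage gets slow.
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination", propertyValue = "queue/ExampleQueue"),
    @ActivationConfigProperty(propertyName = "maxSessions", propertyValue = "10"),
    @ActivationConfigProperty(propertyName = "maxMessagesPerSessions", propertyValue = "10")
})
@Pool(value = "StrictMaxPool", maxSize = 30, timeout = 5000)
public class TunedProcessorBean implements MessageListener {

    @Override
    public void onMessage(Message message) {
        // Database-bound work happens here.
    }
}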