Tags: python, activemq-artemis, stomp

ActiveMQ Artemis with STOMP: Same message from a queue is consumed by two consumers


ActiveMQ Artemis version 2.27

Consumer pods are spun up automatically by KEDA when it sees unread messages on the ActiveMQ broker. Each consumer processes one message, sends an ACK, and disconnects. More often than not, I am seeing the same message read by 2 or more consumers even though an ACK is sent. What settings should I enable on the broker side so that every consumer gets a different message?

ActiveMQ Artemis is deployed as a pod in Kubernetes. The connection_id is always the same. Here is my consumer-side code to connect to the broker:

def connect_subscribe(conn, connection_id, user_id, user_pw, queue_name):
    conn.start()
    conn.connect(user_id, user_pw, wait=True)
    conn.subscribe(destination=queue_name, id=connection_id, ack='client-individual',
                   headers={'subscription-type': 'ANYCAST', 'consumer-window-size': 0})
    if conn.is_connected():
        logger.info('Connected to broker.')
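
For context, the connection and listener are wired up along these lines before connect_subscribe() is called; the host, port, certificate paths and listener class name below are illustrative placeholders rather than the actual values from my deployment:

    import stomp

    # Placeholder host/port and certificate paths; the acceptor requires TLS
    # with client authentication (needClientAuth=true), hence set_ssl().
    broker = [('activemq-artemis.default.svc', 61613)]
    conn = stomp.Connection(host_and_ports=broker)
    conn.set_ssl(for_hosts=broker,
                 key_file='/certs/client.key',
                 cert_file='/certs/client.crt',
                 ca_certs='/certs/ca.crt')
    conn.set_listener('', MyStompListener(conn))  # listener implements on_message()
    connect_subscribe(conn, connection_id, user_id, user_pw, queue_name='model_supervised')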

This method is called after the STOMP listener is initialized. And this is how a message is consumed. Sample code:

    def process_message(self, headers, body):
        res_dict = json.loads(body) 
        job_id = res_dict['job_id']
        vm_uuid = res_dict['vm_uuid']
        logger.info('Processing the request with message body: {}'.format(body))
        logger.info('To Be implemented!') 
        sleep_in_seconds = os.getenv('SLEEP_IN_SECONDS', default=1)
        time.sleep(int(sleep_in_seconds))
        self.send_ack_and_disconnect(headers['ack'])
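
For completeness, send_ack_and_disconnect() is essentially a thin wrapper around stomp.py's ack() and disconnect(). A minimal sketch, assuming a STOMP 1.2 connection held in self.conn (the exact implementation in my code may differ slightly):

    def send_ack_and_disconnect(self, ack_id):
        # ACK the individual message using the value of the MESSAGE frame's
        # 'ack' header (STOMP 1.2), then drop the connection so the pod can exit.
        self.conn.ack(ack_id)
        self.conn.disconnect()
        logger.debug('Disconnected from broker')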

Here is the broker.xml:

<?xml version='1.0'?>

<configuration xmlns="urn:activemq"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xi="http://www.w3.org/2001/XInclude"
               xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">

   <core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="urn:activemq:core ">

      <name>ROAD</name>


      <persistence-enabled>true</persistence-enabled>

      <!-- this could be ASYNCIO, MAPPED, NIO
           ASYNCIO: Linux Libaio
           MAPPED: mmap files
           NIO: Plain Java Files
       -->
      <journal-type>ASYNCIO</journal-type>

      <paging-directory>data/paging</paging-directory>

      <bindings-directory>data/bindings</bindings-directory>

      <journal-directory>data/journal</journal-directory>

      <large-messages-directory>data/large-messages</large-messages-directory>

      <journal-datasync>true</journal-datasync>

      <journal-min-files>2</journal-min-files>

      <journal-pool-files>10</journal-pool-files>

      <journal-device-block-size>4096</journal-device-block-size>

      <journal-file-size>10M</journal-file-size>
      
      <!--
       This value was determined through a calculation.
       Your system could perform 35.71 writes per millisecond
       on the current journal configuration.
       That translates as a sync write every 28000 nanoseconds.

       Note: If you specify 0 the system will perform writes directly to the disk.
             We recommend this to be 0 if you are using journalType=MAPPED and journal-datasync=false.
      -->
      <journal-buffer-timeout>28000</journal-buffer-timeout>


      <!--
        When using ASYNCIO, this will determine the writing queue depth for libaio.
       -->
      <journal-max-io>4096</journal-max-io>
      <!--
        You can verify the network health of a particular NIC by specifying the <network-check-NIC> element.
         <network-check-NIC>theNicName</network-check-NIC>
        -->

      <!--
        Use this to use an HTTP server to validate the network
         <network-check-URL-list>http://www.apache.org</network-check-URL-list> -->

      <!-- <network-check-period>10000</network-check-period> -->
      <!-- <network-check-timeout>1000</network-check-timeout> -->

      <!-- this is a comma separated list, no spaces, just DNS or IPs
           it should accept IPV6

           Warning: Make sure you understand your network topology as this is meant to validate if your network is valid.
                    Using IPs that could eventually disappear or be partially visible may defeat the purpose.
                    You can use a list of multiple IPs, and if any successful ping will make the server OK to continue running -->
      <!-- <network-check-list>10.0.0.1</network-check-list> -->

      <!-- use this to customize the ping used for ipv4 addresses -->
      <!-- <network-check-ping-command>ping -c 1 -t %d %s</network-check-ping-command> -->

      <!-- use this to customize the ping used for ipv6 addresses -->
      <!-- <network-check-ping6-command>ping6 -c 1 %2$s</network-check-ping6-command> -->
      <!-- how often we are looking for how many bytes are being used on the disk in ms -->
      <disk-scan-period>5000</disk-scan-period>

      <!-- once the disk hits this limit the system will block, or close the connection in certain protocols
           that won't support flow control. -->
      <max-disk-usage>90</max-disk-usage>

      <!-- should the broker detect dead locks and other issues -->
      <critical-analyzer>true</critical-analyzer>

      <critical-analyzer-timeout>120000</critical-analyzer-timeout>

      <critical-analyzer-check-period>60000</critical-analyzer-check-period>

      <critical-analyzer-policy>HALT</critical-analyzer-policy>

      
      <page-sync-timeout>80000</page-sync-timeout>


            <!-- the system will enter into page mode once you hit this limit.
           This is an estimate in bytes of how much the messages are using in memory

            The system will use half of the available memory (-Xmx) by default for the global-max-size.
            You may specify a different value here if you need to customize it to your needs.

            <global-max-size>100Mb</global-max-size>

      -->

      <acceptors>

         <!-- useEpoll means: it will use Netty epoll if you are on a system (Linux) that supports it -->
         <!-- amqpCredits: The number of credits sent to AMQP producers -->
         <!-- amqpLowCredits: The server will send the # credits specified at amqpCredits at this low mark -->
         <!-- amqpDuplicateDetection: If you are not using duplicate detection, set this to false
                                      as duplicate detection requires applicationProperties to be parsed on the server. -->
         <!-- amqpMinLargeMessageSize: Determines how many bytes are considered large, so we start using files to hold their data.
                                        default: 102400, -1 would mean to disable large message control -->

         <!-- Note: If an acceptor needs to be compatible with HornetQ and/or Artemis 1.x clients add
                    "anycastPrefix=jms.queue.;multicastPrefix=jms.topic." to the acceptor url.
                    See https://issues.apache.org/jira/browse/ARTEMIS-1644 for more information. -->
         
         <!-- STOMP Acceptor. -->
     <acceptor name="stomp">tcp://0.0.0.0:61613?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=STOMP;useEpoll=true;sslEnabled=true;keyStorePath=/certs/broker.ks;keyStorePassword=ENC(xxxx);trustStorePath=/certs2/broker.ts;trustStorePassword=ENC(xxxx);needClientAuth=true;enabledProtocols=TLSv1.2</acceptor>
      </acceptors>


      <security-settings>
         <security-setting match="#">
            <permission type="createNonDurableQueue" roles="amq"/>
            <permission type="deleteNonDurableQueue" roles="amq"/>
            <permission type="createDurableQueue" roles="amq"/>
            <permission type="deleteDurableQueue" roles="amq"/>
            <permission type="createAddress" roles="amq"/>
            <permission type="deleteAddress" roles="amq"/>
            <permission type="consume" roles="amq"/>
            <permission type="browse" roles="amq"/>
            <permission type="send" roles="amq"/>
            <!-- we need this otherwise ./artemis data imp wouldn't work -->
            <permission type="manage" roles="amq"/>
         </security-setting>
      </security-settings>

      <address-settings>
         <!-- if you define auto-create on certain queues, management has to be auto-create -->
         <address-setting match="activemq.management#">
            <dead-letter-address>DLQ</dead-letter-address>
            <expiry-address>ExpiryQueue</expiry-address>
            <redelivery-delay>0</redelivery-delay>
            <!-- with -1 only the global-max-size is in use for limiting -->
            <max-size-bytes>-1</max-size-bytes>
            <message-counter-history-day-limit>10</message-counter-history-day-limit>
            <address-full-policy>PAGE</address-full-policy>
            <auto-create-queues>true</auto-create-queues>
            <auto-create-addresses>true</auto-create-addresses>
            <auto-create-jms-queues>true</auto-create-jms-queues>
            <auto-create-jms-topics>true</auto-create-jms-topics>
         </address-setting>
         <!--default for catch all-->
         <address-setting match="#">
            <dead-letter-address>DLQ</dead-letter-address>
            <expiry-address>ExpiryQueue</expiry-address>
            <redelivery-delay>0</redelivery-delay>
            <!-- with -1 only the global-max-size is in use for limiting -->
            <max-size-bytes>-1</max-size-bytes>
            <message-counter-history-day-limit>10</message-counter-history-day-limit>
            <address-full-policy>PAGE</address-full-policy>
            <auto-create-queues>true</auto-create-queues>
            <auto-create-addresses>true</auto-create-addresses>
            <auto-create-jms-queues>true</auto-create-jms-queues>
            <auto-create-jms-topics>true</auto-create-jms-topics>
         </address-setting>
      </address-settings>

      <addresses>
         <address name="DLQ">
            <anycast>
               <queue name="DLQ" />
            </anycast>
         </address>
         <address name="ExpiryQueue">
            <anycast>
               <queue name="ExpiryQueue" />
            </anycast>
         </address>
         <address name="model_supervised">
            <anycast>
               <queue name="model_supervised" />
            </anycast>
         </address>
      </addresses>
   </core>
</configuration>

Application Logs:

Consumer 1:

2025-02-07 12:16:52,277 DEBUG ActiveMQclient:69 - On Message: {"vm_uuid": "422f6867-f9e5-1682-8c8f-0173b81a5b93", "job_id": "Job-pRHNIWrFpBuuGPdh19CQVwBD"}
2025-02-07 12:16:52,292 INFO model_supervised_consumer:30 - Process data in message body {"vm_uuid": "422f6867-f9e5-1682-8c8f-0173b81a5b93", "job_id": "Job-pRHNIWrFpBuuGPdh19CQVwBD"}
2025-02-07 12:16:52,379 DEBUG databasewrapper:98 - Initialized DB Wrapper for - sallydb
2025-02-07 12:16:52,379 DEBUG databasewrapper:487 - Getting status of job: Job-pRHNIWrFpBuuGPdh19CQVwBD
2025-02-07 12:16:52,381 DEBUG databasewrapper:491 - Query responded with result: ('FE_SUPDONE',)
2025-02-07 12:16:52,381 DEBUG databasewrapper:505 - The status of the job Job-pRHNIWrFpBuuGPdh19CQVwBD is FE_SUPDONE.
2025-02-07 12:16:52,381 INFO model_supervised_consumer:369 - Updating status of job: Job-pRHNIWrFpBuuGPdh19CQVwBD to MODEL_SUPRVD_DONE
2025-02-07 12:16:52,405 DEBUG ActiveMQclient:93 - Ack message 4028128
2025-02-07 12:16:52,406 DEBUG ActiveMQclient:35 - Disconnected from broker

Consumer 2 comes up after 6 seconds and gets the same message:

2025-02-07 12:16:57,550 DEBUG ActiveMQclient:69 - On Message: {"vm_uuid": "422f6867-f9e5-1682-8c8f-0173b81a5b93", "job_id": "Job-pRHNIWrFpBuuGPdh19CQVwBD"}
2025-02-07 12:16:57,568 INFO model_supervised_consumer:30 - Process data in message body {"vm_uuid": "422f6867-f9e5-1682-8c8f-0173b81a5b93", "job_id": "Job-pRHNIWrFpBuuGPdh19CQVwBD"}
2025-02-07 12:16:57,656 DEBUG databasewrapper:98 - Initialized DB Wrapper for - sallydb
2025-02-07 12:16:57,656 DEBUG databasewrapper:487 - Getting status of job: Job-pRHNIWrFpBuuGPdh19CQVwBD
2025-02-07 12:16:57,658 DEBUG databasewrapper:491 - Query responded with result: ('MODEL_SUDONE',)
2025-02-07 12:16:57,658 DEBUG databasewrapper:505 - The status of the job Job-pRHNIWrFpBuuGPdh19CQVwBD is MODEL_SUDONE.
2025-02-07 12:16:57,658 DEBUG model_supervised_consumer:40 - The job status is not FE_SUPDONE. Hence do not process the request.
2025-02-07 12:16:57,659 DEBUG ActiveMQclient:97 - NAck message 4028128
2025-02-07 12:16:57,659 DEBUG ActiveMQclient:35 - Disconnected from broker

There are checks within the application code to verify that the job is in the desired state before processing the message. If it is not, I send a NACK. That check is what is saving me right now.

I expected ActiveMQ to keep track of which messages were sent to which consumers but, looking at this behavior, that does not seem to be the case.

Please advise.

UPDATE: This is the STOMP listener's on_message method that receives the messages. It forks a process to call process_message(): since on_message returns within about a second, a separate Python process is spawned to handle the long-running operation. Does this help?

def on_message(self, headers, body):
    logger.debug(f'Message received: {str(body)}, with ID: {str(headers["ack"])}' )

    # create process
    p = Process(target=self.process_message, args=(headers, body), )
    # start process
    p.start()
    # wait for process to return
    p.join()

Solution

  • My requirement is to consume only one message per consumer, but I was seeing a consumer receive more than one message. Since that extra message was never ACKed, it was redelivered, and another consumer, which had processed its own first message quickly, received the same message as its second message.

    I am using the stomp.py Python module, which exposes a ConnectionListener class whose methods can be overridden. An instance of this class is created once per consumer. on_message() keeps receiving messages for as long as the connection is open, i.e. until disconnect is called after 2 seconds. Occasionally, the method that processes the first message finishes so quickly that the listener has time to consume another message before disconnecting.

    To fix this in a simple way, I used a message counter as an instance variable: if the message count is greater than 1, send a NACK and return.

    def on_message(self, headers, body):
        logger.debug(f'Message received: {str(body)}, with id: {str(headers["ack"])}' )
        self._message_received_count += 1
    
        if self._message_received_count > 1:
            logger.info("The first message has already been received and is being processed. Hence, this message cannot be consumed.")
            self.send_nack(headers["ack"])
            return
    
        # create process
        p = Process(target=self.process_message, args=(headers, body), )
        # start process
        p.start()
        # wait for process to return
        p.join()
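
    Not shown above: the counter is initialised in the listener's constructor and send_nack() wraps stomp.py's nack(). A rough sketch, again assuming a STOMP 1.2 connection held in self.conn:

    def __init__(self, conn):
        self.conn = conn
        # counts MESSAGE frames received on this connection
        self._message_received_count = 0

    def send_nack(self, ack_id):
        # NACK so the broker can redeliver the message to another consumer
        self.conn.nack(ack_id)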