apache-kafka kafka-consumer-api

Kafka Best Way to Filter Messages


One application (the producer) publishes messages, and another application with multiple consumers consumes them. The producer sends data that includes a country field, and each consumer in our application will subscribe to a specific country.

From what I have read so far, there are two approaches to filtering messages:

  1. Filter data on the consumer side: The producer adds the country to the message header. Each consumer receives all the data and keeps only the country it needs by checking the message header. I am not sure whether we can or should have multiple consumers, each filtering on a different country, or just one consumer that filters for the whole list of countries, with us doing the aggregation by country ourselves.
  2. One topic with a separate partition for each country: We use a custom partitioner on the producer so that it sends each message to a specific partition. Each consumer is directed to the right partition to consume country-specific messages.

My question is: should we choose option 1 or 2? We expect to receive hundreds of messages every few seconds.


Solution

  • In my experience, the first approach is the one typically used.
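As a rough illustration of the first approach, here is a minimal sketch of the header check in plain Java. The `Message` class is a hypothetical stand-in for a consumed record, and the header name `country` is an assumption. With the real Kafka client you would typically read the header with `record.headers().lastHeader("country")` instead. Note also that if several consumers each filter for a different country, each of them has to see every message, which usually means giving each one its own consumer group.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CountryHeaderFilter {

    /** Hypothetical stand-in for a consumed record: headers plus a payload. */
    static final class Message {
        final Map<String, byte[]> headers;
        final String payload;

        Message(Map<String, byte[]> headers, String payload) {
            this.headers = headers;
            this.payload = payload;
        }
    }

    /** True if the (assumed) "country" header is present and equals the wanted code. */
    static boolean matchesCountry(Message message, String wantedCountry) {
        byte[] raw = message.headers.get("country");
        if (raw == null) {
            return false; // messages without the header are skipped
        }
        return wantedCountry.equals(new String(raw, StandardCharsets.UTF_8));
    }

    /** Keeps only the messages of one polled batch that belong to the wanted country. */
    static List<Message> filterByCountry(List<Message> batch, String wantedCountry) {
        List<Message> kept = new ArrayList<>();
        for (Message message : batch) {
            if (matchesCountry(message, wantedCountry)) {
                kept.add(message);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Message> batch = List.of(
            new Message(Map.of("country", "DE".getBytes(StandardCharsets.UTF_8)), "order-1"),
            new Message(Map.of("country", "FR".getBytes(StandardCharsets.UTF_8)), "order-2"),
            new Message(Map.<String, byte[]>of(), "order-3"));

        for (Message message : filterByCountry(batch, "DE")) {
            System.out.println(message.payload);
        }
    }
}
```

Because only the header is inspected, the consumer can discard unwanted messages without deserializing the payload.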

    The second option is problematic. What if you add a new country? You will need to add a partition to the topic, which is possible but not straightforward, and you will also need to change the logic on both the producer and the consumer side. Failure handling gets harder as well: when consumers simply subscribe to the topic, the partitions of a failed consumer are automatically reassigned to the remaining live consumers in the consumer group. With option 2, consumers are tied to specific partitions, so you would have to handle failures in your own code.
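To make the maintenance problem concrete, here is a small sketch of the fixed country-to-partition table that option 2 implies. The country codes and partition numbers are made up for illustration. In a real setup this lookup would live inside a class implementing Kafka's `Partitioner` interface on the producer, and each consumer would likely pin itself to its partition with `assign()` rather than `subscribe()`, which is why the group's automatic rebalancing no longer helps.

```java
import java.util.Map;
import java.util.OptionalInt;

public class CountryPartitionMap {

    // Hypothetical fixed assignment; producer and consumers must both agree on it.
    private static final Map<String, Integer> PARTITION_BY_COUNTRY =
        Map.of("DE", 0, "FR", 1, "US", 2);

    /** Returns the partition for a country, or empty if the country is not mapped yet. */
    static OptionalInt partitionFor(String country) {
        Integer partition = PARTITION_BY_COUNTRY.get(country);
        return partition == null ? OptionalInt.empty() : OptionalInt.of(partition);
    }

    public static void main(String[] args) {
        System.out.println("FR -> " + partitionFor("FR"));
        // A new country has no partition until the topic is expanded and the
        // table is updated in every application that uses it.
        System.out.println("IT -> " + partitionFor("IT"));
    }
}
```

Every new country means a new entry in this table, a new partition on the topic, and a redeploy of the applications that share the mapping.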

    Another approach is to have a topic per country.

    One more approach is to publish all the data into one topic and then distribute it to other topics (one per consumer) with a Kafka Streams application. If the requirements change, you only change the implementation of the Kafka Streams app.
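The routing step of that last approach can be sketched in plain Java as follows. The `orders-` topic prefix and the `{country, payload}` pair layout are assumptions made for this example. In an actual Kafka Streams application the same idea can be expressed by passing a `TopicNameExtractor` to `KStream#to`, and the per-country output topics generally need to exist before the application writes to them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class CountryTopicRouter {

    /** Derives the (hypothetical) per-country topic name, e.g. "DE" -> "orders-de". */
    static String topicFor(String country) {
        return "orders-" + country.toLowerCase(Locale.ROOT);
    }

    /**
     * Groups messages by destination topic. Each message is a {country, payload}
     * pair; a TreeMap keeps the topics in a stable, sorted order.
     */
    static Map<String, List<String>> route(List<String[]> messages) {
        Map<String, List<String>> byTopic = new TreeMap<>();
        for (String[] message : messages) {
            String topic = topicFor(message[0]);
            byTopic.computeIfAbsent(topic, unused -> new ArrayList<>()).add(message[1]);
        }
        return byTopic;
    }

    public static void main(String[] args) {
        List<String[]> messages = List.of(
            new String[] {"DE", "order-1"},
            new String[] {"FR", "order-2"},
            new String[] {"DE", "order-3"});

        route(messages).forEach((topic, payloads) ->
            System.out.println(topic + " <- " + payloads));
    }
}
```

Each consumer then subscribes only to its own country topic, so no filtering logic is needed on the consumer side, and adding a country only touches the routing application.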