apache-kafka, aiven

Do 3k Kafka topics decrease performance?


I have a Kafka cluster (using Aiven on AWS):

Kafka Hardware

Startup-2 (2 CPU, 2 GB RAM, 90 GB storage, no backups) 3-node high availability set

Background

I have a topic such that:

Architecture

My team built an architecture in which a group of consumers will parse this data, perform some transformations (without any filtering!), and then send the final messages back to Kafka, to topic=<entity-id>.

This means the data is written back to Kafka into a topic that contains only the data of a specific entity.
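
A rough sketch of that consume-transform-produce loop (the endpoint, the source topic name "raw-data", the "entity-" topic prefix, and the transform are placeholders, and the Aiven TLS/auth settings are omitted):

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class EntityRouter {
        public static void main(String[] args) {
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "my-kafka.aivencloud.com:12345"); // placeholder endpoint
            consumerProps.put("group.id", "entity-router");
            consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "my-kafka.aivencloud.com:12345");
            producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
                consumer.subscribe(List.of("raw-data")); // hypothetical source topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        String entityId = record.key();                  // assume the entity id is the message key
                        String transformed = transform(record.value());  // transformation only, no filtering
                        // write the result back to the per-entity topic, e.g. "entity-<entity-id>"
                        producer.send(new ProducerRecord<>("entity-" + entityId, entityId, transformed));
                    }
                }
            }
        }

        private static String transform(String value) {
            // placeholder for the real transformation logic
            return value.toUpperCase();
        }
    }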

Questions

At any given time, there can be up to 3-4k topics in Kafka (1 topic for each unique entity).

  1. Can my Kafka cluster handle this well? If not, what do I need to change?
  2. Do I need to delete topics, or is it fine to have (a lot of!) unused topics over time?
  3. Each consumer of the final messages will consume 100 topics at the same time. I know Kafka clients can consume multiple topics concurrently, but I'm not sure what the best practices are for that (see the sketch after this list).
  4. Please share your concerns.
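
For reference, this is roughly how I plan to subscribe each final consumer to its ~100 topics (just a sketch; the "entity-" topic naming convention, the group id, and the endpoint are illustrative assumptions):

    import java.time.Duration;
    import java.util.Properties;
    import java.util.regex.Pattern;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class EntityConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "my-kafka.aivencloud.com:12345"); // placeholder endpoint
            props.put("group.id", "entity-readers"); // instances share the group, so topic partitions are spread across them
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // One subscription covers every topic matching the pattern; newly created
                // "entity-*" topics are picked up on the next metadata refresh.
                consumer.subscribe(Pattern.compile("entity-.*"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("topic=%s key=%s value=%s%n",
                                record.topic(), record.key(), record.value());
                    }
                }
            }
        }
    }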

Requirements


Solution

  • The number of topics is not so important in itself, but each Kafka topic is partitioned and the total number of partitions could impact performance.

    The general recommendation from the Apache Kafka community is to have no more than 4,000 partitions per broker (this includes replicas). The linked KIP article explains some of the possible issues you may face if the limit is exceeded, and with 3,000 topics it would be easy to do so unless you choose a low partition count and/or replication factor for each topic (a worked example follows at the end of this answer).

    Choosing a low partition count for a topic is sometimes not a good idea, because it limits the parallelism of reads and writes, leading to performance bottlenecks for your clients.

    Choosing a low replication factor for a topic is also sometimes not a good idea, because it increases the chance of data loss upon failure.

    Generally it's fine to have unused topics on the cluster, but be aware that there is still a performance impact: the cluster has to manage the metadata for all of these partitions, and some operations will still take longer than if the topics were not there at all.

    There is also a per-cluster limit but that is much higher (200,000 partitions). So your architecture might be better served simply by increasing the node count of your cluster.
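
    To make the partition arithmetic concrete, here is a small worked example (the partition count and replication factor below are assumptions; plug in your real settings):

        public class PartitionBudget {
            public static void main(String[] args) {
                // Illustrative numbers only; adjust to your actual configuration.
                int topics = 3_000;          // up to 3-4k per-entity topics
                int partitionsPerTopic = 1;  // lowest possible parallelism
                int replicationFactor = 3;   // one replica per node of a 3-node HA plan
                int brokers = 3;

                int totalPartitionReplicas = topics * partitionsPerTopic * replicationFactor;
                int replicasPerBroker = totalPartitionReplicas / brokers;

                System.out.println("Total partition replicas: " + totalPartitionReplicas); // 9000
                System.out.println("Replicas per broker:      " + replicasPerBroker);      // 3000, close to the ~4000 guideline
            }
        }

    With even 2 partitions per topic, the same calculation already exceeds the per-broker guideline, which is why keeping partition counts minimal or adding nodes matters here.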