I have a Kafka cluster (using Aiven on AWS):
Startup-2 plan (2 CPU, 2 GB RAM, 90 GB storage, no backups), 3-node high-availability setup
I have a topic that all the incoming data is produced to. My team built an architecture in which a group of consumers reads from this topic. They parse this data, perform some transformations (without any filtering!), and then send the final messages back to Kafka, to topic=<entity-id>.
In other words, the data is uploaded back to Kafka, into a topic that contains only the data of a specific entity.
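Roughly, that consume-transform-produce loop looks like this (a simplified sketch; the source topic name and the transform() body are placeholders, the entity id is assumed to be the record key, and per-entity topics are assumed to be auto-created or created elsewhere):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EntityRouter {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "entity-router");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("raw-events")); // hypothetical source topic name
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    String entityId = record.key();                 // assumes the entity id is the record key
                    String transformed = transform(record.value()); // the team-specific transformation
                    // Route the result to the per-entity topic, e.g. "entity-<id>"
                    producer.send(new ProducerRecord<>("entity-" + entityId, entityId, transformed));
                }
            }
        }
    }

    private static String transform(String value) {
        return value.toUpperCase(); // placeholder for the real transformation logic
    }
}
```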
At any given time, there can be up to 3-4k topics in Kafka (one topic for each unique entity).
The number of topics is not so important in itself, but each Kafka topic is partitioned and the total number of partitions could impact performance.
The general recommendation from the Apache Kafka community is to have no more than 4,000 partitions per broker (this includes replicas). The linked KIP article explains some of the possible issues you may face if the limit is breached, and with 3,000 topics it would be easy to do so unless you choose a low partition count and/or replication factor for each topic.
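To make that concrete, here is the back-of-the-envelope math, using assumed values taken from the question (3,500 entity topics, the minimum of 1 partition each, replication factor 3 on a 3-node cluster):

```java
public class PartitionBudget {
    public static void main(String[] args) {
        int topics = 3500;           // assumed: ~3-4k entity topics at peak
        int partitionsPerTopic = 1;  // assumed: the minimum possible
        int replicationFactor = 3;   // assumed: one copy per node in a 3-node plan
        int brokers = 3;

        // Every partition replica counts against the per-broker guideline.
        int totalReplicas = topics * partitionsPerTopic * replicationFactor;
        int replicasPerBroker = totalReplicas / brokers;

        System.out.println("Total partition replicas: " + totalReplicas);     // 10500
        System.out.println("Replicas per broker:      " + replicasPerBroker); // 3500
    }
}
```

Even with the smallest possible settings, each broker already carries roughly 3,500 partition replicas, right at the edge of the 4,000 guideline.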
Choosing a low partition count for a topic is sometimes not a good idea, because it limits the parallelism of reads and writes, leading to performance bottlenecks for your clients.
Choosing a low replication factor for a topic is also sometimes not a good idea, because it increases the chance of data loss upon failure.
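Both of these knobs are fixed when a topic is created, so if your consumers create the per-entity topics programmatically, the trade-off has to be made explicit there. A minimal sketch using the Kafka AdminClient (the topic name and settings are just example values):

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateEntityTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // The trade-off, made explicit at creation time: 1 partition limits
            // client parallelism; replication factor 2 tolerates one broker
            // failure while costing fewer replicas than 3.
            NewTopic topic = new NewTopic("entity-42", 1, (short) 2); // hypothetical entity id
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```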
Generally it's fine to have unused topics on the cluster, but be aware that the cluster still pays a cost to manage the metadata for all these partitions, and some operations will take longer than if the topics were not there at all.
There is also a per-cluster limit, but it is much higher (200,000 partitions), so your architecture might be better served simply by increasing the node count of your cluster.
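To see where your cluster currently stands, you can count partitions and partition replicas with the AdminClient (a sketch assuming kafka-clients 3.1+, which provides allTopicNames()):

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;

public class PartitionCount {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            Set<String> names = admin.listTopics().names().get();
            Map<String, TopicDescription> descriptions =
                    admin.describeTopics(names).allTopicNames().get();

            // Leader partitions only
            int partitions = descriptions.values().stream()
                    .mapToInt(d -> d.partitions().size())
                    .sum();
            // All replicas -- this is the number that counts against the
            // ~4,000-per-broker guideline once divided by the broker count
            int replicas = descriptions.values().stream()
                    .flatMap(d -> d.partitions().stream())
                    .mapToInt(p -> p.replicas().size())
                    .sum();

            System.out.println("Topics: " + names.size()
                    + ", partitions: " + partitions
                    + ", partition replicas: " + replicas);
        }
    }
}
```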