Will we have any problem if we have millions of partitions for one topic? Due to our business requirement, we are thinking if we can make a partition for every user in kafka. We have millions of users. Any insight would be appreciated!
Yes, I think you will end up having problems if you have millions of partitions for several reasons:
(Most importantly!!) Customers come and go, so you will have the requirement to constantly change the number of partitions or have plenty of unused partitions (because you can not reduce the number of partitions within a topic).
More Partitions Requires More Open File Handles: More Partitions means more directories and segment files on disk.
More Partitions May Increase Unavailability: Planned failures move Leaders off of a Broker one at a time, with minimal downtime per partition. In a hard failure all the leaders are immediately unavailable.
More Partitions May Increase End-to-end Latency: For the message to be seen by a Consumer it must be committed. The Broker replicates data from the leader with a single thread, resulting in overhead per Partition.
More Partitions May Require More Memory In the Client
More details are provided in the blog from Confluent on How to choose the number of topics/partitions in a Kafka cluster?.
In addition, according to Confluent's training material for Kafka developers it is recommended:
"The current limits (2-4K Partitions/Broker, 100s K Partitions per cluster) are maximums. Most environments are well below these values (typically in the 1000-1500 range or less per Broker)."
This blog explains that "Apache Kafka Supports 200K Partitions Per Cluster".
This might change with the replacement of Zookeeper KIP-500 but, again, looking at the first bullet point above this will still be a unhealthy software design.