librdkafka

How do I manage the threads and memory of a librdkafka consumer?


I tried using the librdkafka C++ library. I noticed that 4 new threads are spawned for each new topic my consumer (RdKafka::KafkaConsumer) subscribes to. Approximately 30 MB of virtual memory is also used for every topic subscribed to.

My client application/consumer needs to consume from about 2000 topics. That would translate to about 8000 threads and 60 GB of virtual memory for my application. Assuming that I need around 20 partitions per topic to achieve my desired throughput, I would need around 20 instances of my application. If all application instances are housed on a single server, that server would need to run at least 8000 x 20 = 160,000 threads simultaneously and use 60 GB x 20 = 1.2 TB of virtual memory.
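The estimate above can be reproduced with a few lines of C++. This is only a sketch of the arithmetic: the per-topic figures (4 threads, 30 MB) are the ones I observed, and the names `Footprint` and `estimate` are made up for illustration.

```cpp
#include <cstdint>

// Totals for a deployment: thread count and virtual memory in MB.
struct Footprint {
    std::int64_t threads;
    std::int64_t virtual_mb;
};

// Per-topic figures observed in my test, not guaranteed by librdkafka.
constexpr std::int64_t kThreadsPerTopic = 4;
constexpr std::int64_t kVirtualMbPerTopic = 30;

// Hypothetical helper: scales the observed per-topic cost linearly
// with the number of topics and the number of application instances.
constexpr Footprint estimate(std::int64_t topics, std::int64_t instances) {
    return Footprint{topics * kThreadsPerTopic * instances,
                     topics * kVirtualMbPerTopic * instances};
}

// One instance with 2000 topics: 8000 threads and 60,000 MB (60 GB).
static_assert(estimate(2000, 1).threads == 8000, "threads per instance");
static_assert(estimate(2000, 1).virtual_mb == 60000, "MB per instance");
```

Calling `estimate(2000, 20)` gives the single-server totals of 160,000 threads and 1,200,000 MB (1.2 TB).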

160,000 threads and 1.2 TB of virtual memory are overwhelming for a single server, so the instances could be spread across multiple servers to distribute the load. Even then, the per-server numbers are quite mind-boggling.

Is there a way to control the number of threads and the amount of memory the client application uses with the librdkafka library?


Solution

  • A single consumer can consume from any reasonable number of topics/partitions, so you shouldn't create a separate consumer for each topic.

    Also see https://github.com/edenhill/librdkafka/wiki/FAQ#number-of-internal-threads
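As a rough sketch of the single-consumer approach, the code below builds one topic list and hands it to one RdKafka::KafkaConsumer via subscribe(). The helper names `make_topic_list` and `subscribe_all`, the "events-N" topic naming, and the broker and group values passed in are all assumptions for illustration. The `__has_include` guard (C++17) is only there so the sketch still compiles on a machine without the librdkafka headers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds the full topic list for ONE consumer.
// The "events-N" naming scheme is invented for this example.
std::vector<std::string> make_topic_list(int count) {
    assert(count >= 0);
    std::vector<std::string> topics;
    topics.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i)
        topics.push_back("events-" + std::to_string(i));
    return topics;
}

#if __has_include(<librdkafka/rdkafkacpp.h>)
#include <librdkafka/rdkafkacpp.h>
#include <memory>

// Hypothetical helper: creates a single KafkaConsumer and subscribes it
// to every topic in the list. Returns nullptr and fills errstr on failure.
inline std::unique_ptr<RdKafka::KafkaConsumer> subscribe_all(
        const std::string& brokers, const std::string& group_id,
        const std::vector<std::string>& topics, std::string& errstr) {
    std::unique_ptr<RdKafka::Conf> conf(
        RdKafka::Conf::create(RdKafka::Conf::CONF_GLOBAL));
    if (conf->set("bootstrap.servers", brokers, errstr) != RdKafka::Conf::CONF_OK ||
        conf->set("group.id", group_id, errstr) != RdKafka::Conf::CONF_OK)
        return nullptr;

    std::unique_ptr<RdKafka::KafkaConsumer> consumer(
        RdKafka::KafkaConsumer::create(conf.get(), errstr));
    if (!consumer)
        return nullptr;

    // One subscribe() call covers all topics; no consumer per topic.
    RdKafka::ErrorCode err = consumer->subscribe(topics);
    if (err != RdKafka::ERR_NO_ERROR) {
        errstr = RdKafka::err2str(err);
        consumer->close();
        return nullptr;
    }
    return consumer;
}
#endif
```

With this shape, something like `subscribe_all("localhost:9092", "my-group", make_topic_list(2000), errstr)` would give you one consumer to poll with consume() for all 2000 topics. To scale out, you would start more instances with the same group.id and let the group split the partitions among them.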