I'm using the following mongo-source which is supported by kafka-connect. I found that one of the configurations of the mongo source (from here) is tasks.max.
This means I can set tasks.max to a value greater than 1 for the connector, but I fail to understand what it will do behind the scenes.
If it creates multiple connectors that all listen to the MongoDB change stream, then I will end up with duplicate messages. So, does mongo-source really have parallelism and work as a cluster? What does it do if tasks.max is greater than 1?
Mongo-source doesn't support tasks.max > 1. Even if you set it to a value greater than 1, only one task will pull data from MongoDB into Kafka.
How many tasks are created depends on the particular connector. The method List<Map<String, String>> Connector::taskConfigs(int maxTasks) (which you must override when implementing your connector) returns a list whose size determines the number of tasks.
If you check the mongo-kafka source connector, you will see that it returns a singletonList.
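To illustrate why tasks.max has no effect here, below is a minimal sketch, not the actual mongo-kafka code. The class SingleTaskConnectorSketch is hypothetical and deliberately avoids the Kafka Connect classes so it runs on its own; it only mirrors the shape of taskConfigs(int maxTasks). The connection.uri value is a placeholder.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a source connector; it does not extend the real Kafka Connect classes.
public class SingleTaskConnectorSketch {
    private final Map<String, String> settings;

    public SingleTaskConnectorSketch(Map<String, String> settings) {
        // Copy the settings so later changes by the caller do not leak in.
        this.settings = new HashMap<>(settings);
    }

    // Mirrors the shape of Connector::taskConfigs(int maxTasks).
    // maxTasks is ignored: the returned list always has exactly one element,
    // so the framework would create exactly one task.
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        return Collections.singletonList(settings);
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        settings.put("connection.uri", "mongodb://localhost:27017"); // placeholder value
        settings.put("tasks.max", "5");

        SingleTaskConnectorSketch connector = new SingleTaskConnectorSketch(settings);
        // Even though 5 tasks are requested, only one task config comes back.
        System.out.println(connector.taskConfigs(5).size()); // prints 1
    }
}
```

A connector that did support parallelism would instead return up to maxTasks entries, each describing a distinct share of the work.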
Below is a permalink to the current version (1.13.0). You can check the main
branch to see whether it's still a singletonList.
https://github.com/mongodb/mongo-kafka/blob/8c1a9b2bec644477507a898789dded2b3798b2d3/src/main/java/com/mongodb/kafka/connect/MongoSourceConnector.java#L91