apache-kafka scheduled-tasks apache-kafka-connect mongodb-kafka-connector

Need to schedule the MongoDB Kafka connector


We are working with the MongoDB Kafka connector on top of open-source Apache Kafka Connect to ingest JSON data from MongoDB into HDFS. We have a Kafka consumer that reads the data changes from Kafka and writes them to a file on HDFS.

We want to schedule the source connectors to run at specific, different times.

We need to trigger Kafka messages based on a scheduled date.


Solution

  • We can handle this scenario by customizing the polling interval through the MongoDB source connector’s configuration properties.

    link:

    https://www.mongodb.com/docs/kafka-connector/current/source-connector/configuration-properties/all-properties/#std-label-source-configuration-all-properties

    ==> poll.await.time.ms (the time in milliseconds the connector waits before checking the change stream for new results) can be a solution

    Otherwise, there is the Kafka message scheduler:

    https://github.com/etf1/kafka-message-scheduler

    Vertica also provides a scheduler for this ("Automatically Consume Data From Kafka with the Scheduler"):

    When you create a new scheduler, the vkconfig script takes the following steps:

    Creates a new Vertica schema using the name you specified for the scheduler. You use this name to identify the scheduler during configuration.

    Creates the tables needed to manage the Kafka data load in the newly-created schema.
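To make the first option (poll.await.time.ms) more concrete, here is a minimal sketch in Python that builds a MongoDB source connector configuration with a longer polling interval and wraps it in the JSON body that the Kafka Connect REST API accepts on POST /connectors. The connector name, connection URI, database, collection, and the chosen interval are assumptions for illustration, not values from the question. Note that this setting only changes how often the connector checks the change stream; it does not start the connector at a calendar date.

```python
import json


def build_source_config(connection_uri, database, collection,
                        poll_await_time_ms=60_000, poll_max_batch_size=1000):
    """Return a MongoDB source connector config with a custom polling interval.

    All argument values are placeholders chosen for this example.
    """
    return {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": connection_uri,
        "database": database,
        "collection": collection,
        # Time in ms to wait before checking the change stream for new results.
        "poll.await.time.ms": str(poll_await_time_ms),
        # Maximum number of change stream documents returned per poll.
        "poll.max.batch.size": str(poll_max_batch_size),
    }


def connector_payload(name, config):
    """Wrap the config in the JSON body used to create a connector."""
    return json.dumps({"name": name, "config": config}, indent=2)


if __name__ == "__main__":
    # Hypothetical values: poll the change stream once every 10 minutes.
    cfg = build_source_config(
        "mongodb://localhost:27017", "mydb", "mycollection",
        poll_await_time_ms=10 * 60 * 1000,
    )
    print(connector_payload("mongo-source-example", cfg))
```

The printed JSON can be sent to a Kafka Connect worker, for example with curl against its /connectors endpoint. If the connector really has to start and stop at fixed times, the polling interval alone will probably not be enough, and an external scheduler would be needed on top of it.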