
How can we prevent reading the next offset when the previous offset has not been committed yet in Kafka (High Level Consumer)?

I'm new to Kafka. I'm trying to commit Kafka messages explicitly: I want to commit a record only if all of my MySQL operations succeed. However, Kafka reads the next offset even if the previous offset has not been committed yet.

The code is as follows:

`$conf = new \RdKafka\Conf();
$conf->set('group.id', $groupId);
$conf->set('metadata.broker.list', $brokers);
$conf->set('auto.offset.reset', 'earliest');
$conf->set('enable.auto.commit', 'true');
$conf->set('enable.auto.offset.store', 'false');
$conf->set('auto.commit.interval.ms', 100);
$conf->set('enable.partition.eof', 'true');

$consumer = new \RdKafka\KafkaConsumer($conf);

$this->logger->info("Waiting for partition assignment... (may take some time when quickly re-joining the group after leaving it.)", [$groupId, $brokers, $subscriptionArr]);
// Add some counters so we at least know how many messages we processed per topic.
while ($active) {
    $message = $consumer->consume(120*1000);
    if ($this->debug) {
        $active = false;
    }
    switch ($message->err) {
        case RD_KAFKA_RESP_ERR_NO_ERROR:
            $topic_name = $message->topic_name;
            $timestamp = $message->timestamp;
            $payload = $message->payload;
            $message_offset = $message->offset;
            $partition = $message->partition;
            $timeoutMs = 10000000;
            $this->logger->info("topic partition details ", [$partition]);

            $topic = $consumer->newTopic($topic_name);
            $topicPartition = new \RdKafka\TopicPartition($topic_name, $partition);
            $this->logger->info("topic partitions", [$topicPartition]);
            $partition_id = $topicPartition->getPartition();
            $this->logger->info("topic partitions", [$partition_id]);
            $topicPartitionsWithOffsets = $consumer->getCommittedOffsets([$topicPartition], $timeoutMs);
            // Write the record to MySQL; 1 means the insert/update succeeded.
            $queryExecutionStatus = $this->sinkConnector->injectKafka($topic_name, $payload, $timestamp);

            $this->logger->info($topic_name." query execution status", [$queryExecutionStatus]);
            $this->logger->info("before committed offset ", [$topicPartitionsWithOffsets]);
            if ($queryExecutionStatus == 1) {
                if (array_key_exists($topic_name, $search_array)) {
                    $search_array[$topic_name] = $search_array[$topic_name] + 1;
                } else {
                    // set initial value of the counter
                    $search_array[$topic_name] = 1;
                }
                $this->logger->info("manually committed offset of topic ".$topic_name, [$message_offset]);

                // Store the offset locally; auto commit then commits the stored offset.
                $topic->offsetStore($message->partition, $message_offset);
                $storeOffset = $topicPartition->getOffset();
                $this->logger->info("committed offset ", [$storeOffset]);
            } else {
                $this->logger->warning("Warning: Message not inserted/updated ".$topic_name, [$message_offset]);
            }
            break;
        case RD_KAFKA_RESP_ERR__PARTITION_EOF:
            $this->logger->notice("No more messages on partition; will wait for more", [$search_array]);
            break;
        case RD_KAFKA_RESP_ERR__TIMED_OUT:
            $this->logger->notice("Timed out", [$tmp]);
            $active = false;
            break;
        default:
            $this->logger->warning("Warning: Kafka-Exception", ['message' => $message->errstr(), 'error' => $message->err, 'count' => $tmp]);
            //throw new \Exception($message->errstr(), $message->err);
            $active = false;
            break;
    }
}
// flush and close the kafka-client and make sure it does not leave any open files...
$this->logger->notice("Disconnecting from the nodes and go to sleep mode");
//$consumer->commit(); //we dont have a no local offset stored...;) this forcefully lowers: ls /proc/$pid/fd/ | wc -l
$consumer = null;` 

I tried the commit() method of php-rdkafka to commit synchronously. I also tried changing the value of the auto.commit.interval.ms setting.


  • Kafka has a pull-based model: consumers have full control over data consumption, and brokers play the role of data storage. Brokers do not track how many records an arbitrary consumer has fetched; all parameters for consuming, including the offset to read from, are provided by the consumer.

    There is a distinction between the last consumed offset (the current offset) and the last committed offset, and their values differ. The last consumed offset is usually greater than the last committed offset and simply represents the offset of the latest record that was consumed (for more details, see this short article: https://www.learningjournal.guru/courses/kafka/kafka-foundation-training/offset-management/). So the reason your consumer keeps consuming records is that it reads on from the last consumed offset, not from the last committed one. The committed offset only determines where consumption resumes after a restart or a rebalance.

    As far as I know, to stop consuming past an offset that has not been committed, you have to "suspend" the consumer itself. In Java, there are pause and resume methods for that purpose. Your question is about PHP, so I took a quick look at the Kafka docs for PHP but didn't find an explicit counterpart. Perhaps you can check rd_kafka_pause_partitions() and rd_kafka_resume_partitions() (see https://idealo.github.io/php-rdkafka-ffi/api/RdKafka/FFI/Library/#rd_kafka_pause_partitions).
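
To make the difference between the two offsets concrete, here is a minimal pure-PHP sketch. It does not use php-rdkafka at all: `FakePartitionConsumer` and its `poll()`, `commit()`, `seek()`, `position()` and `committed()` methods are hypothetical names invented for this illustration, and the "failed write" is simulated. It only models the behaviour described above: polling moves the position forward whether or not anything was committed, so re-reading a record requires explicitly rewinding (or not polling at all).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory stand-in for one partition and one consumer.
 * This is NOT the php-rdkafka API; it only models the two offsets.
 */
final class FakePartitionConsumer
{
    /** @var string[] records stored in the partition, indexed by offset */
    private array $log;
    private int $position = 0;   // next offset poll() will return
    private int $committed = 0;  // offset a restarted consumer would resume from

    public function __construct(array $log)
    {
        $this->log = array_values($log);
    }

    /** Returns [offset, payload] and advances the position, or null at the end of the log. */
    public function poll(): ?array
    {
        if ($this->position >= count($this->log)) {
            return null;
        }
        $offset = $this->position++;
        return [$offset, $this->log[$offset]];
    }

    /** Marks everything up to and including $offset as processed. */
    public function commit(int $offset): void
    {
        $this->committed = $offset + 1;
    }

    /** Rewinds the position so the next poll() returns $offset again. */
    public function seek(int $offset): void
    {
        $this->position = $offset;
    }

    public function position(): int { return $this->position; }
    public function committed(): int { return $this->committed; }
}

$consumer = new FakePartitionConsumer(['a', 'b', 'c']);

// 1) Poll twice without committing: the position still moves forward.
$consumer->poll();
$consumer->poll();
$positionAfterPolls  = $consumer->position();
$committedAfterPolls = $consumer->committed();

// 2) Rewind to the committed offset and only move on after a successful write.
$consumer->seek($consumer->committed());
$attempts = [];           // offsets handed to the handler, in order
$failOnce = [1 => true];  // pretend the database write for offset 1 fails once
while (($record = $consumer->poll()) !== null) {
    [$offset, $payload] = $record;
    $attempts[] = $offset;
    if (!empty($failOnce[$offset])) {
        unset($failOnce[$offset]);
        $consumer->seek($offset);  // write failed: re-read the same record next time
        continue;
    }
    $consumer->commit($offset);    // write succeeded: safe to advance
}
```

With a real client, the rewind step corresponds to whatever seek, re-assign, or pause/resume mechanism the library offers, so check the php-rdkafka documentation before relying on any particular call. A production loop would also want a retry limit or back-off so a permanently failing record does not spin forever.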