databaseclickhouseolap

When will clickhouse commit the consumed kafka offset in case of writing to distributed tables?


I am puzzled at this scenario, imagine I have two options: (I have M nodes all having the same tables)

kafka_table -> MV_1 -> local_table_1
            -> MV_2 -> local_table_2
            ...
            -> MV_N -> local_table_N

In this case, when an insertion in any of the local_table_<id> fails, the consumer marks this as a failed consume, and tries to reconsume the message at the current offset, and will not commit a new offset.

But in a new scenario:

kafka_table -> MV_1 -> dist_table_1 -> local_table_1
            -> MV_2 -> dist_table_2 -> local_table_2
            ...
            -> MV_N -> dist_table_N -> local_table_N

I don't know what will exactly happen. When will a new kafka offset be commited. Clickhouse by default uses async_insertions for distributed tables, will the new kafka offset be commited when this background insert job is created? or when it is successful, how does clickhouse manages this sync/async mechanism in this case?


Solution

  • Materialized View doesn't know about engine of destination table
    AFAIK engine=Kafka will commit offset when all MVs first level return OK for insert into destination table
    for engine=Distirbuted with default settings it means when data will successfully stored on local temporary .bin files in initiator node