apache-kafka greenplum

When does Greenplum gpss commit offsets to a Kafka topic?


I'm trying to use gpss (Greenplum Stream Server) to load data from Kafka into Greenplum Database.

My main question is how and when a gpss instance commits the currently written offset to Kafka.

Right now the gpss instance does not commit any offsets to Kafka; instead it tracks the current offset in a service table in Greenplum. The behavior I expected is:

  1. Use the given group.id and topic (mandatory in Kafka but not in the gpss settings, which looks strange, by the way)
  2. Start consuming data from the Kafka topic
  3. Track the highest offset per partition
  4. Wait until the COMMIT condition occurs (COMMIT is a block of settings in the gpss job config)
  5. Write the batch of data from Kafka to an external table using gpfdist
  6. Commit the max offset per partition to Kafka
  7. Repeat

But right now it works without step 6 (the commit to Kafka). Does anyone know why?
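To make the expected loop concrete, here is a minimal pure-Python sketch of it. It does not talk to Kafka or Greenplum and is not how gpss is implemented: `run_batches`, `max_rows`, and the `loaded` and `committed` structures are hypothetical stand-ins for the COMMIT condition, the external table, and the offsets committed to Kafka.

```python
def run_batches(messages, max_rows):
    """Simulate the expected consume / load / commit loop.

    messages: iterable of (partition, offset, value) tuples.
    max_rows: hypothetical stand-in for the COMMIT condition.
    Returns (loaded, committed).
    """
    batch = []       # rows buffered since the last commit
    highest = {}     # highest offset seen per partition in this batch
    loaded = []      # stand-in for batches written to the external table
    committed = {}   # stand-in for offsets committed to Kafka

    def flush():
        if not batch:
            return
        # Step 5: write the batch first ...
        loaded.append(list(batch))
        # Step 6: ... then commit. By Kafka convention the committed
        # offset is the next offset to read, i.e. last consumed + 1.
        for partition, offset in highest.items():
            committed[partition] = offset + 1
        batch.clear()
        highest.clear()

    for partition, offset, value in messages:
        batch.append(value)
        # Step 3: track the highest offset per partition.
        highest[partition] = max(highest.get(partition, -1), offset)
        # Step 4: the COMMIT condition (row count only, for simplicity).
        if len(batch) >= max_rows:
            flush()

    flush()  # load and commit whatever is left at the end
    return loaded, committed


if __name__ == "__main__":
    sample = [(0, 0, "a"), (1, 0, "b"), (0, 1, "c"), (0, 2, "d"), (1, 1, "e")]
    print(run_batches(sample, max_rows=2))
```

Committing only after the batch has been written means a crash between the two steps re-reads the batch instead of losing it.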

The second question: does gpss use a group.id? In the gpss job config I found a PROPERTIES block that corresponds to the Kafka consumer configuration properties.


Solution

  • Since version 1.6.0, gpss commits the consumed offset to Kafka if 'group.id' is set in the YAML file. Before that version, it only committed the offset to the tracking table in Greenplum.
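The rule above can be written as a small check. This is only a sketch of the stated condition: `commits_offsets_to_kafka` and `parse_version` are hypothetical helpers, not part of gpss, and the code assumes plain numeric version strings such as "1.6.0" and a dict holding the PROPERTIES block of the job config.

```python
def parse_version(text):
    """Turn a version string like '1.6.0' into a comparable tuple (1, 6, 0)."""
    return tuple(int(part) for part in text.split("."))


def commits_offsets_to_kafka(gpss_version, properties):
    """Return True if, per the rule above, gpss would commit offsets to Kafka.

    gpss_version: version string, assumed to be numeric parts joined by dots.
    properties:   dict standing in for the PROPERTIES block of the job config.
    """
    has_group_id = bool(properties.get("group.id"))
    return has_group_id and parse_version(gpss_version) >= (1, 6, 0)


if __name__ == "__main__":
    print(commits_offsets_to_kafka("1.6.0", {"group.id": "my_group"}))
    print(commits_offsets_to_kafka("1.5.3", {"group.id": "my_group"}))
    print(commits_offsets_to_kafka("1.6.0", {}))
```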