apache-kafka apache-kafka-streams kafka-streams-binder

Prevent key-based repartitioning in Kafka Streams


I have a slightly unusual use case: our applications do not use standard Kafka partitioning. Instead we have a custom partitioning strategy that uses a specific field within a compound key to decide which partition a record goes to. This is generally the customerId, so that all records for a single customer end up in a single partition. The key also contains the other IDs that make the message unique, so that compaction still works.

e.g.

topic-1-key

{
  orderId,
  customerId
}

topic-2-key

{
  addressId,
  customerId
}
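To make the scheme concrete, here is a minimal sketch of the idea in plain Java. Every name in it (`CustomerPartitioning`, `OrderKey`, `AddressKey`, `partitionFor`) is hypothetical, and `Arrays.hashCode` is only a stand-in for whatever hash the real partitioner uses. It also assumes both topics have the same number of partitions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: none of these names come from Kafka or from our real code.
final class CustomerPartitioning {

    // Compound key for topic-1; orderId keeps the key unique for compaction.
    static final class OrderKey {
        final String orderId;
        final String customerId;

        OrderKey(String orderId, String customerId) {
            this.orderId = orderId;
            this.customerId = customerId;
        }
    }

    // Compound key for topic-2; addressId keeps the key unique for compaction.
    static final class AddressKey {
        final String addressId;
        final String customerId;

        AddressKey(String addressId, String customerId) {
            this.addressId = addressId;
            this.customerId = customerId;
        }
    }

    // Only customerId feeds the hash; the other ID is deliberately ignored.
    // Arrays.hashCode is an assumed stand-in for the real hash function.
    static int partitionFor(String customerId, int numPartitions) {
        int hash = Arrays.hashCode(customerId.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }

    static int partitionFor(OrderKey key, int numPartitions) {
        return partitionFor(key.customerId, numPartitions);
    }

    static int partitionFor(AddressKey key, int numPartitions) {
        return partitionFor(key.customerId, numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 12; // assumed to be identical for both topics
        OrderKey order = new OrderKey("order-1", "customer-42");
        AddressKey address = new AddressKey("address-7", "customer-42");
        // Same customerId, so both records map to the same partition number.
        System.out.println(
            partitionFor(order, numPartitions) == partitionFor(address, numPartitions)); // prints "true"
    }
}
```

Because the partition depends only on customerId, the two topics are already co-partitioned by customer even though their full keys differ.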

I want to join these two records. To do this with the DSL, my only option is to rekey both records to the customerId and then do the join. However, when I do this, Kafka Streams detects that key-changing operations have occurred and automatically creates repartition topics for me. Is there any way to override this behaviour while using the DSL?
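A rough sketch of the rekey-and-join pattern, for reference. It assumes kafka-streams 3.x is on the classpath, that keys and values are plain strings with the key serialized as `<otherId>|<customerId>`, and that the join is a windowed stream-stream join. The topic names, the `joined-topic` output, the five-minute window and the `customerIdOf` helper are all made up for illustration.

```java
import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.StreamJoined;

final class RekeyJoinTopology {

    // Hypothetical helper: assumes keys look like "<otherId>|<customerId>".
    static String customerIdOf(String compoundKey) {
        return compoundKey.substring(compoundKey.indexOf('|') + 1);
    }

    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        KStream<String, String> orders =
            builder.stream("topic-1", Consumed.with(Serdes.String(), Serdes.String()));
        KStream<String, String> addresses =
            builder.stream("topic-2", Consumed.with(Serdes.String(), Serdes.String()));

        // selectKey is a key-changing operation, so each stream is flagged
        // as needing repartitioning before a key-based operation such as join.
        KStream<String, String> ordersByCustomer =
            orders.selectKey((key, value) -> customerIdOf(key));
        KStream<String, String> addressesByCustomer =
            addresses.selectKey((key, value) -> customerIdOf(key));

        KStream<String, String> joined = ordersByCustomer.join(
            addressesByCustomer,
            (order, address) -> order + "|" + address,
            JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)),
            StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()));

        joined.to("joined-topic", Produced.with(Serdes.String(), Serdes.String()));
        return builder.build();
    }

    public static void main(String[] args) {
        // Describing the topology needs no broker; it only prints the processor graph.
        System.out.println(build().describe());
    }
}
```

The printed description should contain internal topics whose names end in `-repartition`, one per rekeyed stream, even though the data is already partitioned by customerId.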

I'm aware I could do this manually using the Processor API and state stores, but wondered whether there's a way to do it with the DSL, or whether it's simply not an option.


Solution

  • It's not possible right now, i.e., up to Apache Kafka 3.6.

    There is already work in progress to add a new operator, markAsPartitioned(), to close this gap. KIP-759 has already been accepted and will most likely ship with the 3.7.0 release.