Tags: kubernetes, apache-kafka, apache-kafka-connect, strimzi

Exotic choice for Kafka Connect deployment


I currently run a Kafka Connect cluster consisting of two workers deployed across two EC2 instances. I create connectors via a client layer that relies on the Kafka Connect REST API, and I version them that way as well, i.e. I store the configs in JSON files and deploy them through API requests.
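The workflow described above, versioned JSON configs pushed through the REST API, can be sketched as follows. The worker address, connector name, and config are placeholders; `PUT /connectors/{name}/config` is the idempotent create-or-update endpoint of the Kafka Connect REST API.

```python
import json

# Hypothetical worker address; real deployments would point at a
# load balancer or service in front of the Connect cluster.
CONNECT_URL = "http://localhost:8083"

def connector_request(base_url, name, config):
    """Build the URL and JSON body for an idempotent connector upsert
    (PUT /connectors/{name}/config creates or updates the connector)."""
    url = f"{base_url}/connectors/{name}/config"
    return url, json.dumps(config)

url, body = connector_request(
    CONNECT_URL,
    "my-sink",
    {"connector.class": "FileStreamSink", "topics": "demo", "file": "/tmp/out.txt"},
)
print(url)
# Actually sending it would look like:
#   requests.put(url, data=body, headers={"Content-Type": "application/json"})
```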

Currently I am trying to solve two problems:

I am thinking of migrating to Kubernetes; however, simply moving the servers to pods, or even adding an autoscaler, will not solve much. I would like an architecture where a request for a connector triggers the deployment of a dedicated, isolated Kafka Connect worker with its own group.id. That way, connectors do not share JVM resources.
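A per-connector worker essentially means giving each deployment a unique group.id and its own internal topics, so workers never join each other's group. A minimal sketch of such a distributed-mode worker config, with illustrative names:

```properties
# Hypothetical worker config for a single-connector "cluster".
bootstrap.servers=kafka:9092
group.id=connect-my-sink                      # unique per connector
config.storage.topic=connect-my-sink-configs  # internal topics must also
offset.storage.topic=connect-my-sink-offsets  # be unique per "cluster"
status.storage.topic=connect-my-sink-status
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
```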

I didn't find such an architecture proposed anywhere else and, to be honest, I have my doubts, especially since it goes against the core concepts of Kafka Connect itself.

Deploying Kafka Connect connectors in Kubernetes



Solution

  • The only real concepts in Kafka Connect are workers and tasks; there is no recommended deployment architecture.

    Connectors will always share the JVM resources of the cluster, unless you adopt a model where each connector is its own dedicated Connect "cluster". This is easy to accomplish with containers, e.g. ECS/Fargate or Kubernetes (EKS). EC2 without an ASG is too rigid, and either way it doesn't provide the resource isolation you're asking for.
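    On Kubernetes with Strimzi, the one-connector-per-cluster model amounts to one `KafkaConnect` resource per connector. A sketch, with illustrative names and sizes:

    ```yaml
    # Hypothetical Strimzi resource: a single-replica Connect cluster
    # dedicated to one connector, giving it its own JVM and resources.
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaConnect
    metadata:
      name: connect-my-sink          # one KafkaConnect CR per connector
      annotations:
        strimzi.io/use-connector-resources: "true"
    spec:
      replicas: 1
      bootstrapServers: my-cluster-kafka-bootstrap:9092
      resources:
        requests:
          memory: 1Gi
        limits:
          memory: 1Gi
    ```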

    Strimzi can do that, but scaling has its own constraints: sink connectors cannot scale beyond the partition count of the topics they consume, and source connectors sometimes cannot scale much at all (Debezium or the JDBC source should always have one task per table).
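    Concretely, the cap shows up in `tasks.max`: a sink requesting more tasks than there are partitions simply leaves the extra tasks without assignments. The topic and connector below are illustrative:

    ```json
    {
      "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
      "topics": "orders",
      "tasks.max": "6",
      "file": "/tmp/orders.txt"
    }
    ```

    If the hypothetical `orders` topic has only 3 partitions, at most 3 of those 6 tasks will ever receive data.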

    The KafkaConnector CRD in Strimzi lets you skip the JSON files entirely, as the operator will deploy and manage the connector tasks on its own.
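    A sketch of such a `KafkaConnector` resource, replacing a JSON config posted to the REST API; the names mirror the hypothetical single-connector cluster above, and the operator reconciles it against the Connect cluster named in the `strimzi.io/cluster` label:

    ```yaml
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaConnector
    metadata:
      name: my-sink
      labels:
        strimzi.io/cluster: connect-my-sink   # target KafkaConnect CR
    spec:
      class: org.apache.kafka.connect.file.FileStreamSinkConnector
      tasksMax: 1
      config:
        topics: orders
        file: /tmp/orders.txt
    ```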