Tags: go, apache-kafka, regional

What's the best way to push Kafka messages from my edge nodes?


I have a worker in our primary region (US-East) that computes statistics on traffic at our edge locations. I want to push that data from each edge region to the Kafka cluster in our primary region.

For example, our edge regions include Poland, Australia, and US-West, and I want to push all of their stats to US-East. I don't want the edge applications to incur additional latency when they write to the primary region.

Another option is to run a separate Kafka cluster and relay worker in each edge region. That would require us to maintain individual clusters in every region and would add a lot more complexity to our deployments.

I've looked at MirrorMaker, but I don't really want to mirror anything; I'm looking for something more like a relay. If this isn't the intended way to use Kafka, how can I aggregate all of our application metrics in the primary region so they can be computed and sorted there?

Thank you for your time.


Solution

  • As far as I know, here are your options:

    1. Set up a local Kafka cluster in each region and have your edge nodes write to their local cluster for low-latency writes. From there, set up MirrorMaker to copy data from each local cluster to your remote Kafka cluster for aggregation.
    2. If you're concerned about blocking your application's request path on high-latency requests to the remote cluster, configure your producers to write asynchronously (non-blocking) to your remote Kafka cluster. Depending on your programming language and client library, this could be a simple or a complex exercise.
    3. Run a per-host relay (or data buffer) service, which could be as simple as a log file plus a daemon that pushes its contents to your remote Kafka cluster (as mentioned above). Alternatively, run a single-instance Kafka/ZooKeeper container (there are Docker images that bundle both together) that buffers the data locally for downstream pulling.

    Option 1 is definitely the most standard solution to this problem, albeit a bit heavy-handed. I suspect more tooling will come out of Confluent and the Kafka community to support option 3 in the future.