I've been experimenting with Apache Kafka, the distributed streaming platform, but I'm having difficulties with the "distributed" aspect of it.
I'm following the example here, which works fine when everything runs on the same machine, but I want to run it as a cluster spanning 2 or more VMs.
What I managed to do so far:
Setting up a Zookeeper cluster (quorum mode, as pointed out by Rajkumar Natarajan) by adding the following to /etc/zookeeper/conf/zoo.cfg:
server.1=192.168.56.101:2888:3888
server.2=192.168.56.102:2888:3888
and making sure the myid file in /var/lib/zookeeper is unique on each server. Running bin/zkServer.sh status shows Mode: leader on one node and Mode: follower on the rest, as it should.
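For reference, the myid step can be done like this on each VM (a sketch assuming the Debian-style package paths used above; the service name may differ per install):

```shell
# On 192.168.56.101 (the id must match the "server.1" line in zoo.cfg):
echo 1 | sudo tee /var/lib/zookeeper/myid
# On 192.168.56.102 (must match the "server.2" line):
echo 2 | sudo tee /var/lib/zookeeper/myid

# Restart Zookeeper and check each node's role:
sudo systemctl restart zookeeper
bin/zkServer.sh status   # expect "Mode: leader" on one node, "Mode: follower" on the others
```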
Setting up the Kafka cluster by changing the following in config/server.properties:
broker.id=0 # 1 for the second server
zookeeper.connect=192.168.56.101:2181,192.168.56.102:2181
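With those properties in place, each broker is started on its own VM against its own properties file (a sketch; skip the first line if Zookeeper already runs as a system service, as in the package-based setup above):

```shell
# On each VM:
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties   # only if not using the system zookeeper
bin/kafka-server-start.sh config/server.properties
```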
Setting up a sonsumer in Python:
from kafka import KafkaConsumer
consumer = KafkaConsumer(
    topic,
    bootstrap_servers=['192.168.56.101:9092', '192.168.56.102:9092'])
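To actually read messages, the consumer can be iterated directly (kafka-python's KafkaConsumer is an iterator of ConsumerRecord objects). A minimal sketch, where the group id is a hypothetical name and the connecting part is wrapped in a function because it needs reachable brokers:

```python
from typing import Callable

def decode_value(raw: bytes) -> str:
    """Decode a raw Kafka message payload (kafka-python returns values as bytes)."""
    return raw.decode("utf-8")

def consume_forever(brokers, topic, handle: Callable[[str], None]) -> None:
    """Blocking loop over a topic; requires reachable brokers, so it is only defined here."""
    from kafka import KafkaConsumer
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=brokers,
        auto_offset_reset="earliest",  # read from the start if no committed offset exists
        group_id="demo-group",         # hypothetical consumer-group name
    )
    for record in consumer:            # yields ConsumerRecord objects as they arrive
        handle(decode_value(record.value))
```

Usage would be `consume_forever(['192.168.56.101:9092', '192.168.56.102:9092'], topic, print)`.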
Setting up a producer in Python:
from kafka import KafkaProducer
producer = KafkaProducer(bootstrap_servers='192.168.56.101:9092,192.168.56.102:9092')
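For completeness, a hedged sketch of sending a message through this producer: `send()` is asynchronous and returns a future, so `.get()` (or `flush()`) is needed to make sure the record actually reached a broker before the script exits. The topic name and JSON serializer below are my own illustrative choices, not from the original example:

```python
import json

def to_json_bytes(obj) -> bytes:
    """Serialize a message value to UTF-8 JSON bytes."""
    return json.dumps(obj).encode("utf-8")

def send_one(brokers, topic, value) -> None:
    """Send a single JSON message; requires reachable brokers, so it is only defined here."""
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers=brokers,
                             value_serializer=to_json_bytes)
    # send() is asynchronous; .get() blocks until the broker acknowledges the record
    metadata = producer.send(topic, value).get(timeout=10)
    print(metadata.topic, metadata.partition, metadata.offset)
    producer.flush()
```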
What I want to do:
Configure my Kafka in a way that allows me to run 2 or more brokers on different VMs as a cluster.
My setup:
It took me some time to find the solution, since most tutorials stop short of the clustering part or showcase it on a single machine instead of several:
All that needs to be done is adding the following line to config/server.properties on each broker, using that broker's own IP address:
listeners=PLAINTEXT://192.168.56.101:9092 # for broker.id=0
listeners=PLAINTEXT://192.168.56.102:9092 # for broker.id=1
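After restarting both brokers, the cluster can be verified by creating a topic replicated across them and describing it (a sketch; `test-topic` is a hypothetical name, and older Kafka releases take `--zookeeper` here while newer ones use `--bootstrap-server`):

```shell
bin/kafka-topics.sh --create --zookeeper 192.168.56.101:2181 \
    --replication-factor 2 --partitions 2 --topic test-topic

bin/kafka-topics.sh --describe --zookeeper 192.168.56.101:2181 --topic test-topic
# Each partition should list both broker ids (0 and 1) under Replicas and Isr.
```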