I am trying to learn Kafka by taking the classic Twitter streaming example. I am trying to use my producer to stream twitter data based on 2 filters to different partition of same topic. For example, twitter data with tracks='Google' to one partition and track='Apple' to another.
class Producer(StreamListener):
def __init__(self, producer):
self.producer = producer
def on_data(self, data):
self.producer.send(topic_name, value=data)
return True
def on_error(self, error):
print(error)
twitter_stream = Stream(auth, Producer(producer))
twitter_stream.filter(track=["Google"])
How do i add another track and stream that data to another partition.
Likewise, how do i make my consumer consume from a specific partition.
consumer = KafkaConsumer(
topic_name,
bootstrap_servers=['localhost:9092'],
auto_offset_reset='latest',
enable_auto_commit=True,
auto_commit_interval_ms = 5000,
max_poll_records = 100,
value_deserializer=lambda x: json.loads(x.decode('utf-8')))
After some research, I was able to resolve this issue:
In the producer side, specify the partition:
self.producer.send(topic_name, value=data,partition=0)
In the consumer side,
consumer = KafkaConsumer(
bootstrap_servers=['localhost:9092'],
auto_offset_reset='latest',
enable_auto_commit=True,
auto_commit_interval_ms = 5000,
max_poll_records = 100,
value_deserializer=lambda x: json.loads(x.decode('utf-8')))
consumer.assign([TopicPartition('trial', 0)])