apache-kafka, apache-storm, kafka-consumer-api, kafka-producer-api, heron

Kafka Integration in Apache Heron


I am trying to integrate Kafka with a Heron topology. However, I am not able to find any examples that work with the latest version of Heron (0.17.5). Is there an example that can be shared, or any suggestions on how to implement a custom Kafka Spout and Kafka Bolt?

Edit 1:

I believe KafkaSpout and KafkaBolt were intentionally deprecated in Heron to give way for the new Streamlet API. I am currently trying to see if I can build a KafkaSource and KafkaSink using the Streamlet API. However, I am getting the exception below when I try to create a KafkaConsumer within the Source.

Caused by: java.io.NotSerializableException: org.apache.kafka.clients.consumer.KafkaConsumer
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at com.twitter.heron.api.utils.Utils.serialize(Utils.java:97)

Edit 2:

Fixed the above issue. I was initializing the KafkaConsumer in the constructor, which was wrong: Heron serializes the Source instance when the topology is submitted (note the com.twitter.heron.api.utils.Utils.serialize frame above), and KafkaConsumer is not serializable. Initializing it in the setup() method instead fixed it.
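
In case it helps, here is a minimal sketch of the pattern; the class name KafkaSourceSketch, the transient modifier, and the placeholder connection settings are only illustrative and not part of my actual code:

    import java.util.Arrays;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.Consumer;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import com.twitter.heron.streamlet.Context;
    import com.twitter.heron.streamlet.Source;

    public class KafkaSourceSketch implements Source<String> {

        // Not created here: a field initializer or constructor call like
        // `new KafkaConsumer<>(...)` would be serialized together with this
        // Source at topology submission and fail with NotSerializableException.
        private transient Consumer<String, String> kafkaConsumer;

        @Override
        public void setup(Context context) {
            // setup() runs inside the container after deserialization, so the
            // non-serializable client can safely be created here.
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("group.id", "heron-consumer");          // placeholder
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            kafkaConsumer = new KafkaConsumer<>(props);
            kafkaConsumer.subscribe(Arrays.asList("my-topic")); // placeholder topic
        }

        @Override
        public Collection<String> get() {
            return Collections.emptyList(); // polling logic omitted in this sketch
        }

        @Override
        public void cleanup() {
            kafkaConsumer.close();
        }
    }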


Solution

  • I managed to get this done using the Streamlet API for Heron. I'm posting it here in the hope that it helps others facing the same problem.

    Kafka Source

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collection;
    import java.util.List;
    import java.util.Properties;
    import java.util.logging.Logger;

    import org.apache.kafka.clients.consumer.Consumer;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import com.twitter.heron.streamlet.Context;
    import com.twitter.heron.streamlet.Source;

    public class KafkaSource implements Source<String> {

        private String streamName;

        // Created in setup(), not in the constructor, so the Source instance
        // itself stays serializable when the topology is submitted.
        private Consumer<String, String> kafkaConsumer;
        private List<String> kafkaTopic;

        private static final Logger LOGGER = Logger.getLogger("KafkaSource");

        @Override
        public void setup(Context context) {

            this.streamName = context.getStreamName();

            kafkaTopic = Arrays.asList(KafkaProperties.KAFKA_TOPIC);

            Properties props = new Properties();
            props.put("bootstrap.servers", KafkaProperties.BOOTSTRAP_SERVERS);
            props.put("group.id", KafkaProperties.CONSUMER_GROUP_ID);
            props.put("enable.auto.commit", KafkaProperties.ENABLE_AUTO_COMMIT);
            props.put("auto.commit.interval.ms", KafkaProperties.AUTO_COMMIT_INTERVAL_MS);
            props.put("session.timeout.ms", KafkaProperties.SESSION_TIMEOUT);
            props.put("key.deserializer", KafkaProperties.KEY_DESERIALIZER);
            props.put("value.deserializer", KafkaProperties.VALUE_DESERIALIZER);
            props.put("auto.offset.reset", KafkaProperties.AUTO_OFFSET_RESET);
            props.put("max.poll.records", KafkaProperties.MAX_POLL_RECORDS);
            props.put("max.poll.interval.ms", KafkaProperties.MAX_POLL_INTERVAL_MS);

            this.kafkaConsumer = new KafkaConsumer<>(props);

            kafkaConsumer.subscribe(kafkaTopic);
        }

        @Override
        public Collection<String> get() {

            List<String> kafkaRecords = new ArrayList<>();

            // Block until at least one record is available, then emit the
            // values of the whole batch into the streamlet.
            ConsumerRecords<String, String> records = kafkaConsumer.poll(Long.MAX_VALUE);

            for (ConsumerRecord<String, String> record : records) {
                kafkaRecords.add(record.value());
            }

            return kafkaRecords;
        }

        @Override
        public void cleanup() {
            // Interrupt a blocking poll() when the topology is shut down.
            kafkaConsumer.wakeup();
        }
    }
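
    Kafka Sink (sketch)

    The write side can be built the same way. This is only a minimal sketch, assuming the Streamlet Sink interface (setup/put/cleanup); the literal topic name and serializer settings below are placeholders, not part of the code above.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import com.twitter.heron.streamlet.Context;
    import com.twitter.heron.streamlet.Sink;

    public class KafkaSink implements Sink<String> {

        private Producer<String, String> kafkaProducer;

        @Override
        public void setup(Context context) {
            // As with the source, create the non-serializable client in setup().
            Properties props = new Properties();
            props.put("bootstrap.servers", KafkaProperties.BOOTSTRAP_SERVERS);
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

            this.kafkaProducer = new KafkaProducer<>(props);
        }

        @Override
        public void put(String tuple) {
            // "output-topic" is a placeholder for the target Kafka topic.
            kafkaProducer.send(new ProducerRecord<>("output-topic", tuple));
        }

        @Override
        public void cleanup() {
            kafkaProducer.close();
        }
    }

    Wiring the two together would then look roughly like this; the Builder, Config and Runner entry points are taken from the Streamlet API, and the exact factory-method names (newBuilder(), defaultConfig()) may differ slightly between Heron releases:

    import com.twitter.heron.streamlet.Builder;
    import com.twitter.heron.streamlet.Config;
    import com.twitter.heron.streamlet.Runner;

    public final class KafkaStreamletTopology {

        public static void main(String[] args) {
            Builder builder = Builder.newBuilder();

            // Read from Kafka, do any intermediate processing, write back to Kafka.
            builder.newSource(new KafkaSource())
                   .map(String::toUpperCase)
                   .toSink(new KafkaSink());

            Config config = Config.defaultConfig();
            new Runner().run("kafka-streamlet-topology", config, builder);
        }
    }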