google-cloud-dataflow apache-beam google-cloud-pubsub azure-eventhub streaming-analytics

Is it possible to consume from an Azure Event Hubs topic using Apache Beam / Google Cloud Dataflow?


Problem

We'd like to consume from an Event Hubs topic in Azure using a Dataflow pipeline in Google Cloud.

Question

Does KafkaIO support consuming from Event Hubs directly in an Apache Beam / Google Cloud Dataflow job? (see this post)

Alternative approaches for getting the Event Hubs data into Pub/Sub are also appreciated (e.g. publishing from Azure Stream Analytics to Pub/Sub).

Thank you!


Solution

  • Azure Event Hubs exposes a Kafka-compatible endpoint that supports Apache Kafka protocol 1.0 and later, and KafkaIO supports Kafka versions 0.10.1 and newer, so you should be able to consume events from Event Hubs using KafkaIO. Google Cloud also has a guide for processing messages from Kafka in Dataflow.
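
As a rough illustration of the answer above, here is a minimal sketch in Python of how the Kafka consumer settings for an Event Hubs namespace might be assembled and passed to Beam's Kafka source. It assumes the usual Event Hubs Kafka endpoint (`<namespace>.servicebus.windows.net:9093`, SASL_SSL with the PLAIN mechanism and the literal username `$ConnectionString`), and it treats the event hub name as the Kafka topic. The function names, the `group.id` value, and the arguments are hypothetical placeholders, and the sketch has not been run against a live namespace. Note that `ReadFromKafka` in the Python SDK is a cross-language transform, so a Java runtime is needed for its expansion service.

```python
def event_hubs_consumer_config(namespace: str, connection_string: str) -> dict:
    """Build Kafka consumer properties for an Event Hubs namespace.

    `namespace` and `connection_string` are placeholders for your own values.
    """
    # Event Hubs authenticates Kafka clients with SASL PLAIN: the username is
    # the literal string "$ConnectionString" and the password is the
    # namespace (or event hub) connection string.
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        'username="$ConnectionString" '
        f'password="{connection_string}";'
    )
    return {
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": jaas,
        # Hypothetical consumer group name; choose your own.
        "group.id": "dataflow-consumer",
    }


def run(namespace, event_hub, connection_string, pipeline_args=None):
    """Read raw events from one event hub and print their payloads."""
    # Imported here so the config helper above works without Beam installed.
    import apache_beam as beam
    from apache_beam.io.kafka import ReadFromKafka
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(pipeline_args, streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromEventHubs" >> ReadFromKafka(
                consumer_config=event_hubs_consumer_config(
                    namespace, connection_string
                ),
                # The event hub name plays the role of the Kafka topic.
                topics=[event_hub],
            )
            # By default each element is a (key, value) pair of bytes.
            | "ValuesOnly" >> beam.MapTuple(lambda key, value: value)
            | "Print" >> beam.Map(print)
        )
```

To run on Dataflow, you would pass the usual runner options (for example `--runner=DataflowRunner`, `--project`, `--region`, `--temp_location`) through `pipeline_args`. In practice the connection string would come from a secret store rather than being passed around in plain text, and the final `print` step would be replaced with the real sink, such as a Pub/Sub write.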