apache-kafka cloud apache-nifi cloudera-cdp

Read/Write with NiFi to Kafka in Cloudera Data Platform (CDP) Public Cloud


NiFi and Kafka are now both available in Cloudera Data Platform (CDP) Public Cloud. NiFi is great at talking to everything and Kafka is a mainstream message bus, so I wondered:

What are the minimal steps needed to produce/consume data to Kafka from Apache NiFi within CDP Public Cloud?

Ideally, I am looking for steps that work in any cloud, for instance Amazon AWS and Microsoft Azure.

I am satisfied with answers that follow best practices and work with the default configuration of the platform, but if there are common alternatives, these are welcome as well.


Solution

  • There will be multiple form factors available in the future. For now I will assume you have an environment that contains one Data Hub with NiFi and one Data Hub with Kafka. (The answer still works if both are on the same Data Hub.)

    Prerequisites

    These steps allow you to Produce data from NiFi to Kafka in CDP Public Cloud

    Unless mentioned otherwise, I have kept everything to its default settings.

    In Kafka Data Hub Cluster:

    1. Gather the FQDNs of the brokers and the ports they use.
    2. Combine them in this format: FQDN:port,FQDN:port,FQDN:port. The result should look something like this:

    broker1.abc:9093,broker2.abc:9093,broker3.abc:9093
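
A minimal sketch of how that string could be assembled programmatically, for example when the broker list is long. The helper name `build_bootstrap_servers` is hypothetical, and the default port 9093 is simply taken from the example above; use whatever port your brokers actually expose.

```python
def build_bootstrap_servers(broker_fqdns, port=9093):
    """Join broker host names into the FQDN:port,FQDN:port form.

    The port default (9093) mirrors the example above and is an assumption.
    """
    # Drop surrounding whitespace and skip empty entries.
    cleaned = [host.strip() for host in broker_fqdns if host.strip()]
    if not cleaned:
        raise ValueError("at least one broker FQDN is required")
    return ",".join(f"{host}:{port}" for host in cleaned)


brokers = build_bootstrap_servers(["broker1.abc", "broker2.abc", "broker3.abc"])
print(brokers)  # broker1.abc:9093,broker2.abc:9093,broker3.abc:9093
```

The resulting value is what goes into the broker list property of the NiFi Kafka processors.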

    In NiFi GUI:

    1. Make sure you have some data in NiFi to produce, for example by using the GenerateFlowFile processor.
    2. Select the relevant processor for writing to Kafka, for example PublishKafka_2_0, and configure it as follows:
    3. Connect your GenerateFlowFile processor to your PublishKafka_2_0 processor and start the flow.

    These are the minimal steps; a more extensive explanation can be found in the Cloudera documentation. Note that it is best practice to create topics explicitly (this example relies on Kafka automatically creating a topic when a message is first produced to it).
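
As a rough illustration of creating a topic explicitly rather than relying on auto-creation, here is a sketch using the third-party kafka-python library from outside NiFi. Everything specific in it is an assumption rather than something stated in this answer: that the brokers accept SASL_SSL with the PLAIN mechanism, that you have a username and password for the cluster, the partition and replication counts, and the helper names `admin_client_config` and `create_topic`. Depending on your environment you may also need to point the client at a CA certificate.

```python
def admin_client_config(bootstrap_servers, username, password):
    """Build keyword arguments for a kafka-python client.

    Assumption: the cluster uses SASL_SSL with the PLAIN mechanism.
    """
    return {
        "bootstrap_servers": bootstrap_servers.split(","),
        "security_protocol": "SASL_SSL",
        "sasl_mechanism": "PLAIN",
        "sasl_plain_username": username,
        "sasl_plain_password": password,
    }


def create_topic(config, topic, partitions=3, replication_factor=3):
    """Create a topic explicitly; partition and replica counts are assumptions."""
    # Imported lazily so the config helper works without kafka-python installed.
    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(**config)
    try:
        admin.create_topics([
            NewTopic(name=topic,
                     num_partitions=partitions,
                     replication_factor=replication_factor)
        ])
    finally:
        admin.close()


config = admin_client_config(
    "broker1.abc:9093,broker2.abc:9093,broker3.abc:9093",
    "my-user",      # hypothetical credentials
    "my-password",
)
# create_topic(config, "my-topic")  # uncomment to run against a real cluster
```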

    These steps allow you to Consume data with NiFi from Kafka in CDP Public Cloud

    A good way to check that data was written to Kafka is to consume it again.

    In NiFi GUI:

    1. Create a Kafka consumption processor, for instance ConsumeKafka_2_0, and configure its Properties as follows:
    2. Create another processor, or a funnel, to send the messages to, and start the consumption processor.

    And that is it. Within 30 seconds you should see the data that you published to Kafka flowing into NiFi again.


    Full disclosure: I am an employee of Cloudera, the driving force behind NiFi.