apache-nifi hortonworks-dataflow

Apache NiFi - ConsumeKafka + MergeContent + PutHDFS to avoid small files


I have around 2,000,000 messages in a Kafka topic and I want to write these records to HDFS using NiFi. I am using ConsumeKafka_0_10 followed by PutHDFS, but this generates small files in HDFS, so I added a MergeContent processor to merge the records before pushing each file.

Please help if the configuration needs changes. This works fine for a small number of messages, but for topics with massive amounts of data it writes a single file for every record.

Thank you!!


Solution

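  • The Minimum Number of Entries is set to 1, which means a merged file can contain anywhere from 1 record up to the Maximum Number of Entries. With a minimum of 1, MergeContent may emit a bundle as soon as a single FlowFile is available, so you can still end up with one file per record. Try raising the minimum to something higher, like 100,000.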
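To see why the minimum matters, here is a minimal sketch of the effect in Python. It is a toy model, not NiFi's actual implementation: the function name `simulate_merge` and the assumption that the processor checks its bin once per scheduling run are illustrative only. It assumes `1 <= min_entries <= max_entries`.

```python
def simulate_merge(arrivals_per_run, min_entries, max_entries):
    """Toy model of a single merge bin (hypothetical, not NiFi code).

    arrivals_per_run: number of records that arrive before each run.
    Returns (bundle_sizes, leftover), where bundle_sizes lists how many
    records went into each merged file and leftover is what is still
    waiting in the bin at the end.
    """
    bundles = []
    queued = 0
    for arrived in arrivals_per_run:
        queued += arrived
        # A bin that reaches the maximum is always emitted.
        while queued >= max_entries:
            bundles.append(max_entries)
            queued -= max_entries
        # Otherwise the bin is emitted only once it holds the minimum.
        if queued >= min_entries:
            bundles.append(queued)
            queued = 0
    return bundles, queued


# One record arrives before each of 10 runs.
trickle = [1] * 10

# Minimum of 1: every run finds an eligible bin, so each file holds 1 record.
print(simulate_merge(trickle, min_entries=1, max_entries=1000))

# Minimum of 5: the bin waits until 5 records have accumulated.
print(simulate_merge(trickle, min_entries=5, max_entries=1000))
```

In this model a higher minimum trades latency for larger files: records that have not yet reached the minimum stay in the bin. In a real flow, MergeContent's Max Bin Age property is worth checking as well, since it controls how long a partially filled bin may wait before it is merged anyway.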