I have an issue with small files and HDFS.
Scenario: I am using NiFi to read messages from a Kafka topic; these messages are all really small.
Requirement: store these raw messages in HDFS (for replay capability) before doing any further processing on them.
I was thinking of running Hadoop Archive (HAR) on them periodically. Is that something I can do through NiFi? The har command seems like a command-line tool rather than something I could execute through NiFi. I would love to know a solution that achieves this requirement without bringing down HDFS due to the small-files problem.
Ginil
You can execute command-line tools inside NiFi with the ExecuteProcess processor.
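As a minimal sketch, this is the kind of command ExecuteProcess could run on a timer-driven schedule (it assumes the NiFi host has a Hadoop client on its PATH; /data/raw/2017-01-01 and /data/archived are placeholder paths to replace with your own):

    # Pack everything under the parent directory into a single HAR file
    hadoop archive -archiveName raw-2017-01-01.har -p /data/raw/2017-01-01 /data/archived

    # The archived files stay readable for replay through the har:// scheme
    hdfs dfs -ls har:///data/archived/raw-2017-01-01.har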
You can also take a look at Kafka Connect HDFS for putting Kafka records into HDFS.
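As a rough sketch of that route (assuming Confluent's kafka-connect-hdfs sink connector and a Connect worker listening on localhost:8083; the connector name, topic, HDFS URL, and flush.size below are placeholders to adjust):

    # Register an HDFS sink connector via the Kafka Connect REST API.
    # flush.size batches many small Kafka records into each HDFS file,
    # which avoids the small-files problem in the first place.
    curl -X POST -H "Content-Type: application/json" --data '{
      "name": "hdfs-raw-sink",
      "config": {
        "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
        "tasks.max": "1",
        "topics": "raw-messages",
        "hdfs.url": "hdfs://namenode:8020",
        "flush.size": "10000"
      }
    }' http://localhost:8083/connectors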