hadoop cloudera flume tweetstream flume-twitter

Which flume.conf parameters save tweets to a single FlumeData file per hour?


We are saving tweets in a directory layout like /user/flume/2016/06/28/13/FlumeData..., but each hour more than 100 FlumeData files are created. I changed TwitterAgent.sinks.HDFS.hdfs.rollSize to 52428800 (50 MB), but the same thing happened. After that I tried changing the rollCount parameter too, but that didn't work either. How can I set the parameters to get one FlumeData file per hour?


Solution

  • I resolved this problem by setting rollInterval=3600, rollCount=0 and batchSize=100 in flume.conf, as @vkgade suggested.
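
For reference, the three settings from the answer might look like this in flume.conf. This is a minimal sketch, not the poster's actual file: it assumes the agent is named TwitterAgent and the sink HDFS, as in the property quoted in the question. The commented-out rollSize line is an assumption that is not part of the accepted answer. In the Flume HDFS sink a value of 0 disables a roll trigger, so with rollSize left at 52428800 a file would presumably still roll early if more than 50 MB arrives within the hour.

```properties
# Roll to a new file every 3600 seconds (one hour).
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600

# 0 disables rolling based on the number of events written.
TwitterAgent.sinks.HDFS.hdfs.rollCount = 0

# Number of events written to the file before it is flushed to HDFS.
TwitterAgent.sinks.HDFS.hdfs.batchSize = 100

# Assumption, not from the answer: 0 would disable size-based rolling as well.
# TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
```

The agent needs to be restarted or to reload its configuration before the new roll settings apply to newly opened files.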