I have CSV data named test_csv.csv on Windows and I am ingesting it into HDFS with this flow: Beats > NiFi (ListenBeats > PutHDFS) > HDFS
a,b,c,d,e
a1,b1,c1,d1,e1
a2,b2,c2,d2,e2
a3,b3,c3,d3,e3
a4,b4,c4,d4,e4
a5,b5,c5,d5,e5
a6,b6,c6,d6,e6
a7,b7,c7,d7,e7
a8,b8,c8,d8,e8
According to the NiFi flow UI it works fine and the data is successfully written into HDFS. The problem is:
hadoop@ambari:~$ hdfs dfs -ls /user/nifi/test
Found 9 items
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:30 /user/nifi/test/0192a8bb-67ec-462e-a602-62a5425afc99
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:30 /user/nifi/test/0211ec05-fc62-4b82-87e5-a2e20a9fb07e
-rw-r--r-- 3 nifi hdfs 481 2020-07-06 14:30 /user/nifi/test/1e227df9-f49f-46d6-a309-25e466fa14cf
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:30 /user/nifi/test/324a0c0e-e190-4239-b594-edbf9fcab0d6
-rw-r--r-- 3 nifi hdfs 474 2020-07-06 14:30 /user/nifi/test/3d34827b-6bae-4c21-981e-9722b7a6703e
-rw-r--r-- 3 nifi hdfs 481 2020-07-06 14:30 /user/nifi/test/6873c51b-a93b-4872-b33c-0e59b85afcd5
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:30 /user/nifi/test/98606d6b-2206-4b2e-8204-8363a87f41d0
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:30 /user/nifi/test/f25e56b5-88d7-4135-b475-213e4e54b47f
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:30 /user/nifi/test/f354f587-8da2-418f-be0d-34e8a79d7d39
I've tried changing the PutHDFS directory to /user/nifi/test.csv, but it returns:
hadoop@ambari:~$ hdfs dfs -cat /user/nifi/test.csv
cat: `/user/nifi/test.csv': Is a directory
hadoop@ambari:~$ hdfs dfs -ls /user/nifi/test.csv
Found 9 items
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:35 /user/nifi/test.csv/02cdc89d-3cb9-494a-b7f5-d280d7b7c65e
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:35 /user/nifi/test.csv/2476906a-00d9-463a-89ef-ea885f823faa
-rw-r--r-- 3 nifi hdfs 474 2020-07-06 14:35 /user/nifi/test.csv/5b9a9d7e-0c2f-428c-8af4-e875c6db1a04
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:35 /user/nifi/test.csv/66017da5-b55f-437b-a3cf-0a6b45d86ce8
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:35 /user/nifi/test.csv/7be93660-75a1-416b-b019-656d466813d6
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:35 /user/nifi/test.csv/98877296-126c-4ac9-9da5-cef62937e9f9
-rw-r--r-- 3 nifi hdfs 481 2020-07-06 14:35 /user/nifi/test.csv/ac075d33-1137-4aea-9e5b-fc11097558eb
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:35 /user/nifi/test.csv/b9b44c08-1bc6-4e33-947b-daf265491181
-rw-r--r-- 3 nifi hdfs 481 2020-07-06 14:35 /user/nifi/test.csv/ba6464db-ef64-4993-a070-80f1392eac1e
Is it possible to make NiFi write a single file to HDFS? I was expecting it to create a test.csv file in HDFS.
Thank you
Every flow file in NiFi has an attribute named "filename", and that is what PutHDFS uses as the filename in HDFS. The "Directory" property in PutHDFS is only the target directory, so you want to set it to just "/user/nifi".
In order to change the filename, you would put an UpdateAttribute processor right before PutHDFS, and set filename = whatever-you-want.csv
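For example, the dynamic property you add in UpdateAttribute could look like this (the value "test.csv" is only illustrative):

    filename = test.csv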
If you set it to a static value, then every write after the first will hit an existing file and conflict, and PutHDFS will either replace it or throw an error depending on its Conflict Resolution Strategy. So you probably want to use a MergeContent/MergeRecord processor first to batch many small CSV entries into a larger flow file (see the sketch below), and then create a dynamic filename like:
filename = test-${now()}.csv
You can use a different expression; it just needs to be something unique, like a timestamp, date string, or UUID.
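For instance, a formatted timestamp or a UUID both work; these are just example expressions using standard NiFi Expression Language functions:

    filename = test-${now():format('yyyyMMddHHmmss')}.csv
    filename = test-${UUID()}.csv

If you go the MergeContent route, a rough sketch of settings that batch flow files before writing might look like this (the property names are MergeContent's, the values are placeholders to tune for your data volume):

    Merge Strategy            = Bin-Packing Algorithm
    Merge Format              = Binary Concatenation
    Minimum Number of Entries = 1000
    Max Bin Age               = 5 min
    Delimiter Strategy        = Text
    Demarcator                = (a newline, so merged CSV rows stay on separate lines)

Afterwards you can verify the result from the command line, e.g.:

    hdfs dfs -ls /user/nifi
    hdfs dfs -cat '/user/nifi/test-*.csv'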