I would like to import other Python or .csv files into my Python UDF to perform some operations, for example comparing the table data (which flows in as a stream, row by row) against the rows of an external .csv file.
When I try to read the .csv file, I get this error:
IOError: File /home/abc/xyz/myfile.csv does not exist
The same code works perfectly well when it is run as a regular Python script (not as a UDF).
If I understood you correctly, you can try ADD FILE [your complete file path] or ADD FILES [your directory path].
The UDF runs on the cluster nodes, where your local path does not exist. Before code on the cluster can refer to a file, you must add it to the distributed cache so that the tasks there can access it. The Hive CLI manual covers this: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
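
To make this concrete, here is a minimal sketch of what the UDF side could look like once the file has been added. It is not your actual script: it assumes Python 3, a streaming script called through TRANSFORM (the name my_udf.py is hypothetical), tab-separated input rows on stdin, and a comparison on the whole row. The helper names load_reference_rows, mark_matches and main are made up for illustration. The part that matters is that the .csv is opened by its base name, since files shipped with ADD FILE normally show up in the task's working directory rather than under the original absolute path.

```python
import csv
import sys

# Assumed Hive side (my_udf.py and my_table are hypothetical names):
#   ADD FILE /home/abc/xyz/myfile.csv;
#   ADD FILE my_udf.py;
#   SELECT TRANSFORM(col1, col2) USING 'python my_udf.py' AS (col1, col2, matched)
#   FROM my_table;


def load_reference_rows(path):
    """Read the CSV once and return its rows as a set of tuples."""
    with open(path, newline="") as handle:
        return {tuple(row) for row in csv.reader(handle)}


def mark_matches(lines, reference_rows):
    """Yield each tab-separated input row with a trailing 1/0 match flag."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        flag = "1" if tuple(fields) in reference_rows else "0"
        yield "\t".join(fields + [flag])


def main():
    # Base name only: the absolute local path does not exist on the worker nodes.
    reference_rows = load_reference_rows("myfile.csv")
    for output_line in mark_matches(sys.stdin, reference_rows):
        print(output_line)

# In the deployed script, finish with a call to main().
```

Loading the .csv into a set once, before the loop, keeps the per-row comparison cheap. If you only need to match on one column, build the set from that column instead of the whole row.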