python, hive, udf

Python UDF - import/read external files


I would like to import other Python/CSV files into my Python UDF to perform some operations, for example comparing the table data (which flows in as a stream, row by row) against the rows of an external .csv file.
When I try to read the .csv file, I get this error:

IOError: File /home/abc/xyz/myfile.csv does not exist

The same code works perfectly well when it runs as a regular Python script (not as a UDF).


Solution

  • If I understand correctly, you can try ADD FILE [your complete file path] or ADD FILES [your directory path].

    Before you refer to anything on the cluster, you must add it to the distributed cache so that the code running there can access it. See the Hive CLI documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
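A minimal sketch of the UDF side, assuming it is a streaming TRANSFORM script, that the comparison key is the first column in both the table and the CSV, and that Hive's default tab-separated row format is used. The script name compare_udf.py and the helpers load_lookup, compare_rows and run are hypothetical. The main point is that the script opens the CSV by its bare file name: a file registered with ADD FILE is copied into the task's working directory, so the original local path (/home/abc/xyz/...) does not exist on the worker node.

```python
import csv
import sys


def load_lookup(csv_path, key_column=0):
    """Read the shipped CSV once and return the set of keys it contains."""
    # csv_path should be a bare name such as "myfile.csv", not the local
    # absolute path, because the file is shipped into the task's working dir.
    with open(csv_path, newline="") as f:
        return {row[key_column] for row in csv.reader(f) if row}  # skip blank lines


def compare_rows(lines, lookup, key_column=0):
    """Yield each streamed row plus a 1/0 flag: is its key in the CSV?"""
    for line in lines:
        fields = line.rstrip("\n").split("\t")  # assumed tab-separated columns
        found = fields[key_column] in lookup
        yield "\t".join(fields + ["1" if found else "0"])


def run(csv_path, stdin, stdout):
    """Load the CSV once, then stream table rows from stdin to stdout."""
    lookup = load_lookup(csv_path)
    for out in compare_rows(stdin, lookup):
        stdout.write(out + "\n")


# Runs only when given the CSV name as an argument (as in the Hive query
# below); without an argument the module just defines the functions.
if __name__ == "__main__" and len(sys.argv) > 1:
    run(sys.argv[1], sys.stdin, sys.stdout)
```

On the Hive side it would be wired up roughly like this (table and column names are placeholders): ADD FILE /home/abc/xyz/myfile.csv; ADD FILE /path/to/compare_udf.py; SELECT TRANSFORM(col1, col2) USING 'python compare_udf.py myfile.csv' AS (col1, col2, in_csv) FROM my_table; Loading the CSV into a set once per task keeps each per-row comparison cheap instead of re-reading the file for every row.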