I'm using an HQL query that contains something similar to:
INSERT OVERWRITE TABLE ex_tb.ex_orc_tb
select *, SUBSTR(INPUT__FILE__NAME,60,4), CONCAT_WS('-', SUBSTR(INPUT__FILE__NAME,71,4), SUBSTR(INPUT__FILE__NAME,75,2), SUBSTR(INPUT__FILE__NAME,77,2))
from ex_db.ex_ext_tb
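The two extra columns are cut out of the file path by character position, and Hive's SUBSTR counts from 1. The following pure-Python sketch mimics SUBSTR (positive positions only) and CONCAT_WS (non-NULL arguments only) to show what the query computes. The helper names are hypothetical. The path is also made up: it is padded so that a 4-character tag sits at position 60 and a YYYYMMDD date sits at position 71. That date layout is inferred from the CONCAT_WS('-', ...) pattern, not stated above.

```python
from typing import Tuple


def hive_substr(s: str, pos: int, length: int) -> str:
    """Mimic Hive SUBSTR(s, pos, len) for a positive, 1-based pos."""
    start = pos - 1  # Hive positions are 1-based, Python indices are 0-based
    return s[start:start + length]


def hive_concat_ws(sep: str, *parts: str) -> str:
    """Mimic Hive CONCAT_WS(sep, ...) when no argument is NULL."""
    return sep.join(parts)


def extract_fields(path: str) -> Tuple[str, str]:
    """Compute the two columns the query derives from the input file path."""
    tag = hive_substr(path, 60, 4)
    date = hive_concat_ws("-",
                          hive_substr(path, 71, 4),   # year
                          hive_substr(path, 75, 2),   # month
                          hive_substr(path, 77, 2))   # day
    return tag, date


# Hypothetical path: tag "ABCD" at position 60, date 20170315 at position 71
sample = "x" * 59 + "ABCD" + "y" * 7 + "20170315" + ".orc"
print(extract_fields(sample))
```

Fixed offsets like these only work as long as every path has exactly the same length and layout up to the date.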
When I run that statement in Hive, it works fine.
When I run it through a PySpark HiveContext instead, I get this error:
pyspark.sql.utils.AnalysisException: u"cannot resolve 'INPUT__FILE__NAME' given input columns: [list_name, name, day, link_params, id, template]; line 2 pos 17"
Any ideas why this might be?
INPUT__FILE__NAME is a Hive-specific virtual column and is not supported in Spark.
Spark provides the input_file_name function, which should work in a similar way:
SELECT input_file_name() FROM df
but it requires Spark 2.0 or later to work correctly with PySpark.