WebHDFS sensor in Airflow


I want to use sensors to check for the arrival of files in HDFS. I tried the HDFS sensor, but I could not install snakebite because it requires Python 2 and I'm running Python 3. As an alternative, I am using the WebHDFS sensor. When I try to run it, I get the error below:

ERROR - b'<html>\n<head>\n<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>\n<title>Error 401 Authentication required</title>\n</head>\n<body><h2>HTTP ERROR 401</h2>\n<p>Problem accessing /webhdfs/v1/. Reason:\n<pre>    Authentication required</pre></p><hr /><i><small>Powered by Jetty://</small></i><br/>                                                

[2021-07-15 12:01:30,948] {taskinstance.py:1128} ERROR - Read operations failed on the namenodes below:xxxxx

Could you please let me know how to use this sensor? I can't find much information on how to use it. Alternatively, please suggest another way to check for file arrival in HDFS. Thank you in advance for your reply.

Please find the code below:

source_data_sensor = WebHdfsSensor(
    task_id='source_data_sensor',
    filepath='filepath',
    timeout=120,
    webhdfs_conn_id='webhdfs_default',
    poke_interval=10,
    dag=dag,
    env={'JAVA_HOME': '/usr/bin/java'},
)

Solution

  • According to the official documentation, the apache-airflow-providers-apache-hdfs package requires apache-airflow >= 2.1.0 and snakebite-py3. Try installing snakebite-py3 instead of snakebite, or just run pip install apache-airflow-providers-apache-hdfs on top of Airflow 2.1+.
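
As a way to narrow down the 401 outside of Airflow, here is a minimal sketch that queries the WebHDFS REST endpoint directly, using only the Python standard library. It assumes the cluster uses simple (non-Kerberos) authentication, where the user is passed in the user.name query parameter. The function names, host, port, path, and user are all hypothetical placeholders, not values from the question. If the cluster requires Kerberos, this call will likely still return 401.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import quote, urlencode


def webhdfs_status_url(host, port, path, user):
    """Build the WebHDFS GETFILESTATUS URL for an absolute HDFS path.

    Assumes simple authentication, where the caller is identified
    by the user.name query parameter.
    """
    query = urlencode({"op": "GETFILESTATUS", "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def hdfs_path_exists(host, port, path, user, timeout=10):
    """Return True if the path exists, False if the namenode answers 404."""
    url = webhdfs_status_url(host, port, path, user)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # A successful response is a JSON object with a FileStatus key.
            return "FileStatus" in json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        # Anything else (for example 401) is re-raised so the caller sees it.
        raise


if __name__ == "__main__":
    # Hypothetical values; replace with your namenode host, port, path and user.
    print(webhdfs_status_url("namenode", 50070, "/data/in/file.csv", "airflow"))
```

If hdfs_path_exists succeeds with an explicit user but the sensor still fails, it may be worth checking whether the login field of the webhdfs_default connection is set, since that is where the sensor would normally take the user from.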