I want to use sensors to check for the arrival of files in HDFS. I tried HdfsSensor, but I could not install snakebite because it requires Python 2 and I'm running Python 3. As an alternative I'm using WebHdfsSensor, but when I run it I get the error below:
ERROR - b'<html>\n<head>\n<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>\n<title>Error 401 Authentication required</title>\n</head>\n<body><h2>HTTP ERROR 401</h2>\n<p>Problem accessing /webhdfs/v1/. Reason:\n<pre> Authentication required</pre></p><hr /><i><small>Powered by Jetty://</small></i><br/>
[2021-07-15 12:01:30,948] {taskinstance.py:1128} ERROR - Read operations failed on the namenodes below:xxxxx
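The "Authentication required" 401 is most likely returned by the NameNode's HTTP authentication filter, before the path is looked up at all. Under the hood, WebHdfsSensor's existence check is a WebHDFS GETFILESTATUS request, where a 404 means "not there yet". The sketch below reproduces that request with the standard library so the authentication problem can be tested outside Airflow. It assumes a cluster with simple (pseudo) authentication, where the caller identifies itself with the user.name query parameter. A Kerberized cluster needs SPNEGO instead, which this sketch does not handle. The host, port, path and user are placeholders.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def build_status_url(host, port, path, user=None):
    """Build a WebHDFS GETFILESTATUS URL for `path`.

    `user` is sent as user.name, which simple (pseudo) authentication
    expects; host, port and user are placeholders (assumptions).
    """
    query = {"op": "GETFILESTATUS"}
    if user is not None:
        query["user.name"] = user
    quoted_path = urllib.parse.quote(path.lstrip("/"))
    return (f"http://{host}:{port}/webhdfs/v1/{quoted_path}"
            f"?{urllib.parse.urlencode(query)}")


def file_exists(url, opener=urllib.request.urlopen):
    """Return True if WebHDFS reports the path, False on a 404.

    Any other HTTP error (e.g. 401 Authentication required) is re-raised,
    because it means the check itself failed, not that the file is missing.
    """
    try:
        with opener(url) as response:
            status = json.load(response)
        return "FileStatus" in status
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

If a request with user.name succeeds where the sensor gets a 401, the Airflow connection is probably missing the user. If it still returns 401, the cluster likely requires Kerberos.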
Could you please explain how to use this sensor? I couldn't find much information about it. Alternatively, please suggest another way to check for file arrival in HDFS. Thanks in advance.
Here is my code:
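from airflow.providers.apache.hdfs.sensors.web_hdfs import WebHdfsSensor

source_data_sensor = WebHdfsSensor(
    task_id='source_data_sensor',
    filepath='filepath',                # HDFS path to wait for (placeholder)
    timeout=120,                        # give up after 120 seconds
    webhdfs_conn_id='webhdfs_default',  # Airflow connection pointing at the NameNode
    poke_interval=10,                   # re-check every 10 seconds
    dag=dag,
    env={
        'JAVA_HOME': '/usr/bin/java'})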
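With timeout=120 and poke_interval=10, the sensor re-checks the path about every 10 seconds and fails once roughly 120 seconds pass without success. The loop below is a simplified, hypothetical re-implementation of that behaviour, not Airflow's actual BaseSensorOperator code. The check, clock and sleep functions are injected so the timing logic can be tried without a cluster or real waiting. SensorTimeout and wait_for are illustrative names, not Airflow APIs.

```python
import time


class SensorTimeout(Exception):
    """Raised when the condition is not met within `timeout` seconds."""


def wait_for(check, poke_interval=10, timeout=120,
             clock=time.monotonic, sleep=time.sleep):
    """Call `check()` every `poke_interval` seconds until it returns True.

    Returns the number of pokes made. Raises SensorTimeout once more than
    `timeout` seconds have elapsed without a successful check.
    """
    started = clock()
    pokes = 0
    while True:
        pokes += 1
        if check():
            return pokes
        if clock() - started > timeout:
            raise SensorTimeout(f"condition not met after {timeout} s")
        sleep(poke_interval)
```

In a real DAG, check would be whatever tests for the file (for example a WebHDFS status call), and the default clock and sleep would be used.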
From the official documentation, the Apache HDFS provider (which ships HdfsSensor) requires apache-airflow >= 2.1.0 and snakebite-py3, a Python 3 fork of snakebite. Try installing snakebite-py3 instead of snakebite, or simply install the provider on top of Airflow 2.1+:

pip install apache-airflow-providers-apache-hdfs
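Before installing the provider, it can help to confirm the two prerequisites mentioned above: a Python 3 interpreter and Airflow 2.1.0 or newer. The sketch below uses the standard-library importlib.metadata (Python 3.8+). The naive version parser is an assumption that only handles numeric release strings such as 2.1.2. It ignores suffixes like rc1, so packaging.version is the better choice for anything precise.

```python
import re
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Naively turn '2.1.2' into (2, 1, 2); '2.1' becomes (2, 1, 0).

    Non-numeric suffixes (e.g. 'rc1') are dropped, so '2.1.0rc1' -> (2, 1, 0).
    """
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version, minimum="2.1.0"):
    """True if `version` is at least `minimum` (numeric comparison)."""
    return version_tuple(version) >= version_tuple(minimum)


if __name__ == "__main__":
    print("Running Python 3:", sys.version_info >= (3,))
    airflow_version = installed_version("apache-airflow")
    if airflow_version is None:
        print("apache-airflow is not installed")
    else:
        print(f"apache-airflow {airflow_version}, "
              f">= 2.1.0: {meets_minimum(airflow_version)}")
```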