Tags: hadoop, filenotfoundexception, webhdfs, httpfs

WebHDFS FileNotFoundException (REST API)


I am posting this question as a continuation of the post webhdfs rest api throwing file not found exception.

I have an image file I would like to OPEN through the WebHDFS REST API.

  1. The file exists in HDFS and has the appropriate permissions.
  2. I can LISTSTATUS that file and get an answer:

curl -i "http://namenode:50070/webhdfs/v1/tmp/file.png?op=LISTSTATUS"

HTTP/1.1 200 OK
Date: Fri, 17 Jul 2020 22:47:29 GMT
Cache-Control: no-cache
Expires: Fri, 17 Jul 2020 22:47:29 GMT
Date: Fri, 17 Jul 2020 22:47:29 GMT
Pragma: no-cache
X-FRAME-OPTIONS: SAMEORIGIN
Content-Type: application/json
Transfer-Encoding: chunked

{"FileStatuses":{"FileStatus":[
{"accessTime":1594828591740,"blockSize":134217728,"childrenNum":0,"fileId":11393739,"group":"hdfs","length":104811,"modificationTime":1594828592000,"owner":"XXXX","pathSuffix":"XXXX","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"}
]}}

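One detail worth checking in that LISTSTATUS output: for a plain file, WebHDFS returns a single FileStatus entry whose pathSuffix is empty, while non-empty pathSuffix values are the names of a directory's children. A small sketch of that check, using the (redacted) response above:

```python
import json

# LISTSTATUS response body from the question (values redacted).
body = json.loads("""
{"FileStatuses":{"FileStatus":[
{"accessTime":1594828591740,"blockSize":134217728,"childrenNum":0,
 "fileId":11393739,"group":"hdfs","length":104811,
 "modificationTime":1594828592000,"owner":"XXXX","pathSuffix":"XXXX",
 "permission":"644","replication":3,"storagePolicy":0,"type":"FILE"}
]}}
""")

def liststatus_looks_like_directory(resp: dict) -> bool:
    """LISTSTATUS on a plain file returns one entry with an empty
    pathSuffix; non-empty pathSuffix values are children's names,
    which means the queried path is itself a directory."""
    entries = resp["FileStatuses"]["FileStatus"]
    return any(e["pathSuffix"] for e in entries)

print(liststatus_looks_like_directory(body))  # True: pathSuffix is non-empty
```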
  3. So the API can properly read the metadata, but I cannot OPEN that file:

curl -i "http://namenode:50070/webhdfs/v1/tmp/file.png?op=OPEN"

HTTP/1.1 307 Temporary Redirect
Date: Fri, 17 Jul 2020 22:23:17 GMT
Cache-Control: no-cache
Expires: Fri, 17 Jul 2020 22:23:17 GMT
Date: Fri, 17 Jul 2020 22:23:17 GMT
Pragma: no-cache
X-FRAME-OPTIONS: SAMEORIGIN
Location: http://datanode1:50075/webhdfs/v1/tmp/file.png?op=OPEN&namenoderpcaddress=namenode:8020&offset=0
Content-Type: application/octet-stream
Content-Length: 0

{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"Path is not a file: /tmp/file.png......
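WebHDFS wraps errors in a RemoteException JSON body like the one above. When scripting against the API, a minimal helper (hypothetical, not part of any Hadoop client) can pull out the useful fields:

```python
import json

# Error body shaped like the one above (message truncated, as in the question).
raw = ('{"RemoteException":{"exception":"FileNotFoundException",'
       '"javaClassName":"java.io.FileNotFoundException",'
       '"message":"Path is not a file: /tmp/file.png"}}')

def remote_exception(body: str):
    """Return (exception, message) from a WebHDFS error body."""
    e = json.loads(body).get("RemoteException", {})
    return e.get("exception"), e.get("message")

print(remote_exception(raw))  # ('FileNotFoundException', 'Path is not a file: /tmp/file.png')
```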
  4. So, according to webhdfs rest api throwing file not found exception, the request is handed off from the namenode to datanode1. Datanode1 is in my hosts file; I can connect to it and check the status of WebHDFS from there:
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
  <final>true</final>
</property>

It is enabled there, and the same goes for the namenode.
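For scripted checks across several nodes, that property can also be read straight out of hdfs-site.xml. A quick sketch, using the fragment from the question wrapped in its usual <configuration> root:

```python
import xml.etree.ElementTree as ET

# hdfs-site.xml fragment from the question, wrapped in its root element.
hdfs_site = """
<configuration>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
    <final>true</final>
  </property>
</configuration>
"""

def get_property(xml_text: str, name: str):
    """Return the <value> of the named property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

print(get_property(hdfs_site, "dfs.webhdfs.enabled"))  # true
```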

  5. I also looked at the HDFS logs under /var/log/hadoop/hdfs/*.{log,out} to see whether my curl requests triggered any errors, but nothing seems to happen: I see no entry pertaining to my file or to the WebHDFS query. I tried that on both the namenode and datanode1.

  6. As a last-ditch effort, I tried to relax the permissions (not ideal) from 644 (seen in point 2) to 666:

hdfs dfs -chmod 666 /tmp/file.png

curl -i "http://namenode:50070/webhdfs/v1/tmp/file.png?op=LISTSTATUS"

HTTP/1.1 403 Forbidden
Date: Fri, 17 Jul 2020 23:06:18 GMT
Cache-Control: no-cache
Expires: Fri, 17 Jul 2020 23:06:18 GMT
Date: Fri, 17 Jul 2020 23:06:18 GMT
Pragma: no-cache
X-FRAME-OPTIONS: SAMEORIGIN
Content-Type: application/json
Transfer-Encoding: chunked

{"RemoteException":{"exception":"AccessControlException","javaClassName":"org.apache.hadoop.security.AccessControlException","message":"Permission denied: user=XXXX, access=READ_EXECUTE, inode=\"/tmp/file.png\":XXXX:hdfs:drw-rw-rw-"}}

So it seems the chmod went through, but somehow relaxing the permissions produced a permission issue that I didn't get before? It is not as if I removed the x flag; it wasn't there to begin with. Does access=READ_EXECUTE require both r and x?

Now I am at a loss as to why I can see but not read this file with HDFS. Can someone please help me troubleshoot this?


Solution

  • Looking closer at your last error, ... inode=\"/tmp/file.png\":XXXX:hdfs:drw-rw-rw-"}, the leading d in the permission string indicates that file.png is actually a directory, not a file. This is consistent with the error you got in step 3: ..."message":"Path is not a file: /tmp/file.png....

    You can double check that simply by doing $ hdfs dfs -ls /tmp/file.png/.
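Another way to double-check, via WebHDFS itself, is op=GETFILESTATUS: unlike LISTSTATUS, which lists a directory's children, it returns the status of the path itself, so its type field answers the question directly. A sketch of reading that field, with a made-up response body for illustration:

```python
def path_type(resp: dict) -> str:
    """Return the 'type' field from a GETFILESTATUS body:
    'FILE' or 'DIRECTORY'."""
    return resp["FileStatus"]["type"]

# Hypothetical GETFILESTATUS body for /tmp/file.png, assuming it
# really is a directory as the error message suggests.
resp = {"FileStatus": {"type": "DIRECTORY", "owner": "XXXX", "permission": "666"}}
print(path_type(resp))  # DIRECTORY
```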

    Getting back to your access error: you do need the execute (x) permission, in addition to read (r), to list the files in a directory, which is why READ_EXECUTE fails against drw-rw-rw-.
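That READ_EXECUTE failure follows from the usual POSIX-style rule: listing a directory needs both the read (r) and execute (x) bits in the relevant triad. A toy check against the mode string from the error (helper names are made up for illustration):

```python
def triads(perm: str):
    """Split a mode string like 'drw-rw-rw-' into the
    (owner, group, other) permission triads."""
    bits = perm[1:] if len(perm) == 10 else perm  # drop the leading type char
    return bits[0:3], bits[3:6], bits[6:9]

def can_list(triad: str) -> bool:
    """READ_EXECUTE on a directory needs both r and x in the triad."""
    return "r" in triad and "x" in triad

owner, group, other = triads("drw-rw-rw-")
print(can_list(owner))   # False: r is set, but x is not
print(can_list("r-x"))   # True
```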