r hadoop curl hdfs webhdfs

PUT with an empty body using httr (in R) to WebHDFS


When I try to PUT to WebHDFS to create a file and write to it (following https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE), I run into issues with httr.

Using RCurl or RWebHDFS is not possible because the target Hadoop cluster is secure.

Here is the code I have attempted to use:

library(httr)
r <- PUT("https://hadoopmgr1p.global.ad:14000/webhdfs/v1/user/testuser/temp/loadfile_testuser_2019-11-28_15_28_41411?op=CREATE&permission=755&user.name=testuser", 
          authenticate(":", "", type = "gssnegotiate"),
          verbose())

testuser is a superuser with read/write permissions. I get the following error:

<- HTTP/1.1 400 Data upload requests must have content-type set to 'application/octet-stream'
<- Date: Fri, 29 Nov 2019 15:42:30 GMT
<- Date: Fri, 29 Nov 2019 15:42:30 GMT
<- Pragma: no-cache
<- X-Content-Type-Options: nosniff
<- X-XSS-Protection: 1; mode=block
<- Content-Length: 0

The error is fairly self-explanatory, so I then tried the PUT with a content type:

r <- PUT("https://hadoopmgr1p.global.ad:14000/webhdfs/v1/user/testuser/temp/loadfile_testuser_2019-11-28_15_28_41411?op=CREATE&permission=755&user.name=testuser", 
          authenticate(":", "", type = "gssnegotiate"),
          content_type("application/octet-stream"),
          verbose())

The request appears to succeed, but nothing is actually created:

<- Date: Fri, 29 Nov 2019 16:04:52 GMT
<- Cache-Control: no-cache
<- Expires: Fri, 29 Nov 2019 16:04:52 GMT
<- Date: Fri, 29 Nov 2019 16:04:52 GMT
<- Pragma: no-cache
<- Content-Type: application/json;charset=utf-8
<- X-Content-Type-Options: nosniff
<- X-XSS-Protection: 1; mode=block
<- Content-Length: 0

No file is uploaded. Sending a file along with that first request gives me a different error:

<- HTTP/1.1 307 Temporary Redirect
<- Date: Fri, 29 Nov 2019 16:07:24 GMT
<- Cache-Control: no-cache
<- Expires: Fri, 29 Nov 2019 16:07:24 GMT
<- Date: Fri, 29 Nov 2019 16:07:24 GMT
<- Pragma: no-cache
<- Content-Type: application/json;charset=utf-8
<- X-Content-Type-Options: nosniff
<- X-XSS-Protection: 1; mode=block
Error in curl::curl_fetch_memory(url, handle = handle) : 
  necessary data rewind wasn't possible

The code in question:

library(httr)
temp_file <- httr::upload_file(lfs_temp_file, type = "text/plain")
r <- PUT("https://hadoopmgr1p.global.ad:14000/webhdfs/v1/user/testuser/temp/loadfile_testuser_2019-11-28_15_28_41411?op=CREATE&permission=755&user.name=testuser", 
          authenticate(":", "", type = "gssnegotiate"),
          body=temp_file,
          content_type("application/octet-stream"),
          verbose())

Attempting the same command using curl works without issue:

curl -i -k -X PUT --negotiate -u : "https://hadoopmgr1p.global.ad:14000/webhdfs/v1/user/testuser/temp/loadfile_testuser_2019-11-28_15_28_4141?op=CREATE&permission=755&user.name=testuser"

This results in the following:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0HTTP/1.1 307 Temporary Redirect
Date: Thu, 28 Nov 2019 23:27:16 GMT
Cache-Control: no-cache
Expires: Thu, 28 Nov 2019 23:27:16 GMT
Date: Thu, 28 Nov 2019 23:27:16 GMT
Pragma: no-cache
Content-Type: application/json;charset=utf-8
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
WWW-Authenticate: Negotiate <stuff>/
Set-Cookie: hadoop.auth="<stuff>"; Path=/; Secure; HttpOnly
Location: https://hadoopmgr1p.global.ad:14000/webhdfs/v1/user/testuser/temp/loadfile_testuser_2019-11-28_15_28_4141?op=CREATE&data=true&user.name=testuser&permission=755
Content-Length: 0

Following the Location header lets us create the file successfully.

What am I doing wrong?

Thanks


Solution

  • httr tries to follow the 307 redirect and fails. To fix this, tell httr not to follow redirects by passing config(followlocation = 0L).

    The PUT command then becomes:

    r <- PUT("https://hadoopmgr1p.global.ad:14000/webhdfs/v1/user/testuser/temp/loadfile_testuser_2019-11-28_15_28_41411?op=CREATE&permission=755&user.name=testuser",
              authenticate(":", "", type = "gssnegotiate"),
              body = NULL,
              config(followlocation = 0L),
              verbose())

    This should return a valid response with a Location header. As with the curl call in the question, a second PUT to that Location, with the file as the body, is what actually creates the file.
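
For completeness, the full two-step CREATE flow might look roughly like the sketch below. It is untested against a real cluster. The function names (webhdfs_create_url, webhdfs_get_upload_location, webhdfs_upload) are made up for illustration, and the code assumes the server answers the first PUT with a 307 and a Location header, as in the curl output from the question. Whether the second request needs authenticate() again, or can rely on the hadoop.auth cookie, is also an assumption.

```r
# Sketch only: helper names are hypothetical; host and path come from the question.
# httr functions are called as httr::fn() so the helpers can be defined on their own.

# Build the WebHDFS CREATE URL (pure string handling, no network access).
webhdfs_create_url <- function(base, path, user, permission = "755") {
  paste0(base, "/webhdfs/v1", path,
         "?op=CREATE&permission=", permission,
         "&user.name=", user)
}

# Step 1: PUT with no body and redirect-following disabled.
# Assumes the server replies 307 with a Location header.
webhdfs_get_upload_location <- function(url) {
  r <- httr::PUT(url,
                 httr::authenticate(":", "", type = "gssnegotiate"),
                 body = NULL,
                 httr::config(followlocation = 0L))
  if (httr::status_code(r) != 307L) {
    stop("expected 307 Temporary Redirect, got ", httr::status_code(r))
  }
  httr::headers(r)[["location"]]
}

# Step 2: PUT the file contents to the URL from the Location header,
# with the content type the server asked for in the 400 error.
webhdfs_upload <- function(location, local_file) {
  r <- httr::PUT(location,
                 httr::authenticate(":", "", type = "gssnegotiate"),
                 body = httr::upload_file(local_file,
                                          type = "application/octet-stream"),
                 httr::content_type("application/octet-stream"),
                 httr::config(followlocation = 0L))
  httr::stop_for_status(r)  # raises an R error unless the status is below 300
  invisible(r)
}

# Usage (commented out; needs a reachable cluster and a Kerberos ticket):
# url <- webhdfs_create_url("https://hadoopmgr1p.global.ad:14000",
#                           "/user/testuser/temp/loadfile_testuser_2019-11-28_15_28_41411",
#                           "testuser")
# loc <- webhdfs_get_upload_location(url)
# webhdfs_upload(loc, lfs_temp_file)
```

Splitting the flow into two explicit requests avoids the "necessary data rewind wasn't possible" error from the question, because the file body is only ever sent once, to the final URL.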