Extract binary attachment using httr


I have searched all over for an answer to this and come up empty. I make a POST request to an API, and it returns a multipart/related response whose root part is application/xop+xml (XOP packaging, as used by SOAP MTOM).

The response looks like this:

Response [https://...]
  Date: 2023-08-14 18:18
  Status: 200
  Content-Type: multipart/related; type="application/xop+xml"; boundary="uuid:0ca94e9c-b817-4e87-9628-656cdd58be68"; start="<root.message@cxf.apache.org>"; start-info="text/xml"
  Size: 18.3 MB
<BINARY BODY>

The attachment in this case should be a .zip file, but I have no idea how to extract it. I tried using writeBin to save the content of the response to a file, but the resulting file cannot be opened.
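A likely reason the writeBin() output will not open: a multipart/related body is a MIME envelope (boundary delimiter lines, per-part headers, the XML root part, then the attachment), so the zip bytes are only one slice of it. As a first check, here is a minimal sketch, assuming the boundary is double-quoted as in the Content-Type header above; `get_boundary()` and `is_multipart_envelope()` are hypothetical helpers, not httr functions.

```r
# Hypothetical helper: return the boundary parameter of a multipart Content-Type
# header. Assumes the value is double-quoted, as in the response above.
get_boundary <- function(content_type) {
  m <- regmatches(content_type, regexpr('boundary="[^"]+"', content_type))
  if (length(m) == 0) stop("no quoted boundary parameter found")
  sub('^boundary="([^"]+)"$', "\\1", m)
}

# Hypothetical helper: TRUE if the body opens with "--" + boundary, i.e. it is
# still the MIME envelope. A bare .zip file would open with "PK\003\004" instead.
# (A preamble before the first delimiter would make this return FALSE.)
is_multipart_envelope <- function(body, boundary) {
  prefix <- charToRaw(paste0("--", boundary))
  length(body) >= length(prefix) && identical(body[seq_along(prefix)], prefix)
}

ct <- paste0('multipart/related; type="application/xop+xml"; ',
             'boundary="uuid:0ca94e9c-b817-4e87-9628-656cdd58be68"; ',
             'start="<root.message@cxf.apache.org>"; start-info="text/xml"')
boundary <- get_boundary(ct)
print(boundary)  # [1] "uuid:0ca94e9c-b817-4e87-9628-656cdd58be68"

# With the real response (needs the authenticated request, so not run here):
# library(httr)
# body <- content(response, as = "raw")
# is_multipart_envelope(body, get_boundary(headers(response)[["content-type"]]))
```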

Looking inside the body, I found these headers on the attachment part:

"Content-Type: application/octet-stream"
"Content-Transfer-Encoding: binary"
"Content-ID: <565bcd99-ad2d-41c4-ac57-16f10557ae60-4229@api.xyz.com>"
"Content-Disposition: attachment;name=\"tmp12345.zip\""

I am unable to provide a reproducible example since the API requires authentication, but I am hoping that someone can give me a hint on how I can successfully extract this zip file.
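Each part inside the envelope starts with its own header block, like the four header lines above, and under the MIME multipart rules (RFC 2046) that block ends at the first empty line (CRLF CRLF), with the body starting right after it. A minimal sketch of splitting one part at that point and reading the name from Content-Disposition, assuming CRLF line endings; `split_part()` and `part_filename()` are hypothetical helper names, and the sample part is synthetic, rebuilt from the headers above:

```r
# Hypothetical helper: split one MIME part (a raw vector) at the first empty line.
# Everything before "\r\n\r\n" is header text; everything after it is the body.
split_part <- function(part) {
  sep <- grepRaw("\r\n\r\n", part, fixed = TRUE)
  if (length(sep) == 0) stop("no header/body separator (CRLF CRLF) found")
  header_text <- rawToChar(part[seq_len(sep - 1)])
  body <- if (sep + 4 <= length(part)) part[(sep + 4):length(part)] else raw(0)
  list(headers = strsplit(header_text, "\r\n", fixed = TRUE)[[1]], body = body)
}

# Hypothetical helper: the quoted name from Content-Disposition, or NA if absent.
part_filename <- function(headers) {
  cd <- grep("^Content-Disposition:", headers, ignore.case = TRUE, value = TRUE)
  if (length(cd) == 0) return(NA_character_)
  sub('.*name="([^"]+)".*', "\\1", cd[1])
}

# Synthetic part built from the headers above, followed by the 4-byte zip signature.
part <- c(charToRaw(paste0(
  "Content-Type: application/octet-stream\r\n",
  "Content-Transfer-Encoding: binary\r\n",
  "Content-ID: <565bcd99-ad2d-41c4-ac57-16f10557ae60-4229@api.xyz.com>\r\n",
  "Content-Disposition: attachment;name=\"tmp12345.zip\"\r\n\r\n")),
  as.raw(c(0x50, 0x4b, 0x03, 0x04)))
p <- split_part(part)
part_filename(p$headers)  # [1] "tmp12345.zip"
p$body                    # [1] 50 4b 03 04
```

Locating the separator this way avoids hand-counting byte offsets after the file name.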


Solution

  • I came up with a solution to this problem, although it is likely not the best one. After comments from @MrFlick above, I started looking into searching and parsing raw vectors and came across the function grepRaw, which proved very useful. In my case, the "boundary" the API provided was not enough on its own: even inside a boundary, each part still starts with its own headers, which have to be skipped before the binary data begins (perhaps there is a more elegant way). My solution was basically this:

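    library(httr)
    v  <- content(response, as = "raw")   # whole multipart body as a raw vector
    # '.zip"' is 5 bytes and the blank line ending the part headers (CRLF CRLF) is 4 more
    si <- grepRaw(".zip", v, fixed = TRUE) + 9
    # "--uuid" appears three times; the third is the closing boundary. Step back 3 bytes
    # so the CRLF in front of it (which belongs to the delimiter) is not written out.
    ei <- grepRaw("--uuid", v, fixed = TRUE, all = TRUE)[3] - 3
    bytes <- v[si:ei]                     # slice out the attachment bytes
    writeBin(bytes, file_name)            # file_name: path of the .zip to write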
    

    For anyone else searching for an answer, I suggest doing something similar: through trial and error (along with rawToChar), you can zero in on the parts of the response that contain the binary data of interest. Not pretty, but it works.

    If anyone comes up with a more elegant solution, feel free to share it here.
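One possible generalization of the grepRaw approach, sketched under stated assumptions: split the whole body on the boundary from the Content-Type header instead of hard-coding offsets, then keep the first application/octet-stream part. It assumes CRLF line endings, no whitespace padding after delimiter lines, a header block on every part, and that the attachment is the only octet-stream part. `split_multipart()` and `find_attachment()` are hypothetical names, and the message below is a synthetic stand-in for the real response.

```r
# Hypothetical helper: split a raw multipart body into parts, each returned as
# list(headers = <character>, body = <raw>). Assumes CRLF line endings, no padding
# after delimiter lines, and a header block on every part.
split_multipart <- function(body, boundary) {
  delim <- paste0("\r\n--", boundary)
  n <- nchar(delim, type = "bytes")
  # Prepend CRLF so the delimiter that opens the body matches the same pattern.
  buf <- c(charToRaw("\r\n"), body)
  starts <- grepRaw(delim, buf, fixed = TRUE, all = TRUE)
  if (length(starts) < 2) stop("fewer than two boundary delimiters found")
  parts <- vector("list", length(starts) - 1)
  for (i in seq_along(parts)) {
    from <- starts[i] + n + 2   # skip the delimiter and the CRLF ending its line
    to <- starts[i + 1] - 1     # stop before the CRLF that belongs to the next delimiter
    chunk <- buf[from:to]
    sep <- grepRaw("\r\n\r\n", chunk, fixed = TRUE)
    if (length(sep) == 0) stop("part ", i, " has no header/body separator")
    parts[[i]] <- list(
      headers = strsplit(rawToChar(chunk[seq_len(sep - 1)]), "\r\n", fixed = TRUE)[[1]],
      body = if (sep + 4 <= length(chunk)) chunk[(sep + 4):length(chunk)] else raw(0)
    )
  }
  parts
}

# Hypothetical helper: body of the first part whose Content-Type is application/octet-stream.
find_attachment <- function(parts) {
  for (p in parts) {
    if (any(grepl("^Content-Type:[[:space:]]*application/octet-stream", p$headers,
                  ignore.case = TRUE))) {
      return(p$body)
    }
  }
  stop("no application/octet-stream part found")
}

# Synthetic two-part message: an XML root part, then a 4-byte stand-in "zip".
b <- "uuid:demo"
msg <- c(
  charToRaw(paste0("--", b, "\r\n",
                   "Content-Type: application/xop+xml\r\n\r\n",
                   "<root/>\r\n",
                   "--", b, "\r\n",
                   "Content-Type: application/octet-stream\r\n",
                   "Content-Disposition: attachment;name=\"tmp12345.zip\"\r\n\r\n")),
  as.raw(c(0x50, 0x4b, 0x03, 0x04)),
  charToRaw(paste0("\r\n--", b, "--\r\n")))
parts <- split_multipart(msg, b)
length(parts)            # [1] 2
find_attachment(parts)   # [1] 50 4b 03 04

# With the real response (not run here):
# library(httr)
# ct <- headers(response)[["content-type"]]
# boundary <- sub('.*boundary="([^"]+)".*', "\\1", ct)
# parts <- split_multipart(content(response, as = "raw"), boundary)
# writeBin(find_attachment(parts), "tmp12345.zip")
```

Under RFC 2046 the CRLF in front of each delimiter belongs to the delimiter, which is why each part's bytes stop just before that CRLF rather than at the `--` itself.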