python, httprequest, adobe, rest, adobe-pdfservices

Convert a PDF to DOCX using Adobe PDF Services via REST API (with Python)


I am trying to use the Adobe PDF Services API to generate (export) DOCX files from PDF documents.

I have already written Python code that generates a Bearer token so that Adobe PDF Services can authenticate me (see my earlier question: https://stackoverflow.com/questions/68351955/tunning-a-post-request-to-reach-adobe-pdf-services-using-python-and-a-rest-api). I then wrote the following code, trying to follow the instructions on this page for the EXPORT operation of Adobe PDF Services (https://documentcloud.adobe.com/document-services/index.html#post-exportPDF).

Here is the code:

import requests
import json
from requests.structures import CaseInsensitiveDict
# N.B.: the code that generates the token and authenticates with the server is not shown here
# POST request that uploads my PDF file via form parameters
URL = "https://cpf-ue1.adobe.io/ops/:create?respondWith=%257B%2522reltype%2522%253A%2520%2522http%253A%252F%252Fns.adobe.com%252Frel%252Fprimary%2522%257D"

headers = CaseInsensitiveDict()
headers["x-api-key"] = "client_id"
headers["Authorization"] = "Bearer MYREALLYLONGTOKENIGOT"
headers["Content-Type"] = "application/json"

myfile = {"file":open("absolute_path_to_the_pdf_file/input.pdf", "rb")}

j="""
{
  "cpf:engine": {
    "repo:assetId": "urn:aaid:cpf:Service-26c7fda2890b44ad9a82714682e35888"
  },
  "cpf:inputs": {
    "params": {
      "cpf:inline": {
        "targetFormat": "docx"
      }
    },
    "documentIn": {
      "dc:format": "application/pdf",
      "cpf:location": "C:/Users/a-bensghir/Downloads/P_D_F/trs_pdf_file_copy.pdf"
    }
  },
  "cpf:outputs": {
    "documentOut": {
      "dc:format": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "cpf:location": "C:/Users/a-bensghir/Downloads/P_D_F/output.docx"
    }
  }
}"""

resp = requests.post(url=URL, headers=headers, json=json.dumps(j), files=myfile)
   

print(resp.text)
print(resp.status_code)

The response status code is 400. Authentication with the server does work, but print(resp.text) returns the following:

{"requestId":"the_request_id","type":"Bad Request","title":"Not a multipart request. Aborting.","status":400,"report":"{\"error_code\":\"INVALID_MULTIPART_REQUEST\"}"}

I think I am misunderstanding the "form parameters" that the Adobe guide describes for the POST method of the API's EXPORT job (https://documentcloud.adobe.com/document-services/index.html).

Do you have any ideas on how to fix this? Thank you!
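A quick way to see what the snippet above actually sends, without contacting Adobe, is to build the same kind of request with requests.Request(...).prepare() and inspect it. This is a minimal sketch: the URL https://example.com/upload, the trimmed payload and the fake PDF bytes are hypothetical stand-ins, and nothing is sent over the network.

```python
import json

import requests

# Hypothetical stand-ins; the request is only prepared, never sent.
payload = {"cpf:inputs": {"documentIn": {"cpf:location": "somewhere"}}}
fake_pdf = b"%PDF-1.4 placeholder"

prepared = requests.Request(
    "POST",
    "https://example.com/upload",                  # placeholder URL, never contacted
    headers={"Content-Type": "application/json"},  # explicit header, as in the question
    json=json.dumps(payload),                      # as in the question
    files={"file": ("input.pdf", fake_pdf, "application/pdf")},
).prepare()

# Because files= was given, requests encoded the body as multipart...
print(prepared.body.startswith(b"--"))       # True
# ...but it keeps an explicitly set Content-Type, so the server is told "JSON"
print(prepared.headers["Content-Type"])      # application/json
# ...and the JSON job description is not in the body at all
print(b"cpf:location" in prepared.body)      # False
```

The mismatch between a multipart body and an application/json header is consistent with the "Not a multipart request" error above.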


Solution

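  • Make your variable j a Python dict first, then create a JSON string from it and send it as a form field (data=) instead of json=. Also drop the Content-Type: application/json header: when you pass files=, requests builds a multipart/form-data body but keeps any Content-Type you set explicitly, so the server is told the body is JSON and rejects it as "Not a multipart request". What's also not very clear from Adobe's documentation is that the value of documentIn.cpf:location needs to be the same as the key used for your file. I've changed this to InputFile0 in your script. I'm also guessing you want to save the resulting file, so I've added that too.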

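    import json
    import time

    import requests

    # token and client_id are assumed to come from your authentication step (not shown)
    URL = "https://cpf-ue1.adobe.io/ops/:create?respondWith=%257B%2522reltype%2522%253A%2520%2522http%253A%252F%252Fns.adobe.com%252Frel%252Fprimary%2522%257D"

    # No explicit Content-Type: requests sets "multipart/form-data; boundary=..."
    # itself when files= is passed.
    headers = {
        'Authorization': f'Bearer {token}',
        'Accept': 'application/json, text/plain, */*',
        'x-api-key': client_id,
        # Ask for asynchronous processing; the job URL comes back in the Location header
        'Prefer': "respond-async,wait=0",
    }

    j = {
      "cpf:engine": {
        "repo:assetId": "urn:aaid:cpf:Service-26c7fda2890b44ad9a82714682e35888"
      },
      "cpf:inputs": {
        "params": {
          "cpf:inline": {
            "targetFormat": "docx"
          }
        },
        "documentIn": {
          "dc:format": "application/pdf",
          "cpf:location": "InputFile0"  # must match the file's form-field name below
        }
      },
      "cpf:outputs": {
        "documentOut": {
          "dc:format": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
          "cpf:location": "C:/Users/a-bensghir/Downloads/P_D_F/output.docx"
        }
      }
    }

    # The job description goes in the "contentAnalyzerRequests" form field
    body = {"contentAnalyzerRequests": json.dumps(j)}

    # Close the PDF once the upload is done
    with open("absolute_path_to_the_pdf_file/input.pdf", "rb") as pdf:
        myfile = {"InputFile0": pdf}
        resp = requests.post(url=URL, headers=headers, data=body, files=myfile)

    print(resp.text)
    print(resp.status_code)

    # Poll the job URL until the result is ready, then save the DOCX
    poll = True
    while poll:
        new_request = requests.get(resp.headers['location'], headers=headers)
        if new_request.status_code == 200:
            with open('test.docx', 'wb') as out:
                out.write(new_request.content)
            poll = False
        elif new_request.status_code >= 400:
            # Stop instead of polling forever if the job or the request failed
            print(new_request.status_code, new_request.text)
            poll = False
        else:
            time.sleep(5)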
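To check the corrected request shape offline, the sketch below prepares (but does not send) an export request with requests.Request(...).prepare(). The helper name build_export_request, the placeholder URL, token and key values, the output location "output.docx" and the fake PDF bytes are all hypothetical; only the form-field names contentAnalyzerRequests and InputFile0 and the job fields come from the answer's script.

```python
import json

import requests


def build_export_request(url, token, client_id, pdf_bytes, part_name="InputFile0"):
    """Hypothetical helper: prepare (without sending) a PDF-to-DOCX export request."""
    job = {
        "cpf:engine": {"repo:assetId": "urn:aaid:cpf:Service-26c7fda2890b44ad9a82714682e35888"},
        "cpf:inputs": {
            "params": {"cpf:inline": {"targetFormat": "docx"}},
            # Must equal the multipart field name used for the file below
            "documentIn": {"dc:format": "application/pdf", "cpf:location": part_name},
        },
        "cpf:outputs": {
            "documentOut": {
                "dc:format": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                "cpf:location": "output.docx",  # placeholder output location
            }
        },
    }
    headers = {
        "Authorization": f"Bearer {token}",
        "x-api-key": client_id,
        "Prefer": "respond-async,wait=0",
        # No Content-Type: requests adds multipart/form-data with a boundary
    }
    return requests.Request(
        "POST",
        url,
        headers=headers,
        data={"contentAnalyzerRequests": json.dumps(job)},
        files={part_name: ("input.pdf", pdf_bytes, "application/pdf")},
    ).prepare()


prepared = build_export_request(
    "https://example.com/ops/:create", "TOKEN", "CLIENT_ID", b"%PDF-1.4 placeholder"
)
print(prepared.headers["Content-Type"].split(";")[0])     # multipart/form-data
print(b'name="contentAnalyzerRequests"' in prepared.body)  # True
print(b'name="InputFile0"' in prepared.body)               # True
```

Keeping the field name in a single parameter (part_name) is one way to make sure documentIn.cpf:location and the uploaded file's key cannot drift apart.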