I am trying to query Adobe PDF services API to generate (export) DOCX from PDF documents.
I just wrote a python code to generate a Bearer Token in order to be identified from Adobe PDF services (see the question here: https://stackoverflow.com/questions/68351955/tunning-a-post-request-to-reach-adobe-pdf-services-using-python-and-a-rest-api). Then I wrote the following piece of code, where I tried to follow the instruction in this page concerning the EXPORT
option of Adobe PDF services (here: https://documentcloud.adobe.com/document-services/index.html#post-exportPDF).
Here is the piece of code :
import requests
import json
from requests.structures import CaseInsensitiveDict
URL = "https://cpf-ue1.adobe.io/ops/:create?respondWith=%257B%2522reltype%2522%253A%2520%2522http%253A%252F%252Fns.adobe.com%252Frel%252Fprimary%2522%257D"
headers = CaseInsensitiveDict()
headers["x-api-key"] = "client_id"
headers["Authorization"] = "Bearer MYREALLYLONGTOKENIGOT"
headers["Content-Type"] = "application/json"
myfile = {"file":open("absolute_path_to_the_pdf_file/input.pdf", "rb")}
j="""
{
"cpf:engine": {
"repo:assetId": "urn:aaid:cpf:Service-26c7fda2890b44ad9a82714682e35888"
},
"cpf:inputs": {
"params": {
"cpf:inline": {
"targetFormat": "docx"
}
},
"documentIn": {
"dc:format": "application/pdf",
"cpf:location": "C:/Users/a-bensghir/Downloads/P_D_F/trs_pdf_file_copy.pdf"
}
},
"cpf:outputs": {
"documentOut": {
"dc:format": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"cpf:location": "C:/Users/a-bensghir/Downloads/P_D_F/output.docx"
}
}
}"""
resp = requests.post(url=URL, headers=headers, json=json.dumps(j), files=myfile)
print(resp.text)
print(resp.status_code)
The status of the code is 400
I am tho well authentified by the server
But I get the following as a result of print(resp.text)
:
{"requestId":"the_request_id","type":"Bad Request","title":"Not a multipart request. Aborting.","status":400,"report":"{\"error_code\":\"INVALID_MULTIPART_REQUEST\"}"}
I think that I have problems understanding the "form parameters" from the Adobe Guide concerning POST method for the EXPORT job of the API (https://documentcloud.adobe.com/document-services/index.html).
Would you have any ideas for improvement. thank you !
Make you variable j
as a python dict
first then create a JSON string from it.
What's also not super clear from Adobe's documentation is the value for documentIn.cpf:location
needs to be the same as the key used for you file. I've corrected this to InputFile0
in your script. Also guessing you want to save your file so I've added that too.
import requests
import json
import time
URL = "https://cpf-ue1.adobe.io/ops/:create?respondWith=%257B%2522reltype%2522%253A%2520%2522http%253A%252F%252Fns.adobe.com%252Frel%252Fprimary%2522%257D"
headers = {
'Authorization': f'Bearer {token}',
'Accept': 'application/json, text/plain, */*',
'x-api-key': client_id,
'Prefer': "respond-async,wait=0",
}
myfile = {"InputFile0":open("absolute_path_to_the_pdf_file/input.pdf", "rb")}
j={
"cpf:engine": {
"repo:assetId": "urn:aaid:cpf:Service-26c7fda2890b44ad9a82714682e35888"
},
"cpf:inputs": {
"params": {
"cpf:inline": {
"targetFormat": "docx"
}
},
"documentIn": {
"dc:format": "application/pdf",
"cpf:location": "InputFile0"
}
},
"cpf:outputs": {
"documentOut": {
"dc:format": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"cpf:location": "C:/Users/a-bensghir/Downloads/P_D_F/output.docx"
}
}
}
body = {"contentAnalyzerRequests": json.dumps(j)}
resp = requests.post(url=URL, headers=headers, data=body, files=myfile)
print(resp.text)
print(resp.status_code)
poll = True
while poll:
new_request = requests.get(resp.headers['location'], headers=headers)
if new_request.status_code == 200:
open('test.docx', 'wb').write(new_request.content)
poll = False
else:
time.sleep(5)