Tags: python, pdf, adobe, adobe-pdfservices

Adobe PDF Services: Cryptic (Meaningless?) Error Messages


I am trying to use Adobe PDF Services to extract text from company-report PDFs, and for some of them I get the following generic error message:

    raise SdkException("Request could not be completed. Possible cause attached!", sys.exc_info())
    adobe.pdfservices.operation.exception.exceptions.SdkException: description =Request could not be completed. Possible cause attached!, requestTrackingId=(<class 'requests.exceptions.ConnectionError'>, ConnectionError(ProtocolError('Connection aborted.', timeout('The write operation timed out'))), <traceback object at 0x10455d540>)
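
The message is less cryptic than it looks: the SDK passes the original `sys.exc_info()` triple along (it is what gets printed after `requestTrackingId=`), and the innermost exception, `timeout('The write operation timed out')`, suggests the client gave up while still sending the request body, i.e. uploading the PDF. Below is a minimal sketch for pulling that innermost cause out programmatically by walking each exception's `args` and its `__cause__`/`__context__` chain. `iter_causes`, `find_cause` and `TIMEOUT_TYPES` are hypothetical helpers, not part of the Adobe SDK, and the demo uses stand-in classes instead of the real `requests`, `urllib3` and SDK exceptions so it runs without any of them installed.

```python
import socket
import sys


def _nested(exc):
    """Yield exceptions directly reachable from exc: its args (also inside
    tuples/lists, e.g. a sys.exc_info() triple) and its __cause__/__context__."""
    for arg in exc.args:
        items = arg if isinstance(arg, (tuple, list)) else (arg,)
        for item in items:
            if isinstance(item, BaseException):
                yield item
    for linked in (exc.__cause__, exc.__context__):
        if linked is not None:
            yield linked


def iter_causes(exc):
    """Depth-first walk over exc and everything nested in it, each exception once."""
    seen = set()
    stack = [exc]
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        yield current
        # Reverse so the first nested exception is visited first.
        stack.extend(reversed(list(_nested(current))))


def find_cause(exc, types):
    """Return the first nested exception that is an instance of types, else None."""
    return next((e for e in iter_causes(exc) if isinstance(e, types)), None)


# socket.timeout is the class named in the traceback; since Python 3.10 it is
# an alias of TimeoutError, so checking both covers old and new versions.
TIMEOUT_TYPES = (socket.timeout, TimeoutError)


if __name__ == "__main__":
    class ProtocolError(Exception):  # stand-in for urllib3's ProtocolError
        pass

    class SdkException(Exception):  # stand-in for the Adobe SDK's SdkException
        pass

    try:
        try:
            # Builtin ConnectionError stands in for requests.exceptions.ConnectionError.
            raise ConnectionError(ProtocolError(
                "Connection aborted.", socket.timeout("The write operation timed out")))
        except ConnectionError:
            # Mirrors the SDK line in the traceback: the exc_info triple is passed along.
            raise SdkException("Request could not be completed. Possible cause attached!",
                               sys.exc_info())
    except SdkException as err:
        cause = find_cause(err, TIMEOUT_TYPES)
        print(type(cause).__name__, "-", cause)
```

In real code the same `find_cause(e, TIMEOUT_TYPES)` call would go in an `except SdkException as e:` branch around the call that runs the extract operation, to distinguish timeouts from other failures.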

It is not a network problem because:

  1. I successfully extracted text from 6 other files at the same time
  2. I retried each failing file several times, with the same result

The files are not corrupted because I can view them locally just fine.

The Python code I am running is the sample code Adobe generated for my service account, downloaded to a local folder; the only change is the input file name.

I am using a trial account and have only used up 20% of my quota, so it's also not a quota error.

The files that fail are:

  1. https://www.asahigroup-holdings.com/en/ir/pdf/annual/2019_all.pdf
  2. https://www.csx.com/share/wwwcsx15/assets/File/Responsibility/CSX_ESG_Report_Final_7_30.pdf
  3. https://online.flippingbook.com/view/459148139/ (download the PDF from that page)

Could anyone (perhaps Adobe staff) offer insight or advice on this generic error message, so that I can try to fix the problem?


Solution

  • I tested the files in my Node.js environment and they process without errors; I'm happy to share the output if you want. The error looks like a timeout (the innermost cause is "The write operation timed out", i.e. the request did not finish sending in time), which I suspect is related to the file size, even though the files aren't all that large.

    Try using the sample that lets you set custom timeouts. The full sample is here, but the relevant code is below.

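    import os
    from adobe.pdfservices.operation.auth.credentials import Credentials
    from adobe.pdfservices.operation.client_config import ClientConfig
    from adobe.pdfservices.operation.execution_context import ExecutionContext

    # Directory containing pdfservices-api-credentials.json (adjust to your layout).
    base_path = os.path.dirname(os.path.abspath(__file__))

    # Initial setup, create credentials instance.
    credentials = Credentials.service_account_credentials_builder() \
        .from_file(base_path + "/pdfservices-api-credentials.json") \
        .build()

    # Create client config instance with custom timeouts (milliseconds):
    # 10 s to connect, 40 s for socket reads.
    client_config = ClientConfig.builder().with_connect_timeout(10000).with_read_timeout(40000).build()

    # Create an ExecutionContext using credentials and the client config, then create the operation as usual.
    execution_context = ExecutionContext.create(credentials, client_config)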
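
If the timeout really is size-related, one way to pick values larger than the sample's is to scale them with the size of each PDF. The helper below is a hypothetical sketch, not an Adobe API: it adds the time needed to upload the file at an assumed worst-case rate of 100 kB/s to a 10 s base allowance and caps the result at 10 minutes. All three numbers are placeholders to tune for your connection.

```python
import os


def timeout_ms_for_upload(size_bytes: int,
                          min_bytes_per_sec: int = 100_000,
                          base_ms: int = 10_000,
                          max_ms: int = 600_000) -> int:
    """Hypothetical helper: base allowance plus the time to send size_bytes
    at an assumed worst-case upload rate, in milliseconds, capped at max_ms."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    transfer_ms = size_bytes * 1000 // min_bytes_per_sec  # whole milliseconds
    return min(base_ms + transfer_ms, max_ms)


def timeout_ms_for_file(path: str, **kwargs) -> int:
    """Same as timeout_ms_for_upload, using the size of the file at path."""
    return timeout_ms_for_upload(os.path.getsize(path), **kwargs)


if __name__ == "__main__":
    # A 15 MB file at the assumed 100 kB/s: 150 s of transfer on top of the 10 s base.
    print(timeout_ms_for_upload(15_000_000))
```

The result can be passed to `with_connect_timeout(...)` and `with_read_timeout(...)` on `ClientConfig.builder()`. Which of the two limits applies while the request body is being uploaded depends on how the SDK hands them to `requests`/`urllib3`, so raising both is the safer choice.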