
Document AI "400 No valid schema provided for processing" with Cloud Function


I’ve been having an issue with the Google Cloud Document AI API in my Firebase Cloud Function, which handles documents uploaded to Google Cloud Storage. The function triggers correctly on PDF uploads, but every call to my custom Document AI processor fails with "400 No valid schema provided for processing". The code is based on the Document AI documentation for sending a processing request and on its sample requests. Community answers are sparse: so far, everything I've found online is other people reporting the same problem.

I’ve verified the processor ID, checked the service account's permissions (for testing, it has Owner on Document AI, Cloud Storage, and Firebase), and tried simpler PDFs, all with no luck.

I’m fairly certain the problem is in the request structure, but I don't know how to fix it. Any help is appreciated!

My code:

    import os
    
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai_v1beta3 as documentai
    from google.cloud import storage
    
    def process_pdf(event, context):
        # The storage trigger passes the uploaded object's bucket and name in `event`.
        blob = storage.Client().bucket(event["bucket"]).blob(event["name"])
    
        location = 'us'  # must match the region the processor was created in
        opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
        documentai_client = documentai.DocumentProcessorServiceClient(client_options=opts)
    
        project_id = os.environ.get('PROJECT_ID')
        processor_id = 'MY_PROCESSOR_ID'  # placeholder for the real processor ID
        name = documentai_client.processor_path(project_id, location, processor_id)
    
        content = blob.download_as_bytes()
        raw_document = documentai.RawDocument(content=content, mime_type="application/pdf")
        request = documentai.ProcessRequest(name=name, raw_document=raw_document)
    
        try:
            result = documentai_client.process_document(request=request)
        except Exception as e:
            print(f"Error processing the document: {type(e).__name__} - {e}")
            return
        return result.document
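When verifying the processor ID, it can help to check the exact strings the client ends up using. The sketch below is plain Python (no Google libraries) and assumes a 1st-gen Cloud Storage background trigger, whose `event` dict carries `bucket`, `name` and `contentType`. The helper names `gcs_object_from_event`, `is_pdf` and `processor_target` are hypothetical. It rebuilds the `projects/{project}/locations/{location}/processors/{processor}` name that `processor_path` returns, plus the regional endpoint, from one shared `location` so the two cannot drift apart; it also catches an unset `PROJECT_ID` (which `os.environ.get` returns as `None`).

```python
from typing import Tuple


def gcs_object_from_event(event: dict) -> Tuple[str, str]:
    """Return (bucket, object_name) from a Cloud Storage background event."""
    # Assumption: 1st-gen storage triggers pass the object metadata as a dict.
    return event["bucket"], event["name"]


def is_pdf(event: dict) -> bool:
    """Prefer the uploaded object's contentType; fall back to the file extension."""
    if event.get("contentType") == "application/pdf":
        return True
    return event.get("name", "").lower().endswith(".pdf")


def processor_target(project_id: str, location: str, processor_id: str) -> Tuple[str, str]:
    """Return (api_endpoint, processor_name), both derived from the same location."""
    for label, value in (
        ("project_id", project_id),
        ("location", location),
        ("processor_id", processor_id),
    ):
        # Rejects None/empty values and full resource names or URLs pasted as IDs.
        if not value or "/" in value:
            raise ValueError(f"{label} must be a bare, non-empty ID, got {value!r}")
    endpoint = f"{location}-documentai.googleapis.com"
    name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
    return endpoint, name
```

With the placeholders `"my-project"`, `"us"` and `"abc123"`, `processor_target` yields `us-documentai.googleapis.com` and `projects/my-project/locations/us/processors/abc123`. This only rules out a malformed name or endpoint; it does not address the schema error itself.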

Solution

  • This type of processor requires a schema, supplied either in the request or configured explicitly ahead of time in the UI:

    [Request] Pass a DocumentSchema as schema_override inside the request's ProcessOptions; there is a similar code snippet in their GitHub samples.

    process_options = documentai.ProcessOptions(
      schema_override=documentai.DocumentSchema(
        # ... define your fields to extract here
      )
    )
    request = documentai.ProcessRequest(name=name, raw_document=raw_document, process_options=process_options)
    

    [Explicitly] Define the schema ahead of time in the processor's Build step in the UI: docs
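For the request-side option, the `# ... define your fields to extract here` placeholder is where the processor's fields go. The following is a minimal sketch, assuming the schema is written in the JSON form of `DocumentSchema` (`entityTypes`, `baseTypes`, `properties`, `valueType`, `occurrenceType`). The helper `build_schema`, the entity type name `my_document`, and the fields `invoice_id` and `supplier_name` are hypothetical. The `"document"` base type and the `"string"` value type are assumptions to check against the fields defined for your processor. The builder is plain Python so the shape can be inspected before it is sent.

```python
import json
from typing import Dict, List

# Cardinalities a property can have (names from the DocumentSchema occurrence enum).
OCCURRENCE_TYPES = {"OPTIONAL_ONCE", "OPTIONAL_MULTIPLE", "REQUIRED_ONCE", "REQUIRED_MULTIPLE"}


def build_schema(entity_name: str, fields: Dict[str, str], occurrence: str = "OPTIONAL_ONCE") -> dict:
    """Build a DocumentSchema in JSON form with a single top-level entity type.

    `fields` maps each property name to its value type (e.g. "string").
    """
    if occurrence not in OCCURRENCE_TYPES:
        raise ValueError(f"unknown occurrence type: {occurrence}")
    properties: List[dict] = [
        {"name": name, "valueType": value_type, "occurrenceType": occurrence}
        for name, value_type in fields.items()
    ]
    return {
        "displayName": entity_name,
        "entityTypes": [
            {
                "name": entity_name,
                # Assumption: the top-level entity type extends the "document" base type.
                "baseTypes": ["document"],
                "properties": properties,
            }
        ],
    }


# Hypothetical fields; replace them with the ones your processor should extract.
schema = build_schema("my_document", {"invoice_id": "string", "supplier_name": "string"})
print(json.dumps(schema, indent=2))
```

With the client library, the dict should convert to the message expected by schema_override via `documentai.DocumentSchema.from_json(json.dumps(schema))`, which is then passed to `documentai.ProcessOptions(schema_override=...)`. Keeping the field list in a plain mapping also makes it easy to load from configuration instead of hard-coding it.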