php ocr cloud-document-ai

Can I get an already recognized document through Google OCR or do I have to request recognition again?


The guide at https://cloud.google.com/document-ai/docs/process-documents-client-libraries describes how to send a document for recognition and receive various data about it.

My question is: can I retrieve a document that I have already sent for recognition, using some unique ID and a GET request? Or do I have to send the same document for recognition, and pay for it, every time I need its data?

If I do have to re-send it, I'll need to store a lot of data in my own database instead.

I have already read https://cloud.google.com/document-ai/docs/process-documents-client-libraries and gone through this codelab: https://codelabs.developers.google.com/codelabs/docai-ocr-python#3. I also looked at which requests the API offers and what can be retrieved with them. Apart from processor and account information, I found nothing. I even asked GPT.


Solution

  • You will have to re-send the document each time you want it processed again. The Document AI API is stateless by design, so it doesn't store any document data or user information.

    Although, I'm not entirely sure I understand the use case. Are you referring to sending the same document to multiple processors to extract different information?

    Otherwise, you can store the data extracted by Document AI in Cloud Storage, a database, or a data warehouse, and read it back from there instead of processing the document again.

    The Document AI Toolbox SDK has some built-in functions to export extracted Entities and FormFields to BigQuery.
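
Since the API keeps no state, the "unique ID" has to live on your side. One way to do this is sketched below: the ID is the SHA-256 hash of the file bytes, and the result is cached as a JSON file. The `cached_ocr` helper, the `process` callable, and the cache directory layout are all hypothetical names made up for this illustration and are not part of Document AI. The processor call is passed in as a function, so the sketch makes no assumptions about the client library.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_ocr(
    content: bytes,
    process: Callable[[bytes], dict],
    cache_dir: Path,
) -> dict:
    """Return the OCR result for `content`, calling `process` only on a cache miss.

    `process` is a hypothetical callable that sends the bytes to your
    processor and returns a JSON-serializable dict.
    """
    # The SHA-256 of the file bytes serves as the document's unique ID.
    doc_id = hashlib.sha256(content).hexdigest()
    cache_file = cache_dir / f"{doc_id}.json"

    # Cache hit: return the stored result without contacting the API.
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))

    # Cache miss: this is the only step that would be billed.
    result = process(content)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(result), encoding="utf-8")
    return result
```

In real use, `process` would wrap the Document AI client call and convert the response into a plain dict. The local directory could equally be a Cloud Storage bucket or a database table keyed by the same hash.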