I am trying to extract information from a PDF using the Gemini API (1.5-pro). Right now it seems that the API can only access text, audio files, or images. Is there any way for it to access PDF files (e.g., via Vertex AI or Google Drive uploads)? The current documentation is quite opaque.
As a workaround for now, how about converting the PDF data to images? The Gemini 1.5 API can analyze images. The flow is as follows.
In my case, I use this approach to parse various invoices (Ref). I expect that PDF data will become directly usable in future updates.
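A minimal sketch of the image-based workaround. It assumes each PDF page has already been rendered to PNG bytes (for example with a third-party library such as pdf2image; that step is not shown). The helper name `build_image_request` is hypothetical: it only assembles a `generateContent` request body for the REST API, with one `inline_data` image part per page followed by the text prompt. No network call is made.

```python
import base64
import json


def build_image_request(page_images: list[bytes], prompt: str) -> dict:
    """Hypothetical helper: one base64 PNG part per page, then the prompt."""
    parts = []
    for png in page_images:
        # Each rendered page is sent as base64-encoded inline image data.
        parts.append({
            "inline_data": {
                "mime_type": "image/png",
                "data": base64.b64encode(png).decode("ascii"),
            }
        })
    # The instruction is placed after all page images.
    parts.append({"text": prompt})
    return {"contents": [{"parts": parts}]}


if __name__ == "__main__":
    # Placeholder bytes standing in for real rendered PNG pages.
    fake_pages = [b"page-1-png-bytes", b"page-2-png-bytes"]
    body = build_image_request(fake_pages, "Extract the invoice number and total.")
    print(json.dumps(body, indent=2))
```

The resulting JSON would be POSTed to `https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro:generateContent` with your API key. Putting all pages in one request lets the model see the whole invoice at once.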
At the current stage, PDF data can be used directly with the Gemini API (Ref).
The PDF data can be passed either as inlineData (base64-encoded data) or as file_data (the URI of a file uploaded to Gemini).
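A minimal sketch of both options as REST request bodies for `generateContent`. The helper names are hypothetical, and the file URI is assumed to come from a prior upload through the Gemini File API (the upload itself is not shown). Fields use the REST API's snake_case spelling (`inline_data`, `file_data`; `inlineData` is the camelCase form of the same field), with `mime_type` set to `application/pdf`.

```python
import base64


def pdf_inline_part(pdf_bytes: bytes) -> dict:
    """Option 1 (hypothetical helper): embed the whole PDF as base64."""
    return {
        "inline_data": {
            "mime_type": "application/pdf",
            "data": base64.b64encode(pdf_bytes).decode("ascii"),
        }
    }


def pdf_file_part(file_uri: str) -> dict:
    """Option 2 (hypothetical helper): reference a PDF already uploaded to Gemini."""
    return {"file_data": {"mime_type": "application/pdf", "file_uri": file_uri}}


def build_pdf_request(pdf_part: dict, prompt: str) -> dict:
    """Wrap a PDF part and a text prompt into a generateContent request body."""
    return {"contents": [{"parts": [pdf_part, {"text": prompt}]}]}


if __name__ == "__main__":
    # Placeholder bytes; in practice: open("invoice.pdf", "rb").read()
    inline_body = build_pdf_request(pdf_inline_part(b"%PDF-1.4 placeholder"), "Summarize this PDF.")
    # Placeholder URI; a real one is returned by the File API upload.
    file_body = build_pdf_request(
        pdf_file_part("https://generativelanguage.googleapis.com/v1beta/files/example"),
        "Summarize this PDF.",
    )
    print(inline_body["contents"][0]["parts"][0]["inline_data"]["mime_type"])
    print(file_body["contents"][0]["parts"][0]["file_data"]["file_uri"])
```

Inline data counts toward the overall request size limit, so for larger PDFs the uploaded-file (file_data) route is generally the one to use.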