
Access PDF Files using the Gemini API


I am trying to extract information from a PDF using the Gemini API (1.5-pro). Right now it seems that the API can only access text, audio files, or images. Is there any way for it to access PDF files (e.g. via Vertex AI or Google Drive uploads)? The current documentation is quite unclear on this.


Solution

  • As a workaround for now, how about converting the PDF data to images? The Gemini 1.5 API can analyze images. The flow is as follows.

    1. Convert PDF data to images (PNG or JPEG).
    2. Upload images to Gemini. Ref
    3. Generate content using the uploaded images. Ref
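As a rough illustration of steps 2 and 3, here is a minimal sketch that builds a `generateContent` request body from page images that have already been produced in step 1. Note that it inlines the images as base64 rather than uploading them first, purely to keep the example standard-library only; the function name `build_image_request` and the file names are hypothetical, and the snake_case field names (`inline_data`, `mime_type`) are my assumption about the REST JSON shape.

```python
import base64
import mimetypes
from pathlib import Path


def build_image_request(prompt, image_paths):
    """Build a request body with a text prompt followed by one image part per page.

    Hypothetical helper: images are inlined as base64 instead of being
    uploaded first, which is a simplification of the flow described above.
    """
    parts = [{"text": prompt}]
    for path in image_paths:
        path = Path(path)
        # Guess the MIME type from the file extension (.png, .jpg, .jpeg).
        mime_type, _ = mimetypes.guess_type(path.name)
        if mime_type not in ("image/png", "image/jpeg"):
            raise ValueError(f"unsupported image type: {path.name}")
        data = base64.b64encode(path.read_bytes()).decode("ascii")
        parts.append({"inline_data": {"mime_type": mime_type, "data": data}})
    return {"contents": [{"parts": parts}]}
```

The resulting dictionary would then be serialized to JSON and sent to the model. For large or many-page documents, uploading the images first, as in step 2, is probably the better fit than inlining.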

    In my case, I use this approach for parsing various invoices. Ref I expect that PDF data will become directly usable in a future update.

    Updated on August 14, 2024

    At this point, PDF data can be used directly with the Gemini API. Ref

    The PDF data can be passed in either of two ways: as inlineData (base64-encoded content) or as file_data (the URI of a file uploaded to Gemini).
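The two ways of passing a PDF might look like the following sketch against the REST endpoint. The helper names (`pdf_inline_part`, `pdf_file_part`, `build_request`, `generate`) are hypothetical, and the `v1beta` URL, the snake_case field names, and the `gemini-1.5-pro` model ID are assumptions on my side, so please check them against the current documentation before relying on this.

```python
import base64
import json
import urllib.request

# Assumed REST endpoint; the version segment and model name may differ.
API_URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/{model}:generateContent?key={key}"
)


def pdf_inline_part(pdf_bytes):
    """PDF passed inline as base64-encoded data."""
    data = base64.b64encode(pdf_bytes).decode("ascii")
    return {"inline_data": {"mime_type": "application/pdf", "data": data}}


def pdf_file_part(file_uri):
    """PDF passed by reference, using the URI of an already uploaded file."""
    return {"file_data": {"mime_type": "application/pdf", "file_uri": file_uri}}


def build_request(prompt, pdf_part):
    """Combine a text prompt and one PDF part into a request body."""
    return {"contents": [{"parts": [{"text": prompt}, pdf_part]}]}


def generate(body, api_key, model="gemini-1.5-pro"):
    """POST the request body and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL.format(model=model, key=api_key),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `build_request("Summarize this invoice.", pdf_inline_part(open("invoice.pdf", "rb").read()))` would produce the inline variant (`invoice.pdf` is a placeholder file name). The inline form is convenient for small files, while the file_data form avoids resending the same document on every request.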