
How can I process a pdf using OpenAI's APIs (GPTs)?


The web interface for ChatGPT has an easy PDF upload. Is there an API from OpenAI that can receive PDFs?

I know there are third-party libraries that can read PDFs, but since a PDF can contain images and other important information, it might be better to feed the actual PDF directly to a model like GPT-4 Turbo.

I'll state my use case to add more context. I intend to do RAG. In the code below I handle the PDF and a prompt. Normally I'd append the text at the end of the prompt. I could still do that with a PDF if I extract its contents manually.

The following code is taken from https://platform.openai.com/docs/assistants/tools/code-interpreter. Is this how I'm supposed to do it?

# Upload a file with an "assistants" purpose
file = client.files.create(
  file=open("example.pdf", "rb"),
  purpose='assistants'
)

# Create an assistant using the file ID
assistant = client.beta.assistants.create(
  instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
  model="gpt-4-1106-preview",
  tools=[{"type": "code_interpreter"}],
  file_ids=[file.id]
)

There is an upload endpoint as well, but it seems those endpoints are intended for fine-tuning and assistants. I think the RAG use case is a normal one and not necessarily related to assistants.


Solution

  • May 2025 edit: according to the official guide, OpenAI GPT-4.1 can extract the content of (or answer questions about) a locally stored PDF file, here foobar.pdf, with a solution along these lines:

    from openai import OpenAI
    import os
    
    filename = "foobar.pdf"
    prompt = "Extract the content from the file provided without altering it. Just output its exact content and nothing else."
    
    client = OpenAI(api_key=os.environ.get("MY_OPENAI_KEY"))
    
    file = client.files.create(
        file=open(filename, "rb"),
        purpose="user_data"
    )
    
    response = client.responses.create(
        model="gpt-4.1",
        input=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_file",
                        "file_id": file.id,
                    },
                    {
                        "type": "input_text",
                        "text": prompt,
                    },
                ]
            }
        ]
    )
    
    # Print the text the model returned
    print(response.output_text)
    

    The prompt can of course be replaced with the desired user request. The code assumes that the OpenAI key is stored in an environment variable named MY_OPENAI_KEY.

    P.S. I have edited the answer because this approach is much more streamlined than the assistants-based 2024 solution, which you can see in the edit history and which was heavily inspired by https://medium.com/@erik-kokalj/effectively-analyze-pdfs-with-gpt-4o-api-378bd0f6be03
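
For repeated use (for example, asking several questions in a RAG-style workflow), the upload-then-ask steps above could be wrapped in small helpers. This is only a sketch: `build_pdf_input` and `ask_pdf` are hypothetical names introduced here, `client` is assumed to be an `OpenAI()` instance created as in the answer, and the return value relies on the SDK's `output_text` convenience property on the response.

```python
from pathlib import Path


def build_pdf_input(file_id: str, prompt: str) -> list[dict]:
    """Build the `input` payload: one user message holding the file and the prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "input_file", "file_id": file_id},
                {"type": "input_text", "text": prompt},
            ],
        }
    ]


def ask_pdf(client, pdf_path: str, prompt: str, model: str = "gpt-4.1") -> str:
    """Upload a local PDF, send one prompt about it, and return the text answer.

    `client` is assumed to expose `files.create` and `responses.create`
    the way the `openai.OpenAI` client in the answer does.
    """
    # A context manager closes the file handle once the upload is done.
    with Path(pdf_path).open("rb") as fh:
        uploaded = client.files.create(file=fh, purpose="user_data")

    response = client.responses.create(
        model=model,
        input=build_pdf_input(uploaded.id, prompt),
    )
    return response.output_text


# Example usage (needs the `openai` package and a valid API key):
#
#   import os
#   from openai import OpenAI
#
#   client = OpenAI(api_key=os.environ.get("MY_OPENAI_KEY"))
#   print(ask_pdf(client, "foobar.pdf", "Summarize this document."))
```

Keeping the payload construction in its own function makes it possible to check the message structure without calling the API. Note that each call to `ask_pdf` uploads the file again; to ask several questions about the same PDF, it is probably better to upload once and reuse the file ID with `build_pdf_input`.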