python machine-learning pdf openai-api chat-gpt-4

How can I process a pdf using OpenAI's APIs (GPTs)?

The web interface for ChatGPT has an easy pdf upload. Is there an API from openAI that can receive pdfs?

I know there are 3rd party libraries that can read pdf but given there are images and other important information in a pdf, it might be better if a model like GPT 4 Turbo was fed the actual pdf directly.

I'll state my use case to add more context. I intent to do RAG. In the code below I handle the PDF and a prompt. Normally I'd append the text at the end of the prompt. I could still do that with a pdf if I extract its contents manually.

The following code is taken from here https://platform.openai.com/docs/assistants/tools/code-interpreter. Is this how I'm supposed to do it?

# Upload a file with an "assistants" purpose
file = client.files.create(
  file=open("example.pdf", "rb"),
  purpose='assistants'
)

# Create an assistant using the file ID
assistant = client.beta.assistants.create(
  instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
  model="gpt-4-1106-preview",
  tools=[{"type": "code_interpreter"}],
  file_ids=[file.id]
)

There is an upload endpoint as well, but it seems the intent of those endpoints are for fine-tuning and assistants. I think the RAG use case is a normal one and not necessarily related to assistants.

Solution

May 2025 edit: according to the official guide, using OpenAI GPT-4.1 allows to extract content of (or answer questions on) an input pdf file foobar.pdf stored locally, with a solution along the lines of

from openai import OpenAI
import os

filename = "foobar.pdf"
prompt = """Extract the content from the file provided without altering it.
            Just output its exact content and nothing else."""

client = OpenAI(api_key=os.environ.get("MY_OPENAI_KEY"))

file = client.files.create(
    file=open(filename, "rb"),
    purpose="user_data"
)

response = client.responses.create(
    model="gpt-4.1",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_file",
                    "file_id": file.id,
                },
                {
                    "type": "input_text",
                    "text": prompt,
                },
            ]
        }
    ]
)

The prompt can of course be replaced with the desired user request and I assume that the openai key is stored in a env var named MY_OPENAI_KEY.

P.S. I have edited the answer as this approach is much more streamlined w.r.t to the assistants-based 2024 solution that you can see in the edit history, heavily inspired by https://medium.com/@erik-kokalj/effectively-analyze-pdfs-with-gpt-4o-api-378bd0f6be03