The web interface for ChatGPT has an easy pdf upload. Is there an API from openAI that can receive pdfs?
I know there are 3rd party libraries that can read pdf but given there are images and other important information in a pdf, it might be better if a model like GPT 4 Turbo was fed the actual pdf directly.
I'll state my use case to add more context. I intent to do RAG. In the code below I handle the PDF and a prompt. Normally I'd append the text at the end of the prompt. I could still do that with a pdf if I extract its contents manually.
The following code is taken from here https://platform.openai.com/docs/assistants/tools/code-interpreter. Is this how I'm supposed to do it?
# Upload a file with an "assistants" purpose
file = client.files.create(
file=open("example.pdf", "rb"),
purpose='assistants'
)
# Create an assistant using the file ID
assistant = client.beta.assistants.create(
instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
model="gpt-4-1106-preview",
tools=[{"type": "code_interpreter"}],
file_ids=[file.id]
)
There is an upload endpoint as well, but it seems the intent of those endpoints are for fine-tuning and assistants. I think the RAG use case is a normal one and not necessarily related to assistants.
May 2025 edit: according to the official guide, using OpenAI GPT-4.1 allows to extract content of (or answer questions on) an input pdf file foobar.pdf
stored locally, with a solution along the lines of
from openai import OpenAI
import os
filename = "foobar.pdf"
prompt = "Extract the content from the file provided without altering it. Just output its exact content and nothing else."
client = OpenAI(api_key=os.environ.get("MY_OPENAI_KEY"))
file = client.files.create(
file=open(filename, "rb"),
purpose="user_data"
)
response = client.responses.create(
model="gpt-4.1",
input=[
{
"role": "user",
"content": [
{
"type": "input_file",
"file_id": file.id,
},
{
"type": "input_text",
"text": prompt,
},
]
}
]
)
The prompt
can of course be replaced with the desired user request and I assume that the openai key is stored in a env var named MY_OPENAI_KEY
.
P.S. I have edited the answer as this approach is much more streamlined w.r.t to the assistants-based 2024 solution that you can see in the edit history, heavily inspired by https://medium.com/@erik-kokalj/effectively-analyze-pdfs-with-gpt-4o-api-378bd0f6be03