The web interface for ChatGPT has an easy PDF upload. Is there an OpenAI API that can receive PDFs?
I know there are third-party libraries that can read PDFs, but since a PDF can contain images and other important information, it might be better if a model like GPT-4 Turbo were fed the actual PDF directly.
I'll state my use case to add more context. I intend to do RAG. In the code below I handle the PDF and a prompt. Normally I'd append the text at the end of the prompt. I could still do that with a PDF if I extracted its contents manually.
The following code is taken from here. Is this how I'm supposed to do it?
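# Upload a file with an "assistants" purpose
file = client.files.create(
    file=open("example.pdf", "rb"),
    purpose="assistants",
)

# Create an assistant using the file ID
assistant = client.beta.assistants.create(
    instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
    model="gpt-4-turbo",  # assumed: the model argument is missing from this excerpt
    tools=[{"type": "code_interpreter"}],
    # assumed (Assistants v2 form): the file-ID argument is missing from this excerpt
    tool_resources={"code_interpreter": {"file_ids": [file.id]}},
)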
There is an upload endpoint as well, but it seems those endpoints are intended for fine-tuning and assistants. I think the RAG use case is a normal one and not necessarily related to assistants.
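The manual route mentioned in the question (extract the PDF text yourself, then append it to the prompt) might look roughly like the sketch below. It assumes the text has already been extracted into a list of strings, one per page, by whatever PDF library you prefer. `build_rag_prompt` and its `max_chars` budget are hypothetical names used for illustration; they are not part of any OpenAI API.

```python
def build_rag_prompt(question: str, passages: list[str], max_chars: int = 8000) -> str:
    """Append extracted passages to the question, stopping at a character budget."""
    selected = []
    used = 0
    for passage in passages:
        passage = passage.strip()
        if not passage:
            continue  # skip pages that produced no text
        if used + len(passage) > max_chars:
            break  # stop before the context exceeds the budget
        selected.append(passage)
        used += len(passage)
    context = "\n\n".join(selected)
    return f"{question}\n\nContext:\n{context}"


# Example with stand-in page texts (a real run would use extracted PDF text)
pages = ["Page one text.", "", "Page two text."]
print(build_rag_prompt("What is on page two?", pages))
```

The resulting string can be sent as an ordinary user message, so this route needs neither file uploads nor assistants, at the cost of losing any image content in the PDF.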
As of today (openai.__version__ == 1.42.0), using OpenAI Assistants with GPT-4o makes it possible to extract the content of (or answer questions about) an input PDF file foobar.pdf stored locally, with a solution along the lines of:
from openai import OpenAI
from openai.types.beta.threads.message_create_params import (
    Attachment,
    AttachmentToolFileSearch,
)
import os

filename = "foobar.pdf"
prompt = "Extract the content from the file provided without altering it. Just output its exact content and nothing else."

client = OpenAI(api_key=os.environ.get("MY_OPENAI_KEY"))

# Create assistant
pdf_assistant = client.beta.assistants.create(
    model="gpt-4o",
    description="An assistant to extract the contents of PDF files.",
    tools=[{"type": "file_search"}],
    name="PDF assistant",
)

# Create thread
thread = client.beta.threads.create()
file = client.files.create(file=open(filename, "rb"), purpose="assistants")

# Add the user message to the thread, with the PDF attached for file search
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    attachments=[
        Attachment(file_id=file.id, tools=[AttachmentToolFileSearch(type="file_search")])
    ],
    content=prompt,
)

# Run thread
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=pdf_assistant.id, timeout=1000
)
if run.status != "completed":
    raise Exception("Run failed:", run.status)

messages_cursor = client.beta.threads.messages.list(thread_id=thread.id)
messages = [message for message in messages_cursor]

# Output text (messages are listed newest first, so the reply is messages[0])
res_txt = messages[0].content[0].text.value
The prompt can of course be replaced with the desired user request, and I assume that the OpenAI key is stored in an environment variable named MY_OPENAI_KEY.
It's not (yet) possible to enforce a JSON structure (other than with instructions in the prompt). This solution is inspired by
This relies on text content in the PDF (i.e. searchable text), so queries won't be able to access, for example, image content in the PDF.
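Since a JSON structure can only be requested through the prompt, the reply is worth parsing defensively. The sketch below is one possible approach, not something provided by the OpenAI SDK: `parse_json_reply` is a hypothetical helper that removes an optional Markdown code fence around the reply and then parses it with the standard `json` module. It would be applied to `res_txt` from the code above.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence marker, built here to avoid nesting fences


def parse_json_reply(reply: str):
    """Parse a model reply that is expected to contain JSON (hypothetical helper)."""
    text = reply.strip()
    # Models often wrap JSON in a Markdown code fence; unwrap it if present.
    fenced = re.match(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", text, flags=re.DOTALL)
    if fenced:
        text = fenced.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Surface a clear error so the caller can retry or fall back to raw text
        raise ValueError(f"Reply is not valid JSON: {exc}") from exc
```

Raising `ValueError` rather than returning `None` makes a malformed reply hard to ignore, which matters when nothing on the API side guarantees the format.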