The web interface for ChatGPT has an easy PDF upload. Is there an OpenAI API that can receive PDFs?
I know there are third-party libraries that can read PDFs, but since a PDF can contain images and other important information, it might be better to feed the actual PDF directly to a model like GPT-4 Turbo.
To add more context, here is my use case: I intend to do RAG. In the code below I handle the PDF and a prompt. Normally I'd append the text at the end of the prompt, and I could still do that with a PDF if I extracted its contents manually.
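For the manual route (extract the PDF text yourself, then append the relevant parts to the prompt), the retrieval-and-append step could look roughly like the sketch below. It assumes the text has already been extracted by some PDF library. The helper names (chunk_words, retrieve, build_prompt) are hypothetical, and the naive word-overlap scoring is a stand-in for the embedding similarity a real RAG pipeline would normally use.

```python
import re


def chunk_words(text: str, max_words: int = 100) -> list[str]:
    """Split extracted PDF text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def _tokens(s: str) -> set[str]:
    # Lowercased alphanumeric tokens, used for a crude overlap score
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks sharing the most words with the question.

    Naive keyword overlap (an assumption), standing in for embedding similarity.
    Ties keep their original document order because sorted() is stable.
    """
    q = _tokens(question)
    ranked = sorted(chunks, key=lambda c: len(q & _tokens(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Append the retrieved text at the end of the prompt."""
    context = "\n\n".join(context_chunks)
    return f"{question}\n\nAnswer using only the context below.\n\nContext:\n{context}"


if __name__ == "__main__":
    # Placeholder for text extracted from the PDF by a library of your choice
    pdf_text = "The invoice total is 420 EUR. Payment is due within 30 days."
    chunks = chunk_words(pdf_text, max_words=6)
    question = "What is the invoice total?"
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

The resulting string is what you would send as the user message; only the retrieved chunks are appended, which keeps the prompt within the model's context window for long PDFs.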
The following code is taken from here https://platform.openai.com/docs/assistants/tools/code-interpreter. Is this how I'm supposed to do it?
from openai import OpenAI
client = OpenAI()
# Upload a file with an "assistants" purpose
file = client.files.create(
    file=open("example.pdf", "rb"),
    purpose="assistants"
)
# Create an assistant using the file ID
assistant = client.beta.assistants.create(
    instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
    model="gpt-4-1106-preview",
    tools=[{"type": "code_interpreter"}],
    file_ids=[file.id]
)
There is an upload endpoint as well, but those endpoints seem to be intended for fine-tuning and assistants. I think RAG is a common use case and not necessarily tied to assistants.
As of today (openai.__version__ == 1.42.0), using the OpenAI Assistants API with GPT-4o lets you extract the content of (or answer questions about) an input PDF file foobar.pdf stored locally, with a solution along the lines of:
from openai import OpenAI
from openai.types.beta.threads.message_create_params import (
Attachment,
AttachmentToolFileSearch,
)
import os
filename = "foobar.pdf"
prompt = "Extract the content from the file provided without altering it. Just output its exact content and nothing else."
client = OpenAI(api_key=os.environ.get("MY_OPENAI_KEY"))
# Create an assistant with the file_search tool enabled
pdf_assistant = client.beta.assistants.create(
    model="gpt-4o",
    description="An assistant to extract the contents of PDF files.",
    tools=[{"type": "file_search"}],
    name="PDF assistant",
)
# Create thread
thread = client.beta.threads.create()
# Upload the PDF so it can be attached to a message
with open(filename, "rb") as f:
    file = client.files.create(file=f, purpose="assistants")
# Create a user message with the PDF attached for the file_search tool
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    attachments=[
        Attachment(
            file_id=file.id, tools=[AttachmentToolFileSearch(type="file_search")]
        )
    ],
    content=prompt,
)
# Run thread and wait for it to finish
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=pdf_assistant.id, timeout=1000
)
if run.status != "completed":
    raise Exception(f"Run failed: {run.status}")
# Messages are listed newest first, so messages[0] is the assistant's reply
messages = list(client.beta.threads.messages.list(thread_id=thread.id))
# Output text
res_txt = messages[0].content[0].text.value
print(res_txt)
The prompt can of course be replaced with any other user request. I assume the OpenAI key is stored in an environment variable named MY_OPENAI_KEY.
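With file_search, the reply text usually contains citation markers (something like 【4:0†source】) that would pollute an "output its exact content" extraction. A small post-processing sketch, assuming the message shape returned by messages.list in the openai Python SDK: content[0].text.value holds the text and content[0].text.annotations holds citations, each exposing the literal marker string as .text. The helper name clean_reply is hypothetical; in the script above you would call clean_reply(messages[0]) instead of reading .text.value directly.

```python
from types import SimpleNamespace


def clean_reply(message) -> str:
    """Return the first text block of a message with citation markers removed.

    Assumes message.content[0] is a text block whose .text has .value and
    .annotations, where each annotation's .text is the marker in the reply.
    """
    text = message.content[0].text
    value = text.value
    for annotation in text.annotations:
        # Drop the literal marker string the annotation refers to
        value = value.replace(annotation.text, "")
    # Trim whitespace left behind at the ends of the reply
    return value.strip()


if __name__ == "__main__":
    # Stub with the same shape as messages[0] from the script above
    reply = SimpleNamespace(content=[SimpleNamespace(text=SimpleNamespace(
        value="Quarterly report, page 1【4:0†source】",
        annotations=[SimpleNamespace(text="【4:0†source】")],
    ))])
    print(clean_reply(reply))
```

Using the annotations instead of a regex on the marker format avoids depending on the exact marker syntax, which is not something the API promises to keep stable.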
This solution is inspired by https://medium.com/@erik-kokalj/effectively-analyze-pdfs-with-gpt-4o-api-378bd0f6be03.

Limitations:
- It's not (yet) possible to enforce a JSON structure on the output, other than through instructions in the prompt.
- This relies on text content in the PDF (i.e. searchable text), so the queries won't be able to access, e.g., image content in the PDF.
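Since the JSON shape can only be requested through the prompt, one pragmatic workaround is to validate the reply client-side and retry or fail when it doesn't parse. A minimal sketch, assuming the prompt asked for a JSON object with certain required keys; the helper name parse_json_reply and the example key names are hypothetical. It also strips a markdown code fence, which the model may wrap around the JSON.

```python
import json
import re


def parse_json_reply(reply: str, required_keys: set[str]) -> dict:
    """Parse a reply that was asked (via the prompt only) to be a JSON object."""
    # Strip an optional markdown code fence (three backticks, optionally tagged json)
    match = re.search(r"`{3}(?:json)?\s*(.*?)\s*`{3}", reply, re.DOTALL)
    payload = match.group(1) if match else reply.strip()
    # Raises json.JSONDecodeError (a ValueError subclass) on malformed output
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


if __name__ == "__main__":
    reply = '{"title": "Quarterly report", "page_count": 12}'
    print(parse_json_reply(reply, {"title", "page_count"}))
```

Every failure mode surfaces as a ValueError, so a caller can wrap the run in a simple retry loop that catches that one exception type.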