pythonartificial-intelligenceopenai-apichatgpt-apiopenai-assistants-api

OpenAI Assistants API: How do I upload a file and use it as a knowledge base?


My goal is to create a chatbot that I can provide a file to that holds a bunch of text, and then use the OpenAI Assistants API to actually use the file when querying my chatbot. I will use the gpt-3.5-turbo model to answer the questions.

The code I have is the following:

file_response = client.files.create(
   file=open("website_content.txt", "rb"),
   purpose="assistants"
)

query_response = client.assistants.query(
   assistant_id="my_assistant_id", 
   input="Tell me about xxx?",
   files=[file_response['id']] 
)

However, this is not working, for what I think could be a few things. For one, I don't fully understand the way it is supposed to work, so I was looking for some guidance. I have already created an assistant via the dashboard, but now I want to just upload a file and then query it. Do I have to use something else, like "threads" via the API, or no?

How do I do this?


Solution

  • Note: The code below works with the OpenAI Assistants API v1. In April 2024, the OpenAI Assistants API v2 was released. See the migration guide.


    I created a customer support chatbot and made a YouTube tutorial about it.

    The process is as follows:

    Step 1: Upload a File with an "assistants" purpose

    my_file = client.files.create(
      file=open("knowledge.txt", "rb"),
      purpose='assistants'
    )
    

    Step 2: Create an Assistant

    my_assistant = client.beta.assistants.create(
        model="gpt-3.5-turbo-1106",
        instructions="You are a customer support chatbot. Use your knowledge base to best respond to customer queries.",
        name="Customer Support Chatbot",
        tools=[{"type": "retrieval"}]
    )
    

    Step 3: Create a Thread

    my_thread = client.beta.threads.create()
    

    Step 4: Add a Message to a Thread

    my_thread_message = client.beta.threads.messages.create(
      thread_id=my_thread.id,
      role="user",
      content="What can I buy in your online store?",
      file_ids=[my_file.id]
    )
    

    Step 5: Run the Assistant

    my_run = client.beta.threads.runs.create(
      thread_id=my_thread.id,
      assistant_id=my_assistant.id,
    )
    

    Step 6: Periodically retrieve the Run to check on its status to see if it has moved to completed

    keep_retrieving_run = client.beta.threads.runs.retrieve(
        thread_id=my_thread.id,
        run_id=my_run.id
    )
    

    Step 7: Retrieve the Messages added by the Assistant to the Thread once the run status is "completed"

    all_messages = client.beta.threads.messages.list(
        thread_id=my_thread.id
    )
    
    print(f"User: {my_thread_message.content[0].text.value}")
    print(f"Assistant: {all_messages.data[0].content[0].text.value}")
    

    See the full code.

    Important note

    The assistant might sometimes behave strangely. The Assistants API is still in beta, and it seems that OpenAI has trouble keeping it realiable, as discussed on the official OpenAI forum.

    The assistant might sometimes answer that it cannot access the files you uploaded. You might think you did something wrong, but if you run identical code later or the next day, the assistant will successfully access all files and give you an answer.

    The weird responses I got were the following: