I'm using generative AI APIs (OpenAI's GPT models, Anthropic's Claude) to build a conversational AI that handles multi-turn dialogues. My goal is to maintain the context of the conversation without resending all previous prompts and responses with each new request, since that approach quickly becomes expensive in token usage.
Currently, I append each new user message and assistant response to a list and send the entire list with every API call:
conversation_history.append({"role": "user", "content": user_message})
response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=conversation_history)
conversation_history.append({"role": "assistant", "content": response['choices'][0]['message']['content']})
......
Is there a way to maintain the context of a conversation with the API without needing to resend all previous prompts and responses in each request? Ideally, I'm looking for a method to retain the conversational state on the server side or a more efficient way to manage the context.
Unfortunately, you are going to have to resend the context each time. Requests to the OpenAI API are stateless REST calls: each request is independent, and the model only sees the messages you include in that request, so there is no server-side conversation state to refer back to.
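As a rough illustration, here is a minimal multi-turn loop that resends the accumulated history on every call, assuming the pre-1.0 openai Python library used in your snippet and an API key configured in the environment. The trim_history helper and MAX_TURNS constant are hypothetical names for the usual cost-control step of dropping (or summarizing) the oldest turns so the message list stays within a token budget; they are not part of any OpenAI API.

import openai

MAX_TURNS = 10  # assumed cap on retained user/assistant exchanges; tune to your token budget

def trim_history(history):
    # Hypothetical helper: keep only the most recent turns so each request stays
    # affordable; prepending a summary of older turns is another common option.
    return history[-2 * MAX_TURNS:]

conversation_history = []

while True:
    user_message = input("You: ")
    conversation_history.append({"role": "user", "content": user_message})

    # The API is stateless, so the (trimmed) history is resent with every call.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=trim_history(conversation_history),
    )
    assistant_message = response['choices'][0]['message']['content']
    conversation_history.append({"role": "assistant", "content": assistant_message})
    print("Assistant:", assistant_message)

Whatever you trim away is simply gone from the model's view on later turns, so the trade-off is between token cost and how much earlier conversation the assistant can still reference.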