I'm testing a local GPT with Ollama running behind a Flask server. I've built an interface to chat with the Llama3.2 model.
I've managed to keep a chat history, and the chatbot answers according to the context, but for some reason it gets stuck on the instructions of the first prompt, so when I tell it to stop, it doesn't. Let me illustrate:
User: We are gonna play a game in which I'm gonna say some words and you have to tell me if the word represents an animal. I just want you to tell me if the word I'm saying is an animal or not
system: I'll respond with "yes" or "no" to let you know if the word represents an animal. Go ahead and give your first word!
User: horse
system: It's an animal!
User: elephant
system: Here are my answers: 1. No, "horse" is an animal. 2. Yes, "elephant" is an animal. Let's keep playing! What's the next word?
User: table
system: I'll let you know if each word represents an animal. - horse: Yes, it's an animal. - elephant: Yes, it's an animal. - table: No, it's not an animal. (it's a piece of furniture)
User: bottle
system: Let's get started with the game. Here are my answers: 1. horse - Yes, it's an animal. 2. elephant - Yes, it's an animal. 3. table - No, it's not an animal. 4. bottle - No, it's not an animal. Did I get them right?
User: dolphin
system: Here are my answers: - No (it's a piece of furniture) - Yes - No (it's an inanimate object) - No (it's an inanimate object, a container) - Yes
User: ok stop the game, let's do something else
system: Here are my responses: * horse: yes * elephant: yes * table: no * bottle: no * dolphin: yes
The thing is, when I try the same thing directly in cmd (ollama run llama3.2) it behaves as expected:
-> We are gonna play a game in which I'm gonna say some words and you have to tell me if the word represents an animal. I just want you to tell me if the word I'm saying is an animal or not
I'm ready, let's play. What's the first word?
-> horse
Yes.
-> elephant
Yes.
-> table
No.
-> bottle
No.
-> dolphin
Yes.
-> ok stop the game, let's do something else
It was fun playing the animal game with you. What would you like to do next? Would you like to play a different game, have a conversation, or try something else? Let me know!
I don't know what I should change in my code, or which parameter controls this behavior. My code:
from flask import Flask, render_template, request
from langchain_ollama import ChatOllama

import ut  # my helper module providing get_number_of_tokens (not shown)

app = Flask(__name__)

MODEL = "llama3.2"
MODEL_MAX_TOKENS = 128000

messages = []

llm = ChatOllama(
    model=MODEL,
    temperature=1.1,
)

@app.route('/ask_gpt', methods=['POST'])
def ask_gpt():
    messages.append(("user", request.form['text']))
    number_of_tokens = ut.get_number_of_tokens(messages)
    # The context accepts a max number of tokens, so drop the oldest
    # turns (keeping the first message) until the history fits.
    while number_of_tokens >= MODEL_MAX_TOKENS:
        del messages[1]
        number_of_tokens = ut.get_number_of_tokens(messages)
    ai_msg = llm.invoke(messages)
    return ai_msg.content
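The `ut.get_number_of_tokens` helper above isn't shown in the post; a rough stand-in (purely an assumption, since `ut` is the author's own module) could use the common ~4-characters-per-token heuristic:

```python
# Hypothetical stand-in for the unshown ut.get_number_of_tokens helper.
# It is only an estimate: roughly 4 characters per token on average.
def get_number_of_tokens(messages):
    total_chars = sum(len(content) for _, content in messages)
    return total_chars // 4
```

An exact count would require the model's own tokenizer, but an estimate like this is usually close enough for deciding when to trim old history.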
Could anyone guide me through this? Thanks in advance.
I found out where the problem was.
I was appending my messages to the history but not the GPT's answers, so it couldn't tell that earlier questions had already been answered.
There's only one line to add, just before the return statement:
messages.append({'role': 'assistant', 'content': ai_msg.content})
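The fix can be demonstrated without Flask or Ollama at all. The sketch below uses a fake LLM stand-in (`FakeLLM` is purely illustrative, not part of any library) to show that each turn must record both the user message and the assistant reply; otherwise the model keeps seeing a list of unanswered questions. The tuple form `("assistant", ...)` works too and keeps the entries consistent with the `("user", ...)` tuples:

```python
# Minimal demo of the fix: store BOTH sides of every turn in the history.
# FakeLLM stands in for ChatOllama so the logic can be run offline.

class FakeReply:
    def __init__(self, content):
        self.content = content

class FakeLLM:
    def invoke(self, messages):
        # Report how many entries it received, so we can see history grow.
        return FakeReply(f"seen {len(messages)} messages")

messages = []
llm = FakeLLM()

def ask(text):
    messages.append(("user", text))
    ai_msg = llm.invoke(messages)
    # The one-line fix: append the assistant's answer to the history too.
    messages.append(("assistant", ai_msg.content))
    return ai_msg.content

ask("horse")
ask("elephant")
# After two questions the history holds 4 entries: 2 user + 2 assistant.
```

With only user turns stored, the model re-reads every past question as still pending, which is exactly the "stuck on the first prompt" behavior in the transcript above.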