I am developing a web application that answers questions based on the context provided by documents the user uploads. The problem is that when I use the Mistral v0.2 model, the answers do not finish; they are cut off before the end. If I use OpenAI, the answers finish correctly. I use this prompt:
template="""
### [INST] Instruccion: Responde en español a las preguntas del usuario según el contexto.
Si no encunetras una respuesta adecuada en el contexto, responde que no tienes información suficiente.
{context}
### question:
{question} (responde en castellano) [/INST]
#"""
template="""
<s>[INST]
"""
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)
vector = Chroma(
    client=db,
    collection_name="coleccion4",
    embedding_function=embeddings,
)
retriever = vector.as_retriever(search_type="similarity", search_kwargs={"k": 3})
llm = HuggingFaceHub(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    model_kwargs={"temperature": 0.4},
    huggingfacehub_api_token=apikey_huggingFace,
)
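The rag_chain is assembled from the retriever, the prompt, and the llm. The exact chain code isn't shown above, but it is roughly this (a simplified LCEL sketch):

# Roughly how rag_chain is built (simplified; retriever, prompt and llm
# come from the code above):
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

rag_chain = (
    # The retriever receives the question and returns the matching documents,
    # which are stuffed into {context}; the question passes through unchanged.
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)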
respuesta = rag_chain.invoke(user_question)
When I run the code with OpenAI, the response finishes correctly. But when I use Mistral, the answer stops mid-sentence before it is complete. Why does this happen?
Update: I have set max_new_tokens to 2000 and now it seems to work.
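For anyone hitting the same issue: the Hugging Face inference endpoint defaults to a small max_new_tokens, so generation stops early even though the model has more to say. Passing it through model_kwargs fixed the truncation on my setup:

llm = HuggingFaceHub(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    # max_new_tokens raises the generation cap so answers are not cut off
    model_kwargs={"temperature": 0.4, "max_new_tokens": 2000},
    huggingfacehub_api_token=apikey_huggingFace,
)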