I am using a quantized Llama 2 model from Hugging Face and loading it with CTransformers from LangChain. When I run the query, I get the following warning:
Number of tokens (512) exceeded maximum context length (512)
Below is my code:
from langchain.llms import CTransformers
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = CTransformers(model='models_k/llama-2-7b-chat.ggmlv3.q2_K.bin',
                    model_type='llama',
                    config={'max_new_tokens': 512,
                            'temperature': 0.01}
                    )
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
DEFAULT_SYSTEM_PROMPT="""\
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer to a question, please don't share false information."""
# db_schema holds the database schema text and is defined elsewhere in the script
instruction = db_schema + " Based on the database schema provided to you \n Convert the following text from natural language to sql query: \n\n {text} \n only display the sql query"
SYSTEM_PROMPT = B_SYS + DEFAULT_SYSTEM_PROMPT + E_SYS
template = B_INST + SYSTEM_PROMPT + instruction + E_INST
prompt = PromptTemplate(template=template, input_variables=["text"])
LLM_Chain = LLMChain(prompt=prompt, llm=llm)
print(LLM_Chain.run("List the names and prices of electronic products that cost less than $500."))
Can anyone tell me why I am getting this warning? Do I have to change the settings?
You can fix this by increasing the context length in the config, for example:
llm = CTransformers(model='models_k/llama-2-7b-chat.ggmlv3.q2_K.bin',
                    model_type='llama',
                    config={'max_new_tokens': 600,
                            'temperature': 0.01,
                            'context_length': 700}
                    )
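The context window has to hold the prompt tokens plus the tokens generated by max_new_tokens; your run was using a 512-token window (as the warning shows), which the rendered prompt alone already exceeded. If you want to size context_length more precisely, you can count the tokens in the rendered prompt. Below is a minimal sketch that reuses the prompt and llm defined above and assumes the LangChain wrapper exposes the underlying ctransformers model as llm.client:

# Sketch: count the tokens in the fully rendered prompt.
# Assumes `llm.client` is the underlying ctransformers model (an assumption about the wrapper internals).
prompt_text = prompt.format(text="List the names and prices of electronic products that cost less than $500.")
n_prompt_tokens = len(llm.client.tokenize(prompt_text))
print(n_prompt_tokens)  # context_length should be at least this value plus max_new_tokens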