Tags: langchain, huggingface, large-language-model, llama-index, gpt4all

ERROR: The prompt size exceeds the context window size and cannot be processed


I have been trying to create a document QA chatbot using GPT4All as the LLM and Hugging Face's instructor-large model for embeddings. I was able to create the index, but I get the following as a response. It isn't really an error, since there is no traceback; the query just returns this message:

ERROR: The prompt size exceeds the context window size and cannot be processed.

This is a follow-up to a parent question (which was resolved). Here is my current code:

from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index import PromptHelper, ServiceContext
from llama_index import LangchainEmbedding
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import GPT4All
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

documents = SimpleDirectoryReader(r'C:\Users\avish.wagde\Documents\work_avish\LLM_trials\instructor_large').load_data()

print('document loaded in memory.......') 

model_id = 'hkunlp/instructor-large'

model_path = r"..\models\GPT4All-13B-snoozy.ggmlv3.q4_0.bin"  # raw string avoids backslash-escape warnings

callbacks = [StreamingStdOutCallbackHandler()]

# Verbose is required to pass to the callback manager
llm = GPT4All(model=model_path, callbacks=callbacks, verbose=True)

print('llm model ready.............')

embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name=model_id))

print('embedding model ready.............')

# define prompt helper
# set maximum input size
max_input_size = 4096
# set number of output tokens
num_output = 256
# set maximum chunk overlap
max_chunk_overlap = 0.2

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

service_context = ServiceContext.from_defaults(chunk_size=1024, llm=llm, prompt_helper=prompt_helper, embed_model=embed_model)

print('service context set...........')

index = VectorStoreIndex.from_documents(documents, service_context=service_context)

print('indexing done................')

query_engine = index.as_query_engine()

print('query set...........')

response = query_engine.query("What is Apple's financial situation?")
print(response)

(Screenshot of the response omitted; it shows the ERROR message quoted above.)

I checked on GitHub, where many people have raised this issue, but I couldn't find anything that resolves it.


Solution

  • GPT4All seems to have a maximum input (context window) size of 2048 tokens, but you are setting max_input_size to 4096.

    (I'm not totally able to confirm this size; it's based on comments found via Google: https://github.com/nomic-ai/gpt4all/issues/178)

    That would also explain the overflow: llama_index retrieves two chunks per query by default, so with chunk_size=1024 the retrieved text alone is roughly 2 × 1024 = 2048 tokens before the prompt template, the question, and the 256 reserved output tokens are added. You can re-adjust your chunk_size and max_input_size to account for this:

    # define prompt helper
    # set maximum input size
    max_input_size = 2048
    # set number of output tokens
    num_output = 256
    # set maximum chunk overlap
    max_chunk_overlap = 0.2
    
    prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
    
    service_context = ServiceContext.from_defaults(chunk_size=512, llm=llm, prompt_helper=prompt_helper, embed_model=embed_model)
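
    If adjusting the sizes isn't enough, you can also limit how much retrieved text gets packed into the prompt. Here is a minimal sketch, not from the original answer: similarity_top_k is a standard llama_index query-engine parameter, while n_ctx is an assumption, since whether langchain's GPT4All wrapper accepts it depends on your installed versions, so verify it before relying on it.

    # Minimal sketch, reusing model_path, callbacks, prompt_helper,
    # embed_model, and documents from the code above.
    llm = GPT4All(model=model_path, callbacks=callbacks, verbose=True,
                  n_ctx=2048)  # n_ctx is an assumption; check your langchain/gpt4all versions

    # Rebuild the service context and index with the smaller chunks, then
    # retrieve only one chunk per query so a single 512-token chunk, the
    # prompt template, and the 256 reserved output tokens fit within 2048.
    service_context = ServiceContext.from_defaults(
        chunk_size=512, llm=llm, prompt_helper=prompt_helper, embed_model=embed_model
    )
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)
    query_engine = index.as_query_engine(similarity_top_k=1)
    print(query_engine.query("What is Apple's financial situation?"))

    The trade-off is that fewer retrieved chunks means less supporting context per answer, so fix the size budget first and treat similarity_top_k as a second lever.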