Tags: nlp, chatbot, langchain, large-language-model, falcon

Combining Falcon 40B Instruct with LangChain


I want to run a local LLM using the Falcon 40B Instruct model and combine it with LangChain, so I can give it a PDF or some other resource to learn from. I then want to query it, ask it questions, and ultimately derive insights from a PDF report generated from an Excel sheet.

For now, I just want to load a PDF using LangChain and have the falcon-40b-instruct model act as the agent.

In short, I want to build an LLM application that can interact with my own data through LangChain.

Here is my attempt so far:

from langchain_community.llms import HuggingFaceHub

# Falcon 40B Instruct on the Hugging Face Hub
model_name = "tiiuae/falcon-40b-instruct"

llm = HuggingFaceHub(
    repo_id=model_name,
    task="text-generation",
    model_kwargs={
        "max_new_tokens": 512,
        "top_k": 30,
        "temperature": 0.1,
        "repetition_penalty": 1.03,
    },
    huggingfacehub_api_token="hf_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
)

I reached the following stage:

from langchain_community.chat_models.huggingface import ChatHuggingFace
llm = ChatHuggingFace(llm=llm)

yet I get this error:

HfHubHTTPError: 401 Client Error: Unauthorized for url

I am doing this so I can run the following:

from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vector_db.as_retriever(),
)

What am I missing, and is there a way to do this fully locally, i.e. load the Falcon model on my own machine and pass it to ChatHuggingFace?


Solution

  • The HfHubHTTPError: 401 Client Error: Unauthorized for url message indicates that you don't have access to the Inference Endpoints service from Hugging Face: https://huggingface.co/inference-endpoints.
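
    Before going local, you can verify whether the token itself is valid at all (a quick sanity check; whoami() comes from the huggingface_hub package):

    from huggingface_hub import whoami

    # Raises an error if the token is invalid or lacks the needed permission
    print(whoami(token="hf_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"))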

    Since you want to run everything locally, here's an example using the HuggingFacePipeline wrapper:

    from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
    
    # model_id = "tiiuae/falcon-7b-instruct"  # too large to run on an Nvidia 4090 with 16 GB of RAM
    model_id = "gpt2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    # Use a distinct name so the imported pipeline() factory isn't shadowed
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=200,
    )
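
    If you do want to try Falcon itself rather than gpt2, an 8-bit quantized load can reduce memory use considerably. A sketch, assuming bitsandbytes is installed and a CUDA GPU is available; whether falcon-7b-instruct then fits still depends on your hardware:

    from transformers import BitsAndBytesConfig

    # Assumption: bitsandbytes is installed and a CUDA device is present
    quantized_model = AutoModelForCausalLM.from_pretrained(
        "tiiuae/falcon-7b-instruct",
        device_map="auto",
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    )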
    
    
    from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
    llm = HuggingFacePipeline(pipeline=pipe)
    
    
    from langchain.prompts import PromptTemplate
    template = """Question: {question}
    
    Answer: Let's think step by step."""
    prompt = PromptTemplate.from_template(template)
    
    chain = prompt | llm
    question = "Tell me about Italy"
    
    print(chain.invoke({"question": question}))
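
    From there, the same local llm can be wired into the RetrievalQA chain from the question. A minimal sketch, assuming a hypothetical report.pdf plus the sentence-transformers/all-MiniLM-L6-v2 embedding model, with pypdf, sentence-transformers, and faiss-cpu installed:

    from langchain_community.document_loaders import PyPDFLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import FAISS
    from langchain.chains import RetrievalQA

    # Load the PDF and split it into overlapping chunks
    docs = PyPDFLoader("report.pdf").load()
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=500, chunk_overlap=50
    ).split_documents(docs)

    # Embed the chunks locally and index them in FAISS
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    vector_db = FAISS.from_documents(chunks, embeddings)

    # Hand the local pipeline LLM to the retrieval chain
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        retriever=vector_db.as_retriever(),
    )
    print(qa_chain.invoke({"query": "What are the key findings in the report?"}))

    Note that gpt2's 1024-token context window leaves little room for retrieved chunks, so swap in a larger local model once your hardware allows it.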