I'm building a QA bot with RAG and want to show the specific documents each answer is extracted from. RetrievalQA passes the k documents most semantically similar to the query to the LLM to generate the answer. The answer need not appear in all k documents, so how can we know which of the k documents it was actually extracted from?
How can we know which of those source documents the LLM extracted the answer from?
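To make the problem concrete, here is a toy sketch of the retrieval step described above. It assumes hand-made 2-D embeddings and plain cosine similarity (not LangChain's actual retriever), and the file names are hypothetical. The point is that all k top-ranked chunks become context, and nothing in this step records which one the answer will come from, so `source_documents` simply lists all k.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, docs, k):
    """Return the k (source, vector) pairs most similar to query_vec, best first."""
    return sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)[:k]

# Hypothetical toy "embeddings" keyed by source name.
docs = [("a.pdf", [1.0, 0.0]), ("b.pdf", [0.9, 0.1]), ("c.pdf", [0.0, 1.0])]
print([source for source, _ in top_k([1.0, 0.0], docs, k=2)])  # prints ['a.pdf', 'b.pdf']
```

Both `a.pdf` and `b.pdf` would be handed to the LLM, even if only one of them actually contains the answer.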
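Adding a custom StuffDocumentsChain helped. Its document_prompt renders every retrieved document together with its `source` metadata, so the LLM sees where each chunk came from and can cite it in the answer (the prompt of llm_chain should ask it to do so). Note that return_source_documents=True on its own still returns all k retrieved documents, not just the ones the answer used.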
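from langchain.chains import RetrievalQA
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.prompts import PromptTemplate

# Render each Document with its metadata["source"] so the LLM can cite it.
# Note "\nsource:" (the original "\source:" was an invalid escape, not a newline).
document_prompt = PromptTemplate(input_variables=["page_content", "source"],
                                 template="Context:\ncontent:{page_content}\nsource:{source}")
# llm_chain (an LLMChain whose prompt has a {context} variable) and retriever are defined elsewhere.
doc_chain = StuffDocumentsChain(llm_chain=llm_chain,
                                document_variable_name="context",
                                document_prompt=document_prompt,
                                callbacks=None)
qa_chain = RetrievalQA(combine_documents_chain=doc_chain,
                       retriever=retriever,
                       return_source_documents=True,
                       callbacks=None,
                       verbose=False)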
This post gives a good explanation of this approach: https://nakamasato.medium.com/enhancing-langchains-retrievalqa-for-real-source-links-53713c7d802a
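Once each chunk carries its source label, the remaining step is mapping the LLM's citation back to the retrieved documents. Below is a dependency-free sketch of both halves, under stated assumptions: `Doc` is a hypothetical stand-in for a LangChain `Document`, `render_context` only mimics how the custom StuffDocumentsChain above fills `{context}` (same template, default `"\n\n"` separator), and the `SOURCES:` line is a hypothetical output convention you would have to request in the llm_chain prompt, not a LangChain API.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

# Same template string as document_prompt above.
DOC_TEMPLATE = "Context:\ncontent:{page_content}\nsource:{source}"

def render_context(docs, separator="\n\n"):
    """Mimic a stuff chain: format each doc with its source, then join them."""
    return separator.join(
        DOC_TEMPLATE.format(page_content=d.page_content, source=d.metadata["source"])
        for d in docs
    )

def cited_documents(answer, docs):
    """Keep only the retrieved docs whose source appears on the answer's SOURCES line.

    Assumes (hypothetically) the prompt told the model to end with a line
    such as 'SOURCES: a.pdf, b.pdf'.
    """
    match = re.search(r"SOURCES:\s*(.+)", answer)
    if not match:
        return []
    cited = {s.strip() for s in match.group(1).split(",")}
    return [d for d in docs if d.metadata["source"] in cited]

docs = [Doc("Paris is the capital of France.", {"source": "france.txt"}),
        Doc("Berlin is the capital of Germany.", {"source": "germany.txt"})]
answer = "The capital of France is Paris.\nSOURCES: france.txt"
print([d.metadata["source"] for d in cited_documents(answer, docs)])  # prints ['france.txt']
```

With the real chain, `qa_chain({"query": question})` returns the answer under `"result"` and the k retrieved documents under `"source_documents"`; passing those two into a filter like `cited_documents` narrows the list to what the model says it used. This relies on the model following the citation instruction, so it is a best-effort signal rather than a guarantee.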