[SOLVED] How to get Retrieval QA to return the exact document that contains the answer from the retrieved top-k documents?

I'm building a QA bot with RAG and want to show the specific documents from which each answer is extracted. Retrieval QA passes the k documents most semantically similar to the query to the LLM to generate the answer. The answer need not come from all k documents, so how can we tell which of the k documents it was actually extracted from?

How can we know which of those source documents the LLM extracted the answer from?

Solution

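Adding a custom StuffDocumentsChain helped. Its document prompt puts each document's source next to its content, so the LLM sees where every chunk came from.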

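from langchain.chains import RetrievalQA, StuffDocumentsChain
from langchain.prompts import PromptTemplate

# Render each retrieved document together with its "source" metadata,
# so the LLM sees where every chunk came from.
document_prompt = PromptTemplate(input_variables=["page_content", "source"],
                                 template="Context:\ncontent:{page_content}\nsource:{source}")
# llm_chain and retriever are assumed to be defined earlier; the LLM prompt
# must expose a {context} variable.
doc_chain = StuffDocumentsChain(llm_chain=llm_chain,
                                document_variable_name="context",
                                document_prompt=document_prompt,
                                callbacks=None)
qa_chain = RetrievalQA(combine_documents_chain=doc_chain,
                       retriever=retriever,
                       return_source_documents=True,
                       callbacks=None,
                       verbose=False)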
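
To get a feel for what the LLM actually receives, here is a rough plain-Python preview of the stuffed context. It does not call LangChain; it only mimics the formatting with str.format, assuming the documents are joined with a blank line between them. The helper name render_context and the two example documents are made up for illustration.

```python
# Plain-Python preview of the context built from the document prompt.
# The two documents below are made-up examples, not real data.
TEMPLATE = "Context:\ncontent:{page_content}\nsource:{source}"


def render_context(docs, separator="\n\n"):
    """Format each document with the template and join the results."""
    return separator.join(
        TEMPLATE.format(page_content=d["page_content"], source=d["source"])
        for d in docs
    )


example_docs = [
    {"page_content": "Paris is the capital of France.", "source": "geo.txt"},
    {"page_content": "Berlin is the capital of Germany.", "source": "cities.txt"},
]
context = render_context(example_docs)
print(context)
```

Because every chunk now carries its own source line, the LLM prompt can ask the model to name the source it relied on.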

This post gives a good explanation of this approach: https://nakamasato.medium.com/enhancing-langchains-retrievalqa-for-real-source-links-53713c7d802a
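
Once the sources are in the context, one possible follow-up is to narrow the returned source documents down to the ones the answer cites. The sketch below is only an illustration under assumptions that are not in the original post: it assumes the LLM prompt tells the model to end its answer with a line like "SOURCES: a, b", and it uses a small Doc dataclass as a stand-in for LangChain's Document. The helper names cited_sources and filter_cited_documents are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def cited_sources(answer: str) -> set:
    """Extract source names from a 'SOURCES: a, b' line (assumed answer format)."""
    match = re.search(r"^SOURCES:\s*(.+)$", answer, flags=re.MULTILINE | re.IGNORECASE)
    if not match:
        return set()
    return {s.strip() for s in match.group(1).split(",") if s.strip()}


def filter_cited_documents(answer: str, docs: list) -> list:
    """Keep only the retrieved documents whose source the answer cites."""
    cited = cited_sources(answer)
    return [d for d in docs if d.metadata.get("source") in cited]


# Made-up stand-in for the dict a RetrievalQA chain returns when
# return_source_documents=True ("result" and "source_documents" keys).
result = {
    "result": "Paris is the capital of France.\nSOURCES: geo.txt",
    "source_documents": [
        Doc("Paris is the capital of France.", {"source": "geo.txt"}),
        Doc("Berlin is the capital of Germany.", {"source": "cities.txt"}),
    ],
}
used = filter_cited_documents(result["result"], result["source_documents"])
print([d.metadata["source"] for d in used])
```

Keep in mind this only reflects what the model claims to have used; the LLM can still cite the wrong source or skip the SOURCES line, in which case the filter returns an empty list.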