Tags: langchain, openai, embeddings, semantic-search

How to get Retrieval QA to return the exact document that contains the answer from the retrieved top k document?


I'm building a QA bot with RAG and want to surface the specific documents the answers are extracted from. RetrievalQA retrieves the top k documents that are semantically similar to the query and uses them to generate the answer. The answer need not come from all k documents, so how can we tell which of the k documents the answer was actually extracted from?

How can we know which of those source documents the LLM extracted the answer from?


Solution

  • Adding a custom StuffDocumentsChain helped. By including each document's source in the prompt, the LLM can cite it in its answer, and return_source_documents=True returns the retrieved documents alongside the result.

    from langchain.prompts import PromptTemplate
    from langchain.chains import RetrievalQA
    from langchain.chains.combine_documents.stuff import StuffDocumentsChain

    # Inject each document's source into the prompt so the LLM can cite it.
    # Note: the original had "\source:{source}", which is a typo for "\nsource:{source}".
    document_prompt = PromptTemplate(
        input_variables=["page_content", "source"],
        template="Context:\ncontent:{page_content}\nsource:{source}",
    )
    doc_chain = StuffDocumentsChain(
        llm_chain=llm_chain,  # your existing LLMChain
        document_variable_name="context",
        document_prompt=document_prompt,
        callbacks=None,
    )
    qa_chain = RetrievalQA(
        combine_documents_chain=doc_chain,
        retriever=retriever,
        return_source_documents=True,
        callbacks=None,
        verbose=False,
    )
    

    This post gives a good explanation of this approach: https://nakamasato.medium.com/enhancing-langchains-retrievalqa-for-real-source-links-53713c7d802a
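If you prefer not to rely on the LLM citing sources itself, another option is to post-filter the returned source_documents by lexical overlap with the generated answer. This is only a heuristic sketch (the function name and the stand-in `Doc` class are hypothetical, not part of LangChain), but it works on any objects shaped like LangChain's `Document` (with `page_content` and `metadata["source"]`):

```python
def rank_sources_by_overlap(answer: str, docs: list) -> list:
    """Rank retrieved documents by word overlap with the answer.

    Returns (source, score) pairs, highest overlap first. A rough
    heuristic for guessing which documents the answer drew on,
    not ground truth.
    """
    answer_words = set(answer.lower().split())
    scored = []
    for doc in docs:
        doc_words = set(doc.page_content.lower().split())
        overlap = len(answer_words & doc_words) / max(len(answer_words), 1)
        scored.append((doc.metadata.get("source", "unknown"), overlap))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Minimal stand-in for langchain's Document, for demonstration only.
class Doc:
    def __init__(self, content, source):
        self.page_content = content
        self.metadata = {"source": source}


docs = [
    Doc("the capital of france is paris", "a.txt"),
    Doc("bananas are yellow", "b.txt"),
]
ranked = rank_sources_by_overlap("Paris is the capital of France", docs)
# The geography document ranks first; the unrelated one scores 0.
```

In practice you would pass `result["source_documents"]` from the qa_chain call as `docs`. For more robust attribution, embedding-similarity between the answer and each document would beat raw word overlap.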