information-retrieval langchain

Getting the source of information with Langchain


I'm using the langchain library to store my company's information in a vector database. When I query it, the results are great, but I need a way to recover where the information comes from, e.g. source: "www.site.com/about" or at least "document 156". Does anyone know how to do that?

EDIT: Currently I'm using docsearch.similarity_search(query), which only returns the page_content; the metadata comes back empty.

I'm ingesting with this code, but I'm totally open to changing it.

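# Import path for langchain releases that ship ElasticVectorSearch here
from langchain.vectorstores import ElasticVectorSearch

# documents: list of Document objects; embeddings: an embeddings model instance
db = ElasticVectorSearch.from_documents(
    documents,
    embeddings,
    elasticsearch_url="http://localhost:9200",
    index_name="elastic-index",
)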

Solution

  • You can attach metadata to each document by setting document.metadata to a dictionary, for example {"source": "www.site.com/about"} or {"id": "456"}. Then pass those documents to from_documents().

    Later, when one of the query methods returns a Document object, read document.metadata to get that metadata back.
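    As a rough sketch of that idea: the snippet below tags each document with a "source" entry before ingestion and reads it back afterwards. The Document class here is a small stand-in so the example runs without langchain installed, and tag_with_sources, describe, and the sample pages are hypothetical names made up for illustration, not part of the library.

```python
from dataclasses import dataclass, field


# Stand-in for LangChain's Document so this sketch runs without the library.
# With LangChain installed you would import its own Document class instead
# (the import path depends on the installed version).
@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)


def tag_with_sources(pages):
    """Build one Document per (source, text) pair, recording the source in metadata."""
    return [
        Document(page_content=text, metadata={"source": source})
        for source, text in pages
    ]


def describe(doc):
    """Format a document's text together with where it came from."""
    source = doc.metadata.get("source", "unknown")
    return f"{doc.page_content} (source: {source})"


# Hypothetical pages; in practice these come from your own loader.
pages = [
    ("www.site.com/about", "We build widgets."),
    ("document 156", "Refund policy: 30 days."),
]
documents = tag_with_sources(pages)

# These documents would then be passed to ElasticVectorSearch.from_documents(...),
# and each Document returned by similarity_search(query) could be read the same way.
for doc in documents:
    print(describe(doc))
```

    If the metadata still comes back empty after this, it is worth checking that the documents really carry a non-empty metadata dictionary at the moment they are handed to from_documents().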