python nlp search-engine langchain chain

Implement a search engine chain using Tavily in LangChain


I want to implement a search engine chain using Tavily in LangChain. The chain takes the user's query as input and returns up to 5 related documents. Each retrieved document must be a LangChain Document, with the content of the page as page_content and the URL of the corresponding site in metadata. I must use the langchain_core.documents.base.Document class to define the documents. So this chain has two main parts:

  1. Tavily search platform
  2. A parser that converts the search output into standard LangChain documents.

I wrote this code, but I don't know how to convert the Tavily output into the standard Document form:

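from pydantic import BaseModel, Field
from langchain_core.documents.base import Document
from langchain_core.output_parsers import PydanticOutputParser
from langchain_community.tools.tavily_search import TavilySearchResults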

search = TavilySearchResults(max_results=5)

class ParsedDocument(BaseModel):
    content: str = Field(description="This refers to the content of the search.")
    url: str = Field(description="This refers to the url of the search.")

search_parser = PydanticOutputParser(pydantic_object=ParsedDocument)
search_engine_chain = search | search_parser

I would be grateful for any help with changing this code.


Solution

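  • I finally found the answer: replace PydanticOutputParser with a custom function that converts each search result (a dict with 'content' and 'url' keys) into a Document: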

    class ParsedDocument(BaseModel):
        content: str = Field(description="This refers to the content of the search.")
        url: str = Field(description="This refers to the url of the search.")
    
    # Define a custom parser
    def custom_parser(search_results):
        parsed_documents = []
        for result in search_results:  # Adjust this line based on the actual structure of search_results
            parsed_document = ParsedDocument(content=result['content'], url=result['url'])
            document = Document(page_content=parsed_document.content, metadata={'url': parsed_document.url})
            parsed_documents.append(document)
        return parsed_documents
    
    search_engine_chain = search | custom_parser
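
The custom parser above assumes every search result is a dict with both 'content' and 'url' keys. A slightly more defensive variant might look like the sketch below. It assumes the tool could hand back something other than a list of dicts (for example an error string) and that some entries could be missing a key. The name results_to_documents, the max_docs parameter, and the sample data are hypothetical, and the fallback Document class is only a stand-in so the sketch runs without LangChain installed.

```python
from dataclasses import dataclass, field

try:
    from langchain_core.documents import Document
except ImportError:
    # Stand-in with the same two fields, used only when LangChain is not
    # installed (an assumption for illustration, not the real class).
    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)


def results_to_documents(search_results, max_docs=5):
    """Convert Tavily-style result dicts into Documents, skipping bad entries."""
    # Anything that is not a list (e.g. an error message) yields no documents.
    if not isinstance(search_results, list):
        return []

    documents = []
    for result in search_results:
        if not isinstance(result, dict):
            continue
        content = result.get("content")
        url = result.get("url")
        # Skip entries that lack either the page content or the URL.
        if not content or not url:
            continue
        documents.append(Document(page_content=content, metadata={"url": url}))
        if len(documents) == max_docs:
            break
    return documents


# Hypothetical sample shaped like the results the custom parser above expects.
sample_results = [
    {"url": "https://example.com/a", "content": "First page text."},
    {"url": "https://example.com/b"},  # no content, so it is skipped
    "unexpected entry",                # not a dict, so it is skipped
]
docs = results_to_documents(sample_results)
```

Since this is a plain function, it should be usable in the chain the same way as custom_parser, i.e. search | results_to_documents, with the Tavily API key available in the TAVILY_API_KEY environment variable.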