I want to implement a search engine chain using Tavily in LangChain. The chain takes the user's query as input and returns up to 5 related documents. Each retrieved document must be a LangChain Document, with the content of the page as page_content and the URL of the corresponding site in metadata. I must use the langchain_core.documents.base.Document class to define the documents. So this chain will have two main parts: the Tavily search tool and a parser that converts its output into Documents.
I wrote the code below, but I don't know how to convert the Tavily output into the standard Document format:
from langchain_core.documents.base import Document
from langchain_core.output_parsers import PydanticOutputParser
from langchain_community.tools.tavily_search import TavilySearchResults
from pydantic import BaseModel, Field

search = TavilySearchResults(max_results=5)

class ParsedDocument(BaseModel):
    content: str = Field(description="This refers to the content of the search.")
    url: str = Field(description="This refers to the url of the search.")

search_parser = PydanticOutputParser(pydantic_object=ParsedDocument)
search_engine_chain = search | search_parser
I would be grateful for any help with changing this code.
I finally found the answer:
class ParsedDocument(BaseModel):
    content: str = Field(description="This refers to the content of the search.")
    url: str = Field(description="This refers to the url of the search.")

# Define a custom parser
def custom_parser(search_results):
    parsed_documents = []
    for result in search_results:  # Adjust this line based on the actual structure of search_results
        parsed_document = ParsedDocument(content=result['content'], url=result['url'])
        document = Document(page_content=parsed_document.content, metadata={'url': parsed_document.url})
        parsed_documents.append(document)
    return parsed_documents

search_engine_chain = search | custom_parser
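
For anyone who wants to try the conversion step without an API key, here is a minimal sketch of the same idea that runs offline. It assumes the search step yields a list of dicts with 'content' and 'url' keys, as in the parser above. The names to_documents, MAX_DOCS and sample_results are my own, the sample data is made up, and the fallback Document class is only a stand-in for when langchain-core is not installed. It also returns an empty list when the input is not a list, since the tool may hand back an error string rather than results.

```python
from dataclasses import dataclass, field

try:
    # Real class, used whenever langchain-core is installed.
    from langchain_core.documents import Document
except ImportError:
    # Hypothetical minimal stand-in so the sketch runs without LangChain.
    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)

MAX_DOCS = 5  # assumed cap, mirroring max_results=5 in the question


def to_documents(search_results):
    """Convert Tavily-style result dicts into Documents, skipping malformed entries."""
    # Anything that is not a list (e.g. an error string) is treated as "no results".
    if not isinstance(search_results, list):
        return []
    documents = []
    for result in search_results:
        if not isinstance(result, dict):
            continue
        content = result.get("content")
        url = result.get("url")
        # Skip entries that lack either the page text or the source URL.
        if not content or not url:
            continue
        documents.append(Document(page_content=content, metadata={"url": url}))
    return documents[:MAX_DOCS]


# Hypothetical sample shaped like the assumed tool output (not real search results).
sample_results = [
    {"url": "https://example.com/a", "content": "First snippet."},
    {"url": "https://example.com/b", "content": "Second snippet."},
    {"content": "Entry without a url, which is skipped."},
]
docs = to_documents(sample_results)
```

Because to_documents takes a single argument, it should be usable in place of custom_parser in the chain above (search | to_documents). The Pydantic model is not needed for the conversion itself, so this version leaves it out.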