neo4j, openai-api, langchain, large-language-model

Vector store created using existing graph for multiple nodes/labels


I'm trying to create vector stores on top of my existing knowledge graph using from_existing_graph (following the Neo4j blog posts by Tomaz and Saurav Joshi). This method only lets me create the embedding/vector index for a single label, which I assume is why I don't get the desired results when asking natural-language questions.

The code below can answer questions about Oliver's age and location, but not about what he directed. I believe this is because from_existing_graph only accepts a single label and its corresponding properties when generating the embeddings and the vector index. Any ideas how to achieve this?

import os

from langchain.vectorstores.neo4j_vector import Neo4jVector
from langchain_openai import OpenAIEmbeddings
from langchain.graphs import Neo4jGraph

os.environ["OPENAI_API_KEY"] = "sk-xx"
url = "neo4j+s://xxxx.databases.neo4j.io"
username = "neo4j"
password = "mypassword"
# Build embeddings and a vector index over Person nodes only (a single label)
existing_graph = Neo4jVector.from_existing_graph(
    embedding=OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name="person",
    node_label="Person",
    text_node_properties=["name", "age", "location"],
    embedding_node_property="embedding",
)

from langchain.chat_models import ChatOpenAI
from langchain.chains import GraphCypherQAChain

graph = Neo4jGraph(
    url=url, username=username, password=password
)

chain = GraphCypherQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, verbose=True
)

query = "Where does Oliver Stone live?"
#query = "Name some films directed by Oliver Stone?" 

graph_result = chain.invoke(query)

vector_results = existing_graph.similarity_search(query, k=1)
for i, res in enumerate(vector_results):
    print(res.page_content)
    if i != len(vector_results)-1:
        print()
vector_result = vector_results[0].page_content

# Construct prompt for OpenAI
final_prompt = f"""You are a helpful question-answering agent. Your task is to analyze
and synthesize information from two sources: the top result from a similarity search
(unstructured information) and relevant data from a graph database (structured information).
Given the user's query: {query}, provide a meaningful and efficient answer based
on the insights derived from the following data:

Unstructured information: {vector_result}.
Structured information: {graph_result} """


from openai import OpenAI
client = OpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
)

chat_completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": final_prompt}],
)

answer = chat_completion.choices[0].message.content.strip()
print(answer)

Any help would be highly appreciated!

Here is my schema. Node properties are the following:

Person {name: STRING, embedding: LIST, age: INTEGER, location: STRING}
Actor {name: STRING, embedding: LIST}
Movie {title: STRING}
Director {name: STRING, embedding: LIST, age: INTEGER, location: STRING}
Relationship properties are the following:
ACTED_IN {role: STRING}
The relationships are the following:
(:Person)-[:ACTED_IN]->(:Movie)
(:Person)-[:DIRECTED]->(:Movie)
(:Actor)-[:ACTED_IN]->(:Movie)
(:Director)-[:DIRECTED]->(:Movie)

Cypher used to create:

CREATE (charlie:Person:Actor {name: 'Charlie Sheen'})-[:ACTED_IN {role: 'Bud Fox'}]->(wallStreet:Movie {title: 'Wall Street'})<-[:DIRECTED]-(oliver:Person:Director {name: 'Oliver Stone'});
MATCH (n:Person {name: 'Oliver Stone'}) SET n.age = 30, n.location = "New York" RETURN n

Solution

  • You need to bring the :DIRECTED relationship into what the vector index returns, since the movie he directed is not part of the embedding. Write a query that collects the movies he directed and attaches them to the matched node's metadata (see retrieval_query below); then, on your vector result, you can read the movie title from that metadata (metadata["movies"][0]["title"]).

    You may need to collect all movie titles if there is more than one movie in the graph; one rough way to do that is sketched after the code below.

    Reference: https://github.com/tomasonjo/blogs/blob/master/llm/neo4jvector_langchain_deepdive.ipynb
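
    Note that from_existing_index assumes a vector index named person_index already exists on the Person nodes. If yours is still called person (as in your original code), either reuse that name or create person_index first. A rough sketch, reusing your from_existing_graph call with only the index name changed:

    # Sketch (untested): (re)create the vector index under the name "person_index"
    # that from_existing_index expects below; same call as in the question.
    from langchain.vectorstores.neo4j_vector import Neo4jVector
    from langchain_openai import OpenAIEmbeddings

    Neo4jVector.from_existing_graph(
        embedding=OpenAIEmbeddings(),
        url="bolt://localhost:7687",       # connection details as used below
        username="neo4j",
        password="awesome_password",
        index_name="person_index",         # name expected by from_existing_index
        node_label="Person",
        text_node_properties=["name", "age", "location"],
        embedding_node_property="embedding",
    )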

    import os

    from langchain.vectorstores.neo4j_vector import Neo4jVector
    from langchain_openai import OpenAIEmbeddings
    
    os.environ["OPENAI_API_KEY"] = "sk-<key>"
    url = "bolt://localhost:7687"
    username = "neo4j"
    password = "awesome_password"
    
    retrieval_query = """
           MATCH (node)-[:DIRECTED]->(m:Movie)
           WITH node, score, collect(m) as movies
           RETURN node.name as text, score, node{.*, embedding: Null, movies: movies} as metadata
           """
    
    existing_index_return = Neo4jVector.from_existing_index(
        embedding=OpenAIEmbeddings(),
        url=url,
        username=username,
        password=password,
        database="neo4j",
        index_name="person_index",
        text_node_property="name",
        retrieval_query=retrieval_query,
    )
    
    from langchain_openai import ChatOpenAI
    from langchain.chains import GraphCypherQAChain
    from langchain_community.graphs import Neo4jGraph
    
    graph = Neo4jGraph(
        url=url, username=username, password=password
    )
    
    chain = GraphCypherQAChain.from_llm(
        ChatOpenAI(temperature=0), graph=graph, verbose=True
    )
    
    #query = "Where does Oliver Stone live?"
    query = "Name some films directed by Oliver Stone?" 
    
    graph_result = chain.invoke(query)
    
    vector_results = existing_index_return.similarity_search(query, k=1)
    vector_result = (
        vector_results[0].page_content
        + " lives in " + vector_results[0].metadata["location"]
        + " and he directed the movie " + vector_results[0].metadata["movies"][0]["title"]
    )
    
    # Construct prompt for OpenAI
    final_prompt = f"""You are a helpful question-answering agent. Your task is to analyze
    and synthesize information from two sources: the top result from a similarity search
    (unstructured information) and relevant data from a graph database (structured information).
    Given the user's query: {query}, provide a meaningful and efficient answer based
    on the insights derived from the following data:
    
    Unstructured information: {vector_result}.
    Structured information: {graph_result} """
    
    
    from openai import OpenAI
    client = OpenAI(
        # This is the default and can be omitted
        api_key=os.environ.get("OPENAI_API_KEY"),
    )
    
    chat_completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": final_prompt}],
    )
    
    answer = chat_completion.choices[0].message.content.strip()
    print(answer)
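
    If a person has directed more than one movie, metadata["movies"] will contain all of them. One rough way to join them (an untested sketch, not part of the code above):

    # Sketch: join every directed movie title instead of using only the first one.
    top = vector_results[0]
    all_titles = ", ".join(m["title"] for m in top.metadata["movies"])
    vector_result = f"{top.page_content} lives in {top.metadata['location']} and directed: {all_titles}"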
    

    Sample output from the code above:

    > Entering new GraphCypherQAChain chain...
    Generated Cypher:
    MATCH (d:Director {name: "Oliver Stone"})-[:DIRECTED]->(m:Movie)
    RETURN m.title
    Full Context:
    [{'m.title': 'Wall Street'}]
    
    > Finished chain.
    Based on the unstructured information retrieved from the top result of the search, Oliver Stone directed the film "Wall Street." 
    
    In addition to "Wall Street," some other films directed by Oliver Stone include "Platoon," "JFK," "Born on the Fourth of July," "Natural Born Killers," and "Snowden."
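
    The same retrieval_query idea extends to the other relationship in your schema. A rough sketch (untested, an assumption beyond the code above) that also collects :ACTED_IN movies, so questions about acting roles can be answered from the same index:

    # Sketch: expose both directed and acted-in movies as metadata.
    retrieval_query = """
           OPTIONAL MATCH (node)-[:DIRECTED]->(d:Movie)
           OPTIONAL MATCH (node)-[:ACTED_IN]->(a:Movie)
           WITH node, score, collect(DISTINCT d) AS directed, collect(DISTINCT a) AS acted_in
           RETURN node.name AS text, score,
                  node {.*, embedding: null, directed: directed, acted_in: acted_in} AS metadata
           """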