I'm trying to do something like the following with the py2neo module, to get information for a large number of nodes in a neo4j database whose IDs I already know:
query = f'''
MATCH
(n:MY_LABEL)
OPTIONAL MATCH
(n) -- (u:OTHER_LABEL) // Won't always have a neighbor
WHERE
id(n) in [{','.join(very_long_list_of_nids)}]
RETURN
id(n) as nid,
n.feature1,
u.feature2
'''
resp = graph.run(query)
I have noticed that it is far faster to omit the WHERE clause entirely and do the filtering after the query returns the content of every n:MY_LABEL node. Is there a more elegant way to do this?
For reference, the very_long_list_of_nids list is about 500k elements long (I have tried batching it into smaller chunks of 10k and have the same problem), and the database contains 4M nodes and 10M edges.
You should:
- Put the WHERE clause right under your MATCH clause. Currently, your WHERE clause is under the OPTIONAL MATCH clause, and so the ID filtering is only done after finding the relationships of all MY_LABEL nodes.
- Remove the :MY_LABEL qualification from the MATCH clause. If you already get the node by native ID, checking the label is unnecessary, and no index is being used for it.
- Pass the ID list as a query parameter ($id_list) instead of interpolating it into the query string.
This should be much faster:
query = '''
MATCH
(n)
WHERE
ID(n) in $id_list
OPTIONAL MATCH
(n) -- (u:OTHER_LABEL) // Won't always have a neighbor
RETURN
ID(n) as nid,
n.feature1,
u.feature2
'''
resp = graph.run(query, id_list=very_long_list_of_nids)
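If sending all 500k IDs as a single parameter turns out to be too heavy, the same parameterized query can be run in chunks. The following is only a minimal sketch: it assumes graph is a py2neo Graph whose run(query, **params) result can be iterated over, and chunked and fetch_in_batches are hypothetical helper names, not part of py2neo.

```python
from itertools import islice

# Same query as above, with the ID filter directly under MATCH.
QUERY = '''
MATCH (n)
WHERE ID(n) IN $id_list
OPTIONAL MATCH (n) -- (u:OTHER_LABEL)
RETURN ID(n) AS nid, n.feature1, u.feature2
'''


def chunked(items, size):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def fetch_in_batches(run, ids, batch_size=10_000):
    """Run QUERY once per batch and collect all rows.

    `run` is any callable with the shape of graph.run(query, **params).
    """
    rows = []
    for batch in chunked(ids, batch_size):
        # Each batch is passed as a parameter, never spliced into the query text.
        rows.extend(run(QUERY, id_list=batch))
    return rows
```

It would be called as rows = fetch_in_batches(graph.run, very_long_list_of_nids). Because the query text never changes between batches, the server should be able to reuse its cached plan rather than re-parsing a new string each time.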
Also, if the relationships between MY_LABEL and OTHER_LABEL always flow in one direction, you should consider using a directional relationship pattern (either --> or <--) in your OPTIONAL MATCH clause, especially if your MY_LABEL nodes have other kinds of relationships that flow in the opposite direction.
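As an illustration only: if the relationships were known to point from MY_LABEL nodes to OTHER_LABEL nodes (an assumption, since the question does not say which way they flow), the directed version of the query might look like the sketch below. The name directed_query is hypothetical.

```python
# Assumes relationships run MY_LABEL --> OTHER_LABEL; flip the arrow to <--
# if they run the other way.
directed_query = '''
MATCH
(n)
WHERE
ID(n) in $id_list
OPTIONAL MATCH
(n) --> (u:OTHER_LABEL) // Won't always have a neighbor
RETURN
ID(n) as nid,
n.feature1,
u.feature2
'''
```

It would be run the same way as before, e.g. resp = graph.run(directed_query, id_list=very_long_list_of_nids).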