I'm trying to do something like the following with the py2neo module, to get information for a large number of nodes in a neo4j database whose IDs I already know:
query = f'''
MATCH
(n:MY_LABEL)
OPTIONAL MATCH
(n) -- (u:OTHER_LABEL) // Won't always have a neighbor
WHERE
id(n) in [{','.join(very_long_list_of_nids)}]
RETURN
id(n) as nid,
n.feature1,
u.feature2
'''
resp = graph.run(query)
I have noticed it's far faster to just omit the WHERE clause and do the filtering after it returns the content of every n:MY_LABEL node. Is there a more elegant way to do this?
For reference, the very_long_list_of_nids list is about 500k elements long (I have tried batching it into smaller chunks of 10k and hit the same problem), and the database contains 4m nodes and 10m edges.
You should:
1. Put the WHERE clause right under your MATCH clause. Currently, your WHERE clause is under the OPTIONAL MATCH clause, so every MY_LABEL node is still matched and the ID filtering is only done after finding the relationships of all MY_LABEL nodes.
2. Remove the :MY_LABEL qualification from the MATCH clause. If you already get the node by native ID, checking the label is unnecessary, and you are not using indexing.
This should be much faster:
query = '''
MATCH
(n)
WHERE
ID(n) in $id_list
OPTIONAL MATCH
(n) -- (u:OTHER_LABEL) // Won't always have a neighbor
RETURN
ID(n) as nid,
n.feature1,
u.feature2
'''
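# ID(n) is an integer, so convert any string ids before passing them as a parameter
resp = graph.run(query, id_list=[int(nid) for nid in very_long_list_of_nids])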
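If a single 500k-element parameter turns out to be unwieldy, the same parameterized query can be run in batches. The following is only a sketch: chunked and fetch_features are hypothetical helper names, graph is assumed to be a py2neo Graph (whose run method accepts Cypher parameters as keyword arguments and returns a cursor with a data() method), and the ids are converted to integers on the assumption that they may be stored as strings.

```python
from typing import Iterable, Iterator, List

# Same query as above, with the id list supplied as the $id_list parameter.
QUERY = '''
MATCH (n)
WHERE ID(n) IN $id_list
OPTIONAL MATCH (n) -- (u:OTHER_LABEL)
RETURN ID(n) AS nid, n.feature1 AS feature1, u.feature2 AS feature2
'''


def chunked(ids: Iterable, size: int) -> Iterator[List[int]]:
    """Yield lists of at most `size` integer ids."""
    batch: List[int] = []
    for nid in ids:
        batch.append(int(nid))  # ID(n) is an integer, so "42" would never match
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def fetch_features(graph, ids: Iterable, batch_size: int = 10_000) -> list:
    """Run QUERY once per batch and collect the rows as dicts."""
    rows = []
    for batch in chunked(ids, batch_size):
        # Assumption: graph is a py2neo Graph; keyword arguments to run()
        # are passed to the server as Cypher parameters.
        rows.extend(graph.run(QUERY, id_list=batch).data())
    return rows
```

Usage would look like rows = fetch_features(graph, very_long_list_of_nids). Because the query text never changes between batches, the server can reuse its cached plan instead of parsing a new multi-megabyte string each time.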
Also, if the relationships between MY_LABEL and OTHER_LABEL always flow in one direction, you should consider using a directional relationship pattern (either --> or <--) in your OPTIONAL MATCH clause, especially if your MY_LABEL nodes have other kinds of relationships that flow in the opposite direction.
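To make the directional variant concrete, here is a small sketch that builds the query text for each pattern. The build_query function and its direction argument are hypothetical names used only for illustration, and which direction is correct depends on how the relationships were created in your graph.

```python
# Relationship patterns keyed by a hypothetical direction flag.
PATTERNS = {
    "any": "(n) -- (u:OTHER_LABEL)",   # undirected, as in the query above
    "out": "(n) --> (u:OTHER_LABEL)",  # relationships leaving n
    "in": "(n) <-- (u:OTHER_LABEL)",   # relationships arriving at n
}


def build_query(direction: str = "any") -> str:
    """Return the lookup query with the requested relationship direction."""
    pattern = PATTERNS[direction]  # raises KeyError on an unknown direction
    return (
        "MATCH (n)\n"
        "WHERE ID(n) IN $id_list\n"
        f"OPTIONAL MATCH {pattern}\n"
        "RETURN ID(n) AS nid, n.feature1, u.feature2"
    )
```

Only the fixed pattern strings are interpolated into the query; the id list itself still travels as the $id_list parameter.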