I am doing a 'friend of a friend'-type MATCH in Neo4j. The only thing about it that is throwing me off is an attempt to constrain all the relationships by date.
The basic graph is like so:
The 'real' version of the graph has 10 1st degree friends, and 3385 2nd degree friends.
And although I didn't put a date property on each edge on the image above, indeed that is the case. There is no order to any of the dates.
The general idea is pretty straightforward: I would like make sure to disregard any relationship that has a date property which is older than a predetermined maximum date.
The slightly tricky part is, that if the maximum date constraint is violated within the first relationship (like on one the three edges that come off the node in the image above), then that edge is broken in half and no other traversal down that path should be able to occur. (e.g., I don't want any of the leaf nodes).
I wrote this:
MATCH
(n)-[f1:FRIEND]-()-[f2:FRIEND]-(m)
WITH n, m,split('1962-1-1', '-') AS maxdate
WHERE n.person_id='180'
AND(
(
toInt(maxdate[0]) > toInt(split(f1.date, '-')[0])
)
OR
(
toInt(maxdate[0]) = toInt(split(f1.date, '-')[0])
AND
toInt(maxdate[1]) >= toInt(split(f1.date, '-')[1])
))
AND(
(
toInt(maxdate[0]) > toInt(split(f2.date, '-')[0])
)
OR
(
toInt(maxdate[0]) = toInt(split(f2.date, '-')[0])
AND
toInt(maxdate[1]) >= toInt(split(f2.date, '-')[1])
))
RETURN m;
This code block ran for about 20 minutes and eventually appears to have resulted in something close to what I was aiming for. Here's what it looks like in the browser:
(It has 350 nodes)
First, I'll admit that this is clearly some terribly written code (both in terms of aesthetics and performance). Second, I'm noticing the unassociated nodes on the perimeter.
What I think happened, is when the first degree relationship's date condition fails, but the second degree relationship's doesn't, and so I'm ending up with the 'friend of a friend' that I don't want to be included.
How can I modify my date condition such that I eliminate these free-standing nodes when the first degree edge is invalidated?
If anyone has any insight I would greatly appreciate it. (Not to get too mushy, but thanks to the SO community, I've already come relatively far pretty quickly, and for that I'm grateful.)
It should not run for that long in the first place, the main reason is that you don't have a label + index on your nodes.
addd those first:
CREATE CONSTRAINT ON (p:Person) ASSERT p.person_id is unique;
MATCH (n) where exists(n.person_id) SET n:Person;
if you have your time in yyyy-mm-dd
anyway (I hope so at least) you can compare them directly: (with 2 digits aka 01
) i.e. '2012-01-10' > '2011-08-31'
)
WITH '1962-01-01' AS maxdate
MATCH (n:Person)-[f1:FRIEND]-()-[f2:FRIEND]-(m:Person)
WHERE n.person_id='180' AND f1.date < maxdate and f2.date < maxdate
RETURN m;
you can also use the short-form: (n:Person {person_id:'180'})
if you want to have a general expression on relationships in a path, use a variable rels
(which is then a collection) within your variable-length-path pattern:
WITH '1962-01-01' AS maxdate
MATCH (n:Person {person_id:'180'})-[rels:FRIEND*2]-(m:Person)
WHERE ALL(r in rels WHERE r.date < maxdate)
RETURN m;
you can also use rels(path)
WITH '1962-01-01' AS maxdate
MATCH path = (n:Person {person_id:'180'})-[:FRIEND*2]-(m:Person)
WHERE ALL(r in rels(path) WHERE r.date < maxdate)
RETURN m;
or if the relationships of a path are in relation to each other:
WITH '1962-01-01' AS maxdate
MATCH (n:Person {person_id:'180'})-[rels:FRIEND*2]-(m:Person)
WHERE ALL(idx in range(0, size(rels)-2) WHERE (rels[idx]).date < maxdate AND (rels[idx]).date < (rels[idx+1]).date)
RETURN m;