I am designing an extended family tree using Neo4j. During the design of the relationships I came up with two approaches:
CREATE (p:Person)-[:PARENT_OF]->(s:Person)
CREATE (p:Person)-[:STEPPARENT_OF]->(s:Person)
CREATE (p:Person)-[:MARRIED_TO]->(s:Person)
With this approach I am creating a different relationship type for every case (keep in mind that there will be a lot of cases, and therefore a lot of relationship types).
CREATE (p:Person)-[r:PARENT_OF {type:'natural'}]->(s:Person)
CREATE (p:Person)-[r:PARENT_OF {type:'step'}]->(s:Person)
CREATE (p:Person)-[r:SPOUSE_OF {type:'marriage'}]->(s:Person)
With this approach there will be fewer relationship types, but the design is a little bit messy.
I would like to know which approach is better, and why.
You are choosing between fine-grained relationships (:PARENT_OF, :STEPPARENT_OF, :MARRIED_TO) and generic relationships qualified by a property (:PARENT_OF {type:'natural'}, :PARENT_OF {type:'step'}, :SPOUSE_OF {type:'marriage'}).
The book Graph Databases (available for download on the Neo4j site) by Ian Robinson, Jim Webber, and Emil Eifrém says:
Differentiating by relationship name is the best way of eliminating large swathes of the graph from a traversal. Using one or more property values to decide whether or not to follow a relationship incurs extra I/O the first time those properties are accessed because the properties reside in a separate store file from the relationships (after that, however, they’re cached).
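To make the quoted point concrete, here is a minimal sketch of the same question ("who are Alice's stepchildren?") asked against each model. The name property and the value 'Alice' are assumptions for illustration; they do not appear in the question.

```cypher
// Fine-grained model: the relationship type alone selects which
// relationships are followed from the starting node.
MATCH (p:Person {name: 'Alice'})-[:STEPPARENT_OF]->(c:Person)
RETURN c.name;

// Generic model: every PARENT_OF relationship leaving the node is
// considered, and each one is then filtered on its type property.
MATCH (p:Person {name: 'Alice'})-[r:PARENT_OF]->(c:Person)
WHERE r.type = 'step'
RETURN c.name;
```

With fine-grained types you can still ask the broader question in a single pattern by listing several types, for example -[:PARENT_OF|STEPPARENT_OF]->.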
Remember that a graph database model should be built around the needs of the application. That is, the answer depends mainly on which queries you will run against the database. If you filter on the type property of the relationship in your graph traversal queries, it is probably a good idea to split it into separate relationship types.
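If you start with the generic model and later decide to split it, the change can be made in place. The following is a rough sketch, assuming the property values from the question ('step' and 'natural') and a graph small enough to update in a single transaction; a large graph would need the work done in batches.

```cypher
// Replace each PARENT_OF {type:'step'} relationship with a
// dedicated STEPPARENT_OF relationship between the same two nodes.
MATCH (p:Person)-[r:PARENT_OF {type: 'step'}]->(s:Person)
CREATE (p)-[:STEPPARENT_OF]->(s)
DELETE r;

// The remaining PARENT_OF relationships are all 'natural', so the
// property no longer carries information and can be dropped.
MATCH (:Person)-[r:PARENT_OF {type: 'natural'}]->(:Person)
REMOVE r.type;
```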