I'm using Spring-Data along with SpringBoot to populate my Neo4j graph db.
I've the following Neo4j entities defined:
Source
entity -->
@NodeEntity
public class Source implements Comparable<Source> {
@GraphId private Long id;
private String name;
private SourceType type;
private String dataStoreName;
private String dataStoreDesc;
private Source() {
// Empty constructor required as of Neo4j API 2.0.5
};
public Source(String name, SourceType type, String dataStoreName, String dataStoreDesc) {
this.name = name;
this.type = type;
this.dataStoreName = dataStoreName;
this.dataStoreDesc = dataStoreDesc;
}
@Relationship(type = "CONTAINS", direction = Relationship.UNDIRECTED)
public Set<Field> fields;
public void contains(Field field) {
if (fields == null) {
fields = new HashSet<Field>();
}
fields.add(field);
}
/* Getter and Setters */
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public SourceType getType() {
return type;
}
public void setType(SourceType type) {
this.type = type;
}
public String getDataStoreName() {
return dataStoreName;
}
public void setDataStoreName(String dataStoreName) {
this.dataStoreName = dataStoreName;
}
public String getDataStoreDesc() {
return dataStoreDesc;
}
public void setDataStoreDesc(String dataStoreDesc) {
this.dataStoreDesc = dataStoreDesc;
}
public Set<Field> getFields() {
return fields;
}
public void setFields(Set<Field> fields) {
this.fields = fields;
}
@Override
public int compareTo(Source other) {
String name = other.getName();
SourceType type = other.getType();
if(this.name.equalsIgnoreCase(name) && this.type.equals(type))
return 0;
return -1;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((name == null) ? 0 : name.hashCode());
result = prime * result + ((type == null) ? 0 : type.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Source other = (Source) obj;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name))
return false;
if (type != other.type)
return false;
return true;
}
}
Field
entity-->
@NodeEntity
public class Field implements Comparable<Field> {
@GraphId private Long id;
private String name;
private FieldType fieldType;
private SourceType sourceType;
private String logicalName;
private String dataType;
private String dataSize;
private String description;
private Field() {
// Empty constructor required as of Neo4j API 2.0.5
};
public Field(String name, FieldType fieldType, SourceType sourceType, String logicalName, String dataType, String dataSize, String description) {
this.name = name;
this.fieldType = fieldType;
this.sourceType = sourceType;
this.logicalName = logicalName;
this.dataType = dataType;
this.dataSize = dataSize;
this.description = description;
}
@Relationship(type = "MAPS-TO", direction = Relationship.UNDIRECTED)
public Set<Field> fields;
public void mapsTo(Field field) {
if (fields == null) {
fields = new HashSet<Field>();
}
fields.add(field);
}
/* Getter and Setters */
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public FieldType getFieldType() {
return fieldType;
}
public void setFieldType(FieldType fieldType) {
this.fieldType = fieldType;
}
public SourceType getSourceType() {
return sourceType;
}
public void setSourceType(SourceType sourceType) {
this.sourceType = sourceType;
}
public String getLogicalName() {
return logicalName;
}
public void setLogicalName(String logicalName) {
this.logicalName = logicalName;
}
public String getDataType() {
return dataType;
}
public void setDataType(String dataType) {
this.dataType = dataType;
}
public String getDataSize() {
return dataSize;
}
public void setDataSize(String dataSize) {
this.dataSize = dataSize;
}
public String getDescription() {
return description;
}
public void setDescription(String description) {
this.description = description;
}
public Set<Field> getFields() {
return fields;
}
public void setFields(Set<Field> fields) {
this.fields = fields;
}
@Override
public int compareTo(Field other) {
String name = other.getName();
FieldType fieldType = other.getFieldType();
SourceType sourceType = other.getSourceType();
if(this.name.equalsIgnoreCase(name) && this.fieldType.equals(fieldType) && this.sourceType.equals(sourceType))
return 0;
return -1;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((fieldType == null) ? 0 : fieldType.hashCode());
result = prime * result + ((name == null) ? 0 : name.hashCode());
result = prime * result + ((sourceType == null) ? 0 : sourceType.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Field other = (Field) obj;
if (fieldType != other.fieldType)
return false;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name))
return false;
if (sourceType != other.sourceType)
return false;
return true;
}
}
So, a Source
CONTAINS
multiple Field
s. And a Field
is MAPS-TO
one or more other Field
s.
Each Source
is of a SourceType
.
My different SourceType
s are : PRODUCER, INBOUND, STAGING, INTERMEDIATE, OUTBOUND, CONSUMER.
public enum SourceType {
PRODUCER, INBOUND, STAGING, INTERMEDIATE, OUTBOUND, CONSUMER;
}
Each Field
is of a FieldType
.
My different FieldType
s are : FILE_FIELD, DB_COLUMN.
public enum FieldType {
FILE_FIELD, DB_COLUMN;
}
My data lineage is as below: PRODUCER --> INBOUND --> STAGING --> INTERMEDIATE --> OUTBOUND --> CONSUMER
I'm now looking for an advanced Cypher query by which if I provide a Field
in CONSUMER Source
, I'm able to track its lineage back until the PRODUCER Source
.
Similarly, I'm also looking for a query by which if I provide a Field
in PRODUCER Source
, I'm able to track its lineage forward until CONSUMER Source
.
I've tried to build up the query using shortestPath
and neighbors
functions but it doesn't seem to pull up the results I'm looking for.
Any suggestions/pointers will be appreciated.
Thanks in advance !
UPDATE-1
Background on my data lineage:
My application gets file from an external application (PRODUCE).
I'm aware of what database tables/columns of external application populated the fields in file.
So here, PRODUCER will be my Source
node; each table.column of external application (that populated the file) is a Field
node and PRODUCER Source
node will have CONTAINS
relationship with all the Field
nodes (representing table.column of external application database table that populated the file).
The file from external application is called INBOUND. It is a comma delimited file.
I'm aware of what are the field names coming in the file and in what order.
So here, INBOUND will be my Source
node; each field in file will be a Field
node and INBOUND Source
node will have CONTAINS
relationship with all the Field
nodes (representing the file fields in the inbound file).
Also each of the Field
nodes of INBOUND Source
will have a MAPS_TO
relationship with a Field
node of PRODUCER Source
(one-to-one mapping).
Going on similar workflow, my next stage is called STAGING wherein I load the inbound file fields into my database table/column.
So here, STAGING will be my Source
node and each column of the database table (into which I load the file fields) will represent a Field
node.
STAGING Source node will have CONTAINS relationship with all the Field
nodes (that represent the db table.column of db table into which I load file fields).
Also each of the Field
nodes of STAGING Source
will have a MAPS_TO
relationship with a Field
node of INBOUND Source
(one-to-one mapping).
Similar my next stage is INTERMEDIATE. In this stage, I'm querying the table in which I loaded the fields of input file and then flushing the output into another file (based on my business use case, I may chose to query all or only a subset of table columns that were populated from input file).
I'm aware of what fields and in what order will go into my INTERMEDIATE file.
So here, INTERMEDIATE is my Source
node and each field that goes into INTERMEDIATE file represents my Field
node. Also INTERMEDIATE Source
will have CONTAINS
relationship with all the Field
nodes that represent a field in the intermediate file.
Also each of these Field
nodes will have MAPS_TO
relationship with the fields of STAGING Source (one-to-one mapping).
Similarly I've OUTBOUND stage and lastly the CONSUMER stage.
...(I hope you are now able to visualize the lineage)
The objective of my query is, say, if I give a Field
name (that represent a table.column of PRODUCER) as input, then I should be able to track down its lineage till CONSUMER (i.e., the last stage of my lineage).
I was able to get the desired data lineage through below query:
MATCH (f5:Field)-[:MAPS_TO]-(f4:Field)-[:MAPS_TO]-(f3:Field)-[:MAPS_TO]-(f2:Field)-[:MAPS_TO]-(f1:Field)-[:MAPS_TO]-(f:Field)<-[:CONTAINS]-(s:Source {type: "SOURCE"}) WHERE f.name="<my input source field>" RETURN f,s,f1,f2,f3,f4,f5