Below are three sample JSON files and the graph loader script. The first file is the most complex, and most of that complexity should be ignored by the loading script. The second file is a simple variation that occurs often. The third file gives a sense of how widely the files can differ and is the most direct example of where the problem currently lies.
Before diving in, note that this is only a close approximation of the structure of the data I'm actually working with and its loading script. There are better ways to model vertices for people, but this was the first example I could think of.
/*{
"peopleInfo": [
{
"id": {
"idProperty1": "property1Value",
"idProperty2": "someUUID"
}
},
{
"people": [
{
"firstName": "person1FirstName",
"lastName": "person1LastName",
"sequence": 1
},
{
"firstName": "person2FirstName",
"lastName": "person2LastName",
"sequence": 2
},
{ //children and twins may be switched such that twins are sequence 3 & 4 and one or both of them have children with corresponding sequences
"children": [
{
"firstName": "firstChildFirstName",
"lastName": "firstChildLastName",
"sequence": 3
},
{
"firstName": "secondChildFirstName",
"lastName": "secondChildLastName",
"sequence": 4
},
{
"twins": [
{
"firstName": "firstTwinFirstName",
"lastName": "firstTwinLastName",
"sequence": 5
},
{
"firstName": "secondTwinFirstName",
"lastName": "secondTwinLastName",
"sequence": 6
}
]
}
]
}
]
}
]
}*/
The second file doesn't contain any children:
/*{
"peopleInfo": [
{
"id": {
"idProperty1": "property1Value",
"idProperty2": "someUUID"
}
},
{
"people": [
{
"firstName": "person1FirstName",
"lastName": "person1LastName",
"sequence": 1
},
{
"firstName": "person2FirstName",
"lastName": "person2LastName",
"sequence": 2
}
]
}
]
}*/
The third file contains twins, but no single-born children:
/*{
"peopleInfo": [
{
"personsID": {
"idProperty1": "property1Value",
"idProperty2": "someUUID"
}
},
{
"people": [
{ // Twins can exist without top-level people (think of those as the parents) and without other children. Likewise, children can exist without twins and without parents.
"twins": [
{
"firstName": "firstTwinFirstName",
"lastName": "firstTwinLastName",
"sequence": 3
},
{
"firstName": "secondTwinFirstName",
"lastName": "secondTwinLastName",
"sequence": 4
}
]
}
]
}
]
}*/
The loading script:
inputBaseDir = "/path/to/directories"
import java.io.File as javaFile;

// Collect the absolute path of every subdirectory under inputBaseDir
def list = []
new javaFile(inputBaseDir).eachDir() { dir ->
    list << dir.getAbsolutePath()
}

for (item in list) {
    def fileBuilder = File.directory(item)
    def peopleInfoMapper = fileBuilder.map {
        it['idProperty1'] = it.peopleInfo.id.idProperty1[0]
        it['idProperty2'] = it.peopleInfo.id.idProperty2[0]
        def ppl = it.peopleInfo.people[1]
        people = ppl.collect {
            if (it['firstName'] != null) {
                it['firstName'] = it['firstName']
            } else if (it['lastName'] != null) {
                it['lastName'] = it['lastName']
            } else if (it['sequence'] != null) {
                it['sequence'] = it['sequence']
            }
            // filling the null values below is the temporary non-solution to get the data to load
            if (it['firstName'] == null) {
                it['firstName'] = ''
            }
            if (it['lastName'] == null) {
                it['lastName'] = ''
            }
            if (it['sequence'] == null) {
                it['sequence'] = 0
            }
            it
        }
        it['people'] = people
        it.remove('peopleInfo')
        it
    }
    load(peopleInfoMapper).asVertices {
        label "peopleInfo"
        key 'idProperty2'
        vertexProperty 'people', {
            value 'firstName'
            value 'lastName'
            value 'sequence'
            ignore 'children'
            ignore 'twins'
        }
    }
}
Looking at the third file:
Although the twins contain the allowed values, they shouldn't affect loading, because ignoring the 'twins' key should ignore all of its meta-property values. I believe the exception below is thrown because this file has no top-level people, only twins: once the 'twins' key is ignored, all that is left for the vertexProperty 'people' is an empty map. My non-answer simply fills that empty map with an empty string for each name and a zero for the sequence, and those placeholders are loaded into the database along with the actual data.
java.lang.IllegalArgumentException: [On field 'people'] Provided map does not contain property value on field [sequence]: {twin=[{firstName=firstTwinFirstName,lastName=firstTwinLastName, sequence=1},{firstName=secondTwinFirstName,lastName=secondTwinLastName,sequence=2}]}
Looking at the first file:
When the 'twins' key is ignored, or removed directly, an empty map is still left behind as a placeholder. The same non-solution in the loading script fills it, and it is loaded into the database along with the actual data.
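The empty-map behaviour described above can be reproduced in plain Groovy, outside the loader. The snippet below is only a sketch: the `people` literal is a hand-written stand-in for the parsed third file, and the padding step paraphrases the null-filling workaround rather than calling any graph loader API.

```groovy
// Hand-written stand-in (an assumption) for the 'people' list of the third file:
// a single entry that holds nothing but twins.
def people = [
    [twins: [
        [firstName: 'firstTwinFirstName', lastName: 'firstTwinLastName', sequence: 3],
        [firstName: 'secondTwinFirstName', lastName: 'secondTwinLastName', sequence: 4]
    ]]
]

// Dropping the nested keys removes the twins but keeps the enclosing map.
def stripped = people.collect { person ->
    def copy = new LinkedHashMap(person)   // copy so the input list is left untouched
    copy.remove('twins')
    copy.remove('children')
    copy
}
assert stripped == [[:]]                   // one entry remains, and it is empty

// Filling in defaults then turns that empty map into a placeholder record.
def padded = stripped.collect { person ->
    [firstName: person.firstName ?: '',
     lastName : person.lastName ?: '',
     sequence : person.sequence ?: 0]
}
assert padded == [[firstName: '', lastName: '', sequence: 0]]
```

The second assertion shows the record that ends up in the database next to the real data.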
I don't know if this is the grooviest solution, but this seems to do the trick:
inputBaseDir = "/path/to/directories"
import java.io.File as javaFile;

// Collect the absolute path of every subdirectory under inputBaseDir
def list = []
new javaFile(inputBaseDir).eachDir() { dir ->
    list << dir.getAbsolutePath()
}

for (item in list) {
    def fileBuilder = File.directory(item)
    def peopleInfoMapper = fileBuilder.map {
        it['idProperty1'] = it.peopleInfo.id.idProperty1[0]
        it['idProperty2'] = it.peopleInfo.id.idProperty2[0]
        def ppl = it.peopleInfo.people[1]
        people = ppl.collect {
            // removes k:v leaving an empty map
            if (it['children'] != null) {
                it.remove('children')
            }
            // removes k:v leaving an empty map
            if (it['twins'] != null) {
                it.remove('twins')
            }
            if (it['firstName'] != null) {
                it['firstName'] = it['firstName']
            } else if (it['lastName'] != null) {
                it['lastName'] = it['lastName']
            } else if (it['sequence'] != null) {
                it['sequence'] = it['sequence']
            }
            it // return the (possibly empty) map so findAll() below can filter it
        }
        if (ppl['firstName'][0] != null && ppl['lastName'][0] != null) {
            it['people'] = people.findAll() // only gathers non-empty maps from people
        } else {
            /* removing people without desired meta-properties enables
               loader to proceed when empty maps from the removal of
               children and/or twins are present, while top-level
               persons aren't */
            it.remove('people')
        }
        it.remove('peopleInfo')
        it
    }
    load(peopleInfoMapper).asVertices {
        label "peopleInfo"
        key 'idProperty2'
        vertexProperty 'people', {
            value 'firstName'
            value 'lastName'
            value 'sequence'
            ignore 'children'
            ignore 'twins'
        }
    }
}
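To check the filtering logic outside the loader, the same idea can be written as a small standalone function. This is a sketch under assumptions: `cleanPeople` is a hypothetical helper name, the input lists are hand-written stand-ins for the parsed files, and it drops every empty map instead of testing only the first entry's names as the script above does.

```groovy
// Hypothetical helper (not part of the graph loader): strip the nested groups,
// then drop any map that is left empty. Returns null when nothing usable remains,
// so the caller can remove the 'people' key altogether.
def cleanPeople(List people) {
    def cleaned = people.collect { person ->
        def copy = new LinkedHashMap(person)   // leave the caller's maps untouched
        copy.remove('children')
        copy.remove('twins')
        copy
    }.findAll()                                // Groovy truth: an empty map is false
    cleaned ?: null                            // an empty list is false as well
}

// Stand-in for the first file: two top-level people plus a children entry.
def firstFile = [
    [firstName: 'person1FirstName', lastName: 'person1LastName', sequence: 1],
    [firstName: 'person2FirstName', lastName: 'person2LastName', sequence: 2],
    [children: [[firstName: 'firstChildFirstName', lastName: 'firstChildLastName', sequence: 3]]]
]

// Stand-in for the third file: twins only.
def thirdFile = [
    [twins: [[firstName: 'firstTwinFirstName', lastName: 'firstTwinLastName', sequence: 3]]]
]

assert cleanPeople(firstFile)*.sequence == [1, 2]
assert cleanPeople(thirdFile) == null
```

Inside the mapper, a null result would correspond to the `it.remove('people')` branch and a non-null result to assigning `it['people']`.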