I use the Neo4j Java core API and want to update 10 million nodes. I thought multithreading would make this faster, but the performance is not that good (35 minutes for setting the properties).
To explain: each "Person" node has at least one "POINTSREL" relationship to a "Point" node, which has the property "Points". I want to sum up the points from the related "Point" nodes and set the result as a property on the "Person" node.
Here is my code:
Transaction transaction = service.beginTx();
ResourceIterator<Node> iterator = service.findNodes(Labels.person);
transaction.success();
transaction.close();
ExecutorService executor = Executors.newFixedThreadPool(5);
while (iterator.hasNext()) {
    executor.execute(new MyJob(iterator.next()));
}
// wait until all threads are done
executor.shutdown();
try {
    executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException e) {
    e.printStackTrace();
}
And here is the Runnable class:
private class MyJob implements Runnable {
    private Node node;

    /* collect useful parameters in the constructor */
    public MyJob(Node node) {
        this.node = node;
    }

    public void run() {
        Transaction transaction = service.beginTx();
        Iterable<org.neo4j.graphdb.Relationship> rel = this.node.getRelationships(RelationType.POINTSREL, Direction.OUTGOING);
        double sum = 0;
        for (org.neo4j.graphdb.Relationship entry : rel) {
            try {
                sum += (Double) entry.getEndNode().getProperty("Points");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        this.node.setProperty("Sum", sum);
        transaction.success();
        transaction.close();
    }
}
Is there a better (faster) way to do that?
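One variation that might be worth measuring: the code above opens and commits one transaction per node, so 10 million nodes means 10 million small commits. Below is a minimal sketch of grouping nodes into batches so that each job handles many nodes at once. It uses only the JDK. The class `BatchedJobs`, the method `runInBatches` and the `batchWorker` callback are hypothetical names for illustration, not part of the Neo4j API, and whether batching actually helps here is an assumption that would need to be tested.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: not part of Neo4j, JDK only.
final class BatchedJobs {

    /**
     * Drains the iterator in chunks of batchSize and submits each chunk as one job.
     * batchWorker is a hypothetical callback; in the Neo4j case it would open one
     * transaction, update every node in the chunk, then commit and close.
     * Blocks until all jobs have finished and returns the number of jobs submitted.
     */
    public static <T> int runInBatches(Iterator<T> items, int batchSize, int threads,
                                       Consumer<List<T>> batchWorker) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        int jobs = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (items.hasNext()) {
            batch.add(items.next());
            if (batch.size() == batchSize) {
                final List<T> chunk = batch;   // effectively final copy for the lambda
                executor.execute(() -> batchWorker.accept(chunk));
                jobs++;
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {                // submit the last, partially filled chunk
            final List<T> chunk = batch;
            executor.execute(() -> batchWorker.accept(chunk));
            jobs++;
        }
        executor.shutdown();
        try {
            executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return jobs;
    }
}
```

With Neo4j, the worker passed in would presumably wrap the loop from MyJob.run() in a single service.beginTx() ... transaction.close() pair per chunk. The batch size is a tuning guess; something in the range of a few thousand nodes per transaction is a starting point to experiment with, not a measured recommendation.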
About my setup: an AWS instance with 8 CPUs and 32 GB RAM.
neo4j-wrapper.conf
# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size in MB.
wrapper.java.initmemory=16000
wrapper.java.maxmemory=16000
neo4j.properties
# The type of cache to use for nodes and relationships.
cache_type=soft
cache.memory_ratio=30.0
neostore.nodestore.db.mapped_memory=2G
neostore.relationshipstore.db.mapped_memory=7G
neostore.propertystore.db.mapped_memory=2G
neostore.propertystore.db.strings.mapped_memory=2G
neostore.propertystore.db.arrays.mapped_memory=512M
I found out that one of the problems was the setting "cache_type=soft". I set it to "cache_type=none" and the execution time dropped from 30 minutes to 2 minutes. With the old setting, after a certain number of updates there were always threads that were blocked for about 30 seconds; changing this property avoids these blocks. I will look for a more detailed explanation.
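For reference, the change described above amounts to a single line in neo4j.properties; the other settings stay as listed earlier:

```properties
# The type of cache to use for nodes and relationships.
cache_type=none
```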