In my project I am using spring-data-neo4j 4.2.0.M1 with neo4j-ogm 2.0.4. Initially this was using an embedded neo4j instance, but while investigating this issue I migrated to a dedicated neo4j instance (running on the same machine, though) accessed via the Bolt protocol.
I am continuously inserting data, essentially as it becomes available to my application (so I can't use batch inserts). After startup this works fine: saving an instance of my NodeEntity takes ~60 ms, which is perfectly fine for my use case. However, this slowly degrades over time. After 10-20 minutes a save slows down to about 2 s, which is not so great anymore. The latency seems to plateau around that value rather than growing much further.
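For reference, the insert path is just a plain repository save per entity; a minimal sketch of where the timing is taken (entity and repository names below are illustrative, not from my actual code):

    import org.neo4j.ogm.annotation.GraphId;
    import org.neo4j.ogm.annotation.NodeEntity;
    import org.springframework.data.neo4j.repository.GraphRepository;

    // Stand-in for the actual NodeEntity (illustrative only).
    @NodeEntity
    class Measurement {
        @GraphId
        private Long id;
    }

    // Stock SDN 4.x repository; nothing custom on the save path.
    interface MeasurementRepository extends GraphRepository<Measurement> {
    }

    class Ingest {
        private final MeasurementRepository repository;

        Ingest(MeasurementRepository repository) {
            this.repository = repository;
        }

        // Called whenever new data arrives; this single call degrades
        // from ~60 ms to ~2 s over time.
        void onNewData(Measurement measurement) {
            long start = System.currentTimeMillis();
            repository.save(measurement);
            System.out.printf("save took %d ms%n", System.currentTimeMillis() - start);
        }
    }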
Initially I assumed this was caused by the embedded instance being too small, since neo4j repeatedly reported GC pauses. I then migrated to a dedicated instance that is much bigger, and those GC warnings no longer show up. The degradation still occurs, though.
Store sizes as reported by neo4j:
Array Store 8.00 KiB
Logical Log 151.36 MiB
Node Store 40.14 MiB
Property Store 1.83 GiB
Relationship Store 742.63 MiB
String Store Size 120.87 MiB
Total Store Size 4.55 GiB
The instance is configured as follows:
dbms.memory.pagecache.size=5g
dbms.memory.heap.initial_size=4g
dbms.memory.heap.max_size=4g
dbms.jvm.additional=-XX:+UseG1GC
Using the YourKit profiler (in sampler mode!), I can see that most of the time seems to be spent in neo4j-ogm's EntityGraphMapper, specifically in
org.neo4j.ogm.context.EntityGraphMapper#haveRelationEndsChanged
The NodeEntity being saved usually has about 40 relationships to other nodes, most of them modeled as RelationshipEntity. In an earlier phase I had already noticed that saving entities was quite slow because too many related (but unchanged) entities were mapped along with them. Since then I save with a depth of 1. The continuous operation that saves the NodeEntities uses a transaction size of 200 entities.
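In code, the batched save looks roughly like this (a sketch with the same illustrative names as above; the Session comes from the OGM SessionFactory as usual):

    import java.util.List;
    import org.neo4j.ogm.session.Session;
    import org.neo4j.ogm.transaction.Transaction;

    // Sketch of the batched save: depth 1 keeps unchanged neighbours out
    // of the mapping, and each transaction covers up to 200 entities.
    void saveBatch(Session session, List<Measurement> batch) {
        Transaction tx = session.beginTransaction();
        try {
            for (Measurement m : batch) {
                session.save(m, 1); // depth 1: entity plus direct relationships
            }
            tx.commit();
        } finally {
            tx.close();
        }
    }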
I am not yet convinced that neo4j-ogm is actually the cause of the slowdown, since I don't see what changes compared to the good initial results. In cases like this I usually suspect memory leaks/pollution, but all monitoring results for my application look good. For the neo4j server instance I don't really know where to look for such information apart from the debug.log.
All in all I've spent quite some time investigating this already and don't know what else to look at. Any thoughts or suggestions? I am happy to provide additional information.
Edit: Following @vince's input, I've had another look at the memory distribution and found that the Neo4jSession had in fact grown quite large after letting the application run for ~3 h:
At that point the heap was 1.7 GB, of which 70% referenced live data. Of that, about 300 MB was referenced (and kept alive) by the Neo4jSession. This may indicate that it has grown too large. How can I manually intervene here?
Entities stick around in the session until they get garbage collected. There may be some performance impact in haveRelationEndsChanged if you're loading many thousands of entities, so it may be worth calling session.clear() between each transaction to see if that helps.
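Applied to the batched loop sketched in the question, that suggestion would look roughly like this (same illustrative names and imports; session.clear() simply evicts everything from the session's mapping context):

    void saveBatch(Session session, List<Measurement> batch) {
        Transaction tx = session.beginTransaction();
        try {
            for (Measurement m : batch) {
                session.save(m, 1);
            }
            tx.commit();
        } finally {
            tx.close();
        }
        session.clear(); // drop all entities cached in this Neo4jSession
    }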