I ran a Gremlin Server session backed by a TinkerGraph, with the following configuration:
gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
gremlin.tinkergraph.vertexIdManager=LONG
gremlin.tinkergraph.graphLocation=data/db.kryo
gremlin.tinkergraph.graphFormat=gryo
During the session I created many vertices and edges. By the time the graph reached 180k vertices and 350k edges, the server was performing poorly. It could not even complete a simple query such as :> g.V(999).values('name').
Moreover, when I shut the server down, it did not successfully write the graph to graphLocation=data/db.kryo as configured above, so I lost all of the 180k vertices and 350k edges created so far.
I am wondering about the capacity of TinkerGraph and Gremlin Server:
How many vertices and edges, and how large a graph overall, can it handle?
TinkerGraph is limited only by the memory you give it. You control that memory with the -Xmx JVM setting. If your graph is hosted in Gremlin Server and you have not changed its -Xmx setting, then it's not too surprising that you started to see performance problems, because Gremlin Server ships with a fairly low default of 512m.
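To confirm that a changed -Xmx value has actually taken effect, it can help to ask the JVM what it thinks its limit is. The following is a minimal sketch using only the standard `Runtime` API; the class name `HeapCheck` and the helper `toMiB` are made up for illustration and are not part of TinkerPop. How the flag reaches the JVM depends on your startup script, so check that script rather than relying on this example.

```java
public class HeapCheck {
    // Converts a byte count to whole mebibytes (rounds down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the JVM will try to use for the heap (roughly -Xmx).
        long max = rt.maxMemory();
        // Heap currently occupied by objects, live or not yet collected.
        long used = rt.totalMemory() - rt.freeMemory();

        System.out.println("max heap (MiB):  " + toMiB(max));
        System.out.println("used heap (MiB): " + toMiB(used));
    }
}
```

Running this with, for example, `java -Xmx2g HeapCheck` should report a maximum close to 2048 MiB. If the used figure sits near the maximum while the graph is loaded, the JVM is probably spending most of its time in garbage collection, which would match the slow queries described in the question.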
Is there any way to avoid losing data when the server shuts down and writes the graph to a file?
The data loss could have been related to the memory issues you were having, but it's hard to say. It is worth noting that the flush to disk that TinkerGraph does on close becomes riskier as the graph grows: the larger the graph, the longer it takes to write the whole thing to disk, and the greater the chance that something goes wrong during that write (e.g. a power failure).
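One general way to reduce the damage from an interrupted write is to take your own snapshots, writing each one to a temporary file and renaming it over the previous snapshot only after the write has finished. This is not something TinkerGraph does for you on close; it is a sketch of a common file-handling pattern using only the Java standard library. The class `SafeSnapshot`, the interface `GraphWriter`, and the path `data/db-snapshot.kryo` are all hypothetical. In a real setup the writer would be whatever call serializes your graph (in TinkerPop 3.x, something along the lines of `graph.io(IoCore.gryo()).writeGraph(...)`, but verify that against your version).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class SafeSnapshot {
    /** Hypothetical hook: anything that serializes the graph into the given file. */
    @FunctionalInterface
    public interface GraphWriter {
        void writeTo(Path file) throws IOException;
    }

    /**
     * Writes a snapshot to a temp file in the target's directory, then renames
     * it over the target. If the write fails, the previous snapshot is untouched.
     */
    public static void snapshot(Path target, GraphWriter writer) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Files.createDirectories(dir);
        // Same directory as the target, so the rename stays on one filesystem.
        Path tmp = Files.createTempFile(dir, "snapshot-", ".tmp");
        try {
            writer.writeTo(tmp);
            // May throw AtomicMoveNotSupportedException on some filesystems.
            Files.move(tmp, target,
                    StandardCopyOption.REPLACE_EXISTING,
                    StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up after a failed write.
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Paths.get("data", "db-snapshot.kryo"); // hypothetical path
        // Stand-in writer: a real one would serialize the graph here.
        snapshot(target, file -> Files.write(file, new byte[] {1, 2, 3}));
        System.out.println("snapshot bytes: " + Files.size(target));
    }
}
```

Taking such a snapshot periodically means that a failed shutdown costs at most the changes made since the last successful snapshot, rather than the whole graph. It still requires enough heap to serialize the graph, so it does not replace a sensible -Xmx value.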
Should I consider using a graph that is not in-memory, for instance Neo4j?
That depends on your situation. If you load a graph once, it does not change often, and you are just doing analysis, then TinkerGraph is probably the best option among the TinkerPop-enabled graphs. On the other hand, if you have a transactional workload where the graph is constantly changing (as in the backend to some kind of application), then you will probably want a graph that can flush to disk at the end of each transaction, such as Neo4j or JanusGraph.
Regardless of the graph you choose, be sure to allocate an appropriate amount of -Xmx to Gremlin Server so that it can do its work appropriately.