graphneo4jtitangiraph

What are the factors to consider while choosing a Graph DB for about 30 TB data


I'm in the process of developing a software system ( Graph Database ) to study the interconnection between multiple components. It could end up with about 30 TB of data. I would like to know what all factors to consider in choosing the right database.

Some of the options i'm looking are Apache Giraph, TitanDB. I'm also wondering if a smaller scale DB like neo4j or OrientDB might itself work


Solution

  • This is a very broad question so I would define exactly what you looking for because size can be a bit vague.

    I think any of the example graph dbs you provided can model data that large.

    A few "more detailed" questions you could ask yourself include:

    1. Do you care about Horizontal Scaling ? If yes then you should be looking at TitanDB, OrientDB or DSE Graph because Neo4J (at the time of writing) does not scale horizontally so it is limited by the size of the server.
    2. Does a standardised language query/traversal language matter ? If yes then maybe you should be looking more at Tinkerpop vendors such as TitanDB, OrientDB, DSE Graph, and others. If no then any option will suit you.
    3. Does my data have super nodes ? If yes then you should see how each vendor deals with super nodes. Some vendors shard, others use clever graph partioning algorithms.
    4. How much support do you want ? If you need a lot then maybe you should look at strong enterprise solutions such as DSE, OrientDB or Neo4J. Neo4J is currently considered the most popular graph db and with that comes a large support base.
    5. Do you want to use open source software ? If yes then TitanDB, Neo4j, or OrientDB may be for you

    These are just some of the things you can look into when making a better decision between all the vendors. Note: There are many other vendors you could consider, Blazegraph, HypergraphDB, just to name a few.