arangodb arangojs

Which method of storing this particular data would be more efficient?


I have an existing database for a game - currently using MySQL to store information - and I wish to test out ArangoDB to compare speed.

I am wondering whether it would be better to store all of a player's information in a single collection, or whether it would be more efficient (or even just better practice) to split it across two collections.

In MySQL I didn't really have a choice, but using ArangoDB I do.

For example, storing inventory information in MySQL:

    +---------+---------+-------+
    | user_id | item_id | count |
    +---------+---------+-------+
    |       1 |       1 |     7 |
    |       1 |       2 |     4 |
    +---------+---------+-------+

Or in ArangoDB, I could do either:

  1. A single collection for all of the information:

    {
        _key: "Unique User ID",
        health: 100,
        money: 52.38,
        // .... ,
        inventory:
            {
                item1: 7,
                item2: 4
                // , ....
            }
    }
    
  2. Separate the above collection into two different ones (one for health, money, etc. and one for inventory data):

    // 'user' collection
    {
        _key: "Unique User ID",
        health: 100,
        money: 52.38,
        // ....
    }
    
    // 'inventory' collection
    {
        _key: "Unique User ID",
        item1: 7,
        item2: 4
        // , ....
    }
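
To make the two layouts concrete, here is a minimal sketch in plain JavaScript (no database connection) that reshapes the MySQL rows above into each of them. The function names `toEmbeddedDocs` and `toSplitDocs` and the `stats` parameter are made up for illustration; they are not part of ArangoDB or arangojs.

```javascript
// Rows as they would come out of the MySQL inventory table above.
const inventoryRows = [
  { user_id: 1, item_id: 1, count: 7 },
  { user_id: 1, item_id: 2, count: 4 },
];

// Option 1: one document per user, with the inventory embedded as a sub-object.
// `stats` is a hypothetical lookup of the other player fields (health, money, ...)
// keyed by user ID.
function toEmbeddedDocs(rows, stats = {}) {
  const docs = new Map();
  for (const { user_id, item_id, count } of rows) {
    const key = String(user_id); // ArangoDB document keys are strings
    if (!docs.has(key)) {
      docs.set(key, { _key: key, ...(stats[key] || {}), inventory: {} });
    }
    docs.get(key).inventory[`item${item_id}`] = count;
  }
  return [...docs.values()];
}

// Option 2: the same data split into 'user' and 'inventory' documents
// that share the same _key.
function toSplitDocs(rows, stats = {}) {
  const embedded = toEmbeddedDocs(rows, stats);
  return {
    user: embedded.map(({ inventory, ...rest }) => rest),
    inventory: embedded.map(({ _key, inventory }) => ({ _key, ...inventory })),
  };
}
```

With option 1 a single document read returns everything about a player; with option 2 the inventory can be read or rewritten without touching the rest of the player document, at the cost of a second lookup by the same key.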
    

Which one of the two methods above (or even another I didn't think of) would be more efficient?


Solution

  • Well, the answer depends on how you expect the data to be used by your application. Here are some pointers, though, on how I would choose how to store the data:

    Since ArangoDB has native support for graph traversals, there is also a third option: keep users and items in their own document collections and store the inventory as an edge collection linking them. That setup is optimized for really fast lookups. The rule of thumb when starting the move from relational to graph databases is that main tables become collections and intermediate (join) tables become edge collections (obviously there is more to it, but that is a good start).

    If I were starting from scratch, I would probably go with option 3 first. However, as I mentioned at the beginning, it depends on how you intend to use the data.
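
The rule of thumb above, applied to the inventory table from the question, might look like the following sketch. The collection names `users`, `items` and `inventory` and the helper `toInventoryEdges` are assumptions made for illustration. The only ArangoDB-specific parts are the `_from` / `_to` attributes that every edge document carries (in `collection/key` form) and the AQL traversal syntax, which is shown as a string and not executed here.

```javascript
// Turn rows of the relational join table into edge documents.
// Assumed layout: 'users' and 'items' are document collections,
// 'inventory' is an edge collection connecting them.
function toInventoryEdges(rows) {
  return rows.map(({ user_id, item_id, count }) => ({
    _from: `users/${user_id}`, // the player who owns the item
    _to: `items/${item_id}`,   // the item being owned
    count,                     // per-edge data, formerly the `count` column
  }));
}

const inventoryEdges = toInventoryEdges([
  { user_id: 1, item_id: 1, count: 7 },
  { user_id: 1, item_id: 2, count: 4 },
]);

// An AQL traversal that would list one player's items and counts.
// @user is a bind parameter holding a document ID such as "users/1".
const inventoryQuery = `
  FOR item, edge IN 1..1 OUTBOUND @user inventory
    RETURN { item: item._key, count: edge.count }
`;
```

Keeping the count on the edge means item definitions live in one place, and the same edge collection can be traversed in the other direction (`INBOUND`) to find every player holding a given item.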