mongodb, indexing, compound-index, nosql

Custom MongoDB Object _id vs Compound index


I need to create a lookup collection in MongoDB to verify uniqueness. The requirement is to check whether the same pair of values has already been used. In SQL, I would write something like this:

SELECT COUNT(id) FROM <table> WHERE key1 = 'value1' AND key2 = 'value2'

If the above query returns a count greater than zero, the combination is not unique. I have 2 solutions in mind but I am not sure which one is more scalable. There are 30M+ docs against which I need to create this mapping.

Solution 1:

I create a collection of docs with a compound index on key1 and key2

{
  _id: <MongoID>,
  key1: <value1>,
  key2: <value2>
}

Solution 2:

I write application logic to create a custom _id by concatenating value1 and value2

{
  _id: <value1>_<value2>
}

Personally, I feel the second one is more optimised as it only has a single index and the docs are also smaller. But I am not sure if it is good practice to create my own _id values, as they may not be completely random. What do you think?

Thanks in advance.

Update:

My database already has a lot of indexes which take up memory, so I want to keep index size as low as possible, especially for collections which are only used to verify uniqueness.


Solution

  • I would suggest Solution 1, i.e. keep two separate properties, key1 and key2, and put a unique compound index on them:

    db.yourCollection.createIndex( { "key1": 1, "key2": 1 }, { unique: true } )
    
    1. You can easily search by an individual field if required. A query on key1 alone can use the compound index through its prefix; a query on key2 alone is still simple to write, although it would need its own index to be efficient. If you build _id from a combination of the keys, searching by an individual field becomes hard.
    2. Document size is rarely the main concern when designing documents in MongoDB.
    3. If you later need to change the key values of an existing document, it will be easy, whereas _id cannot be modified once a document is inserted. Keep this in mind if other collections' documents hold a reference to this document.
    4. In terms of scalability, the default _id index would be sequential and easily shardable, and you can let MongoDB manage it.
    5. A search on those keys will use the compound index; any other search will use whichever other index fits it.
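
    With the unique index in place, the uniqueness check from the question does not need a separate count query: the insert itself can act as the check. A minimal sketch for mongosh, assuming the unique compound index above already exists. `claimPair` and its `coll` argument are hypothetical names, and 11000 is the server's duplicate-key error code:

```javascript
// Hypothetical helper: try to register a (key1, key2) pair.
// Assumes `coll` has a unique index on { key1: 1, key2: 1 }.
function claimPair(coll, key1, key2) {
  try {
    coll.insertOne({ key1: key1, key2: key2 });
    return true; // the pair was new and is now recorded
  } catch (err) {
    if (err.code === 11000) {
      return false; // duplicate key: the pair already exists
    }
    throw err; // anything else is a genuine failure
  }
}
```

    In mongosh, `claimPair(db.yourCollection, "value1", "value2")` should return true the first time and false on a repeat. With the Node.js driver, `insertOne` returns a promise, so the function would need to be async and await the call.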

    If you still care more about document size than about searching, you can go with Solution 2, but make _id an embedded document like

    {_id:{key1:<value1>,key2:<value2>}}
    

    This way you can still query on a specific _id.key1 too, although only an exact match on the whole _id is served directly by the _id index.
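
    A lookup against the embedded-document _id could be sketched as follows for mongosh. `pairId` and `pairExists` are hypothetical helper names. The one detail worth noting is that MongoDB compares embedded documents including their field order, so the _id should always be built the same way:

```javascript
// Hypothetical helpers for the embedded-document _id variant.
// Always build the _id with the same field order: an exact match on an
// embedded document also requires the fields to be in the same order.
function pairId(key1, key2) {
  return { key1: key1, key2: key2 };
}

function pairExists(coll, key1, key2) {
  // Exact match on the whole _id, served by the default _id index.
  // In mongosh, findOne returns null when nothing matches.
  return coll.findOne({ _id: pairId(key1, key2) }) !== null;
}
```

    Because the default _id index is already unique, inserting a document with an existing `pairId` fails with a duplicate-key error without any extra index.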

    Update:

    Yes, if document size matters more to you than maintainability, and you are sure the keys of a given document will not change in the future (or, if they do change, that the document is not referenced from other collections), then you can use Solution 2. Just store the keys as an embedded object rather than joining them with an underscore. You can also add more keys later if needed.
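
    One reason to prefer an object over an underscore-joined string is that the joined form can be ambiguous when a value itself contains an underscore. A small plain-JavaScript illustration (the function names are made up for this example):

```javascript
// Naive key: join the two values with an underscore.
function joinedId(value1, value2) {
  return value1 + "_" + value2;
}

// Structured key: keep the values as separate fields.
function structuredId(value1, value2) {
  return { key1: value1, key2: value2 };
}

// Two different pairs produce the same joined string...
const joinedCollide = joinedId("a_b", "c") === joinedId("a", "b_c");

// ...but stay distinct when kept as separate fields.
const structuredCollide =
  JSON.stringify(structuredId("a_b", "c")) ===
  JSON.stringify(structuredId("a", "b_c"));

console.log(joinedCollide);     // true
console.log(structuredCollide); // false
```

    If the values can never contain the separator, the joined string is safe, but the object form avoids having to guarantee that.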