mongodbperformancemongodb-querypymongo

Why $nin is slower than $in, Mongodb


I have collection with 5M documents with correct indexes.$in working perfect, but same query with $nin super slow...What of the reason of this?

Super fast:

{'tech': {'$in': ['Wordpress', 'wordpress', 'WORDPRESS']}}

Super slow..

{'tech': {'$nin': ['Wordpress', 'wordpress', 'WORDPRESS']}}

Solution


  • The following explanation is accurate only for Mongo versions prior to 3.2

    Mongo v3.2 has all kinds of storage engine changes which improved performance on this issue.

    Now $nin hash one important quality, which is it not a selective query, First let's understand what selectivity means:

    Selectivity is the ability of a query to narrow results using the index. Effective indexes are more selective and allow MongoDB to use the index for a larger portion of the work associated with fulfilling the query.

    Now they even state it themselfs:

    For instance, the inequality operators $nin and $ne are not very selective since they often match a large portion of the index. As a result, in many cases, a $nin or $ne query with an index may perform no better than a $nin or $ne query that must scan all documents in a collection.

    Back then selectivity was a big deal performance wise. This all leads us to your question, why isn't the index being used?

    Well when Mongo is asked to create a query plan he preforms a "race" between all available query plans, one of which is a COLSCAN i.e collection scan where the first plan to find 101 documents wins. Due to the poor efficiency of non-selective query's the winning plan (And actually usually faster depending on the index and values in the query) is COLSCAN, Read further about this here