mongodbmongodb-indexes

Multikey indexing, performance drawbacks


Consider a collection of approximately 1 mln. documents having the following schema:

{
    _id: ObjectId(),
    Title: "",
    Description: "",
    Scheduling: {
        From: 20230202,
        To: 20230201,
        MagicValue: 12
    },
    Mandatory: true,
    Type: "None"
}

Could there be any possible drawback in creating a compound index which contains a multikey index value like so?

{ _id: 1, Title: 1, Scheduling.From: 1 }

The documentation cites limitations only when dealing with nested array documents and goes on explaining index bounds creation rules and so on... If you really think about it the dummy document above could really be expressed as:

{
    _id: ObjectId(),
    Title: "",
    Description: "",
    Scheduling.From: 20230202,
    Scheduling.To: 20230201,
    Scheduling.MagicValue: 12,
    Mandatory: true,
    Type: "None"
}

And it wouldn't make that much of a difference in terms of how the indexing engine treats it, as the "Scheduling" inner document should act like a namespace. Therefore I'm asking here... would the latter document behave differently than the first one above in respect to the same compound index? Thanks :)


Solution

  • Expanding on my comment, the situation that you described would not result in the index being multikey (in MongoDB terminology). Multikey indexes arise from the following conditions as noted in the documentation:

    MongoDB automatically creates a multikey index if any indexed field is an array; you do not need to explicitly specify the multikey type.

    No fields in your document, and therefore none of the indexed fields, are arrays. We can prove that the index is not multikey via the following test:

    test> db.foo.drop()
    false
    test> db.foo.insert({ _id: ObjectId(), Title: "", Description: "", Scheduling: { From: 20230202, To: 20230201, MagicValue: 12 }, Mandatory: true, Type: "None" })
    {
      acknowledged: true,
      insertedIds: { '0': ObjectId("649de6208c8616ab438dd397") }
    }
    test> db.foo.createIndex({ _id: 1, Title: 1, "Scheduling.From": 1 })
    _id_1_Title_1_Scheduling.From_1
    test> db.foo.find().hint("_id_1_Title_1_Scheduling.From_1").explain().queryPlanner.winningPlan.inputStage
    {
      stage: 'IXSCAN',
      keyPattern: { _id: 1, Title: 1, 'Scheduling.From': 1 },
      indexName: '_id_1_Title_1_Scheduling.From_1',
      isMultiKey: false,
      multiKeyPaths: { _id: [], Title: [], 'Scheduling.From': [] },
      isUnique: false,
      isSparse: false,
      isPartial: false,
      indexVersion: 2,
      direction: 'forward',
      indexBounds: {
        _id: [ '[MinKey, MaxKey]' ],
        Title: [ '[MinKey, MaxKey]' ],
        'Scheduling.From': [ '[MinKey, MaxKey]' ]
      }
    }
    

    Most specifically, the output reports isMultiKey: false.

    By contrast, here is what is reported for a compound index that is multikey:

    > db.foo.drop()
    false
    test> db.foo.insert({_id:1, x: [1,2,3] })
    { acknowledged: true, insertedIds: { '0': 1 } }
    test> db.foo.createIndex({_id:1, x:1})
    _id_1_x_1
    test> db.foo.find().hint("_id_1_x_1").explain().queryPlanner.winningPlan.inputStage
    {
      stage: 'IXSCAN',
      keyPattern: { _id: 1, x: 1 },
      indexName: '_id_1_x_1',
      isMultiKey: true,
      multiKeyPaths: { _id: [], x: [ 'x' ] },
      isUnique: false,
      isSparse: false,
      isPartial: false,
      indexVersion: 2,
      direction: 'forward',
      indexBounds: { _id: [ '[MinKey, MaxKey]' ], x: [ '[MinKey, MaxKey]' ] }
    }
    

    So none of the considerations for multikey index bounds apply in your situation.

    That said, there is one other thing to note. Your index is prefixed with the _id field which is unique by definition in MongoDB. If the query (or queries) that you intend to use this index for have an equality condition on the _id field then you will have to force the database to use it. Current versions of MongoDB will default to using the default { _id: 1 } index anytime they see such a condition.

    We can see that by restoring your desired index and running .explain() a relevant query:

    test> db.foo.drop()
    false
    test> db.foo.createIndex({ _id: 1, Title: 1, "Scheduling.From": 1 })
    _id_1_Title_1_Scheduling.From_1
    test> db.foo.find({_id: 123, Title: 456, "Scheduling.From": 789}).explain().queryPlanner.winningPlan
    {
      stage: 'FETCH',
      filter: {
        '$and': [
          { 'Scheduling.From': { '$eq': 789 } },
          { Title: { '$eq': 456 } }
        ]
      },
      inputStage: {
        stage: 'IXSCAN',
        keyPattern: { _id: 1 },
        indexName: '_id_',
        isMultiKey: false,
        multiKeyPaths: { _id: [] },
        isUnique: true,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: { _id: [ '[123, 123]' ] }
      }
    }
    
    

    Forcing index usage via .hint() demonstrates that the index can be used (and used efficiently) if needed:

     test> db.foo.find({_id: 123, Title: 456, "Scheduling.From": 789}).hint("_id_1_Title_1_Scheduling.From_1").explain().queryPlanner.winningPlan
    {
      stage: 'FETCH',
      inputStage: {
        stage: 'IXSCAN',
        keyPattern: { _id: 1, Title: 1, 'Scheduling.From': 1 },
        indexName: '_id_1_Title_1_Scheduling.From_1',
        isMultiKey: false,
        multiKeyPaths: { _id: [], Title: [], 'Scheduling.From': [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: {
          _id: [ '[123, 123]' ],
          Title: [ '[456, 456]' ],
          'Scheduling.From': [ '[789, 789]' ]
        }
      }
    }