mongodbmongodb-queryaws-documentdbaws-documentdb-mongoapi

DocumentDB query is using the wrong index


I'm using MongoDB version 4.0.0 with DocumentDB.

I have a collection called transactions. That collection has this two indices:

{
  "creation_date": -1
}
{
  "project_id": 1,
  "mid.id": 1  
}

I'm running the following query:

db.transactions.explain('executionStats').find({
  "project_id": "1",
  "mid.id": {
    $in: [/* array with 47 ids */]
  }
}).sort({creation_date: -1}).skip(0).limit(50)

The execution stats for that code are the following:

{
  "queryPlanner": {
    "plannerVersion": 1,
    "namespace": "myDB.transactions",
    "winningPlan": {
      "stage": "SUBSCAN",
      "inputStage": {
        "stage": "LIMIT_SKIP",
        "inputStage": {
          "stage": "IXSCAN",
          "indexName": "creation_date_idx",
          "direction": "forward"
        }
      }
    }
  },
  "executionStats": {
    "executionSuccess": true,
    "executionTimeMillis": "445731.524",
    "planningTimeMillis": "1.487",
    "executionStages": {
      "stage": "SUBSCAN",
      "nReturned": "44",
      "executionTimeMillisEstimate": "445729.856",
      "inputStage": {
        "stage": "LIMIT_SKIP",
        "nReturned": "44",
        "executionTimeMillisEstimate": "445729.787",
        "inputStage": {
          "stage": "IXSCAN",
          "nReturned": "44",
          "executionTimeMillisEstimate": "445729.765",
          "indexName": "creation_date_idx",
          "direction": "forward"
        }
      }
    }
  },
  "serverInfo": {
    "host": "127.0.0.1",
    "port": 27017,
    "version": "4.0.0"
  },
  "ok": 1,
  "operationTime": {
    "$timestamp": "7231419170506670081"
  }
}

It chose the creation_date index and took 445 seconds to perform the query!

If I try, instead, to hint the query to force using the project_id and mid.id index, here is the winning plan:

{
  "queryPlanner": {
    "plannerVersion": 1,
    "namespace": "myDB.transactions",
    "winningPlan": {
      "stage": "SUBSCAN",
      "inputStage": {
        "stage": "LIMIT_SKIP",
        "inputStage": {
          "stage": "SORT",
          "sortPattern": {
            "creation_date": -1
          },
          "inputStage": {
            "stage": "IXSCAN",
            "indexName": "project_id_mid.id_idx",
            "direction": "forward"
          }
        }
      }
    }
  },
  "executionStats": {
    "executionSuccess": true,
    "executionTimeMillis": "10.966",
    "planningTimeMillis": "1.615",
    "executionStages": {
      "stage": "SUBSCAN",
      "nReturned": "44",
      "executionTimeMillisEstimate": "9.310",
      "inputStage": {
        "stage": "LIMIT_SKIP",
        "nReturned": "44",
        "executionTimeMillisEstimate": "9.292",
        "inputStage": {
          "stage": "SORT",
          "nReturned": "44",
          "executionTimeMillisEstimate": "9.287",
          "sortPattern": {
            "creation_date": -1
          },
          "inputStage": {
            "stage": "IXSCAN",
            "nReturned": "44",
            "executionTimeMillisEstimate": "9.208",
            "indexName": "project_id_mid.id_idx",
            "direction": "forward"
          }
        }
      }
    }
  },
  "serverInfo": {
    "host": "127.0.0.1",
    "port": 27017,
    "version": "4.0.0"
  },
  "ok": 1,
  "operationTime": {
    "$timestamp": "7231419990845423617"
  }
}

It takes 10 milliseconds to perform the query. So... why is DocumentDB choosing the creation_date index by default? Is something wrong with my query?

Some other things I tried:


Solution

  • You are forcing the query planner to choose between an index that minimized the number of documents examined, and an index that avoids an in-memory sort. Which performs better when testing the plans will depend on things like which index has the most results near the beginning of the index, and the one that avoids the in-memory sort gets a bonus for reduced memory usage.

    If you create an index that will do both, like:

    {
      "project_id": 1,
      "mid.id": 1,
      "creation_date": -1
    }
    

    It should win, hands down, every time.