I am experiencing inconsistent Retrieved document count
while I am querying Azure CosmosDB records, using different PartitionKey.
I have a table with PartitionKey=/ParitionKey
.
In my dev environment, the records are distributed only between two different partition key values:
1b871081-7f5d-414d-9743-afbf27236d3b
- 108 records5744b90c-208a-4d72-9675-ec5906afef80
- 6 recordsWhen I query the records using the two Partion key values, I do get a different Retrieved document count
, implicitly different RU.
For one partition key (the one with 99% distribution), CosmosDB correctly matches only 1 document, making it efficient from a RU perspective.
For the same query, but using a different partition key, CosmosDB navigates trough all the documents, which also results in a higher (and inneficient) RU usage.
SELECT top 1 * FROM c
where c.PartitionKey = 'PARTITION_KEY_VALUE'
Question Summary
You say
the records are distributed only between two different partition key values:
1b871081-7f5d-414d-9743-afbf27236d3b
- 108 records5744b90c-208a-4d72-9675-ec5906afef80
- 6 records
The query stats show that when you do
SELECT top 1 *
FROM c
where c.PartitionKey = '1b871081-7f5d-414d-9743-afbf27236d3b'
The "Retrieved document count" is 1 but when you do
SELECT top 1 *
FROM c
where c.PartitionKey = '5744b90c-208a-4d72-9675-ec5906afef80'
The "Retrieved document count" is 109
Explanation
This pattern would indicate that it is not using an index seek on PartitionKey
it is just scanning through documents and returning the first one that matches the predicate, it just so happens that it happens to scan all the documents in logical partition 1b871081-7f5d-414d-9743-afbf27236d3b
first (presumably you inserted these in a contiguous batch).
The partition key is not indexed by default when using an exclude rootpath strategy. The only default indexed paths with consistent indexing are id
and _ts
You should add
{
"path": "/partitionKey/?"
}
to your includedPaths
section in that case.
Regarding the fact that it is apparently processing each logical partition in turn I think this clumping is just an artefact of how you loaded the data originally.
e.g. In the below for a collection hosted on a single physical partition I see a default ordering that does not return all documents from one partitionKey and then move onto the next, the ordering is in fact correlated to the original insert order (I used sequential id
numbering when inserting to make this easy to see).