
Azure Cosmos DB as a key-value store: which indexing mode?


What indexing mode / policy should I use when using Cosmos DB as a simple key/value store?

From https://learn.microsoft.com/en-us/azure/cosmos-db/index-policy :

None: Indexing is disabled on the container. This is commonly used when a container is used as a pure key-value store without the need for secondary indexes.

Is this because the property used as the partition key is indexed even when indexingMode is set to “none”? I would have expected to need indexing turned on, with the partition key’s path as the only included path.
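For concreteness, the two policies being compared could be written as the following Python dictionaries. This is only a sketch: "/myKey" is a placeholder path, and setting "automatic" to False alongside "none" is an assumption rather than something the quoted documentation states. In the azure-cosmos Python SDK, a dictionary like this would typically be passed as the indexing_policy argument when creating the container.

```python
import json

# Option A: indexing disabled entirely, the mode the documentation suggests
# for a pure key-value store.
# Assumption: "automatic" is switched off together with the "none" mode.
NO_INDEX_POLICY = {
    "indexingMode": "none",
    "automatic": False,
}


def single_path_policy(property_path: str) -> dict:
    """Option B: index only one property path and exclude everything else.

    `property_path` is a placeholder such as "/myKey".
    """
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # "/?" targets the scalar value stored at the path.
        "includedPaths": [{"path": property_path.rstrip("/") + "/?"}],
        # "/*" excludes every other path in the item.
        "excludedPaths": [{"path": "/*"}],
    }


print(json.dumps(NO_INDEX_POLICY, indent=2))
print(json.dumps(single_path_policy("/myKey"), indent=2))
```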

If it matters, I’m planning to use the SQL API.

EDIT:

Here's the information I was missing to understand this:

  1. Every item must have an id property; if one isn't supplied, Cosmos DB will assign one. https://learn.microsoft.com/en-us/azure/cosmos-db/account-databases-containers-items#properties-of-an-item
  2. Since I'm using Azure Data Factory to load the items, I can tell ADF to duplicate the column holding the value I want to use as my id into a new column called id: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview#add-additional-columns-during-copy
  3. To get an item without running a query, I need to use ReadItemAsync, or better yet ReadItemStreamAsync, since the latter doesn't deserialize the response. https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.cosmos.container.readitemasync?view=azure-dotnet https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.cosmos.container.readitemstreamasync?view=azure-dotnet

Solution

  • When you set indexingMode to "none", the only efficient way to retrieve a document is a point read by id (e.g. ReadDocumentAsync() in the older .NET SDK, ReadItemAsync() in the current one, or read_item() in Python). This is what makes the container behave like a key/value store: you aren't querying on other properties; you look up a document by a known id and get the entire document back. Cost-wise, a point read is about 1 RU for a 1 KB document, the same as a point read against an indexed container.

    You could still run queries, but with no index to use, each query has to scan the container, so you'll see unusually high RU costs.

    You would still specify the partition key's value with your point-reads, as you'd normally do.
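
As a rough illustration of the point read described above, here is a minimal sketch using the azure-cosmos Python SDK's read_item(). The helper names (open_container, get_value) and all connection arguments are hypothetical, and defaulting the partition key value to the id assumes the container is partitioned on /id; adjust that if your partition key is a different property.

```python
from typing import Any, Dict, Optional


def open_container(endpoint: str, key: str, database: str, container: str) -> Any:
    """Return a container client. All four arguments are placeholders."""
    # Imported lazily so the rest of this sketch can be exercised without the SDK.
    from azure.cosmos import CosmosClient  # pip install azure-cosmos

    client = CosmosClient(endpoint, credential=key)
    return client.get_database_client(database).get_container_client(container)


def get_value(
    container: Any, item_id: str, partition_key: Optional[str] = None
) -> Dict[str, Any]:
    """Point read: fetch one item by id and partition key value, without a query.

    Assumption: when no partition key value is given, the container is
    partitioned on /id, so the partition key value equals the id.
    """
    pk = item_id if partition_key is None else partition_key
    return container.read_item(item=item_id, partition_key=pk)
```

Usage would look something like get_value(open_container(endpoint, key, "mydb", "kv"), "some-id"). If the id doesn't exist, the SDK raises azure.cosmos.exceptions.CosmosResourceNotFoundError, which you'd catch to treat the key as missing.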