cassandra azure-cosmosdb azure-cosmosdb-cassandra-api

Cosmos DB Cassandra API Indexes that span partitions


We are in the process of moving our application from on-premises to Azure. We currently use Cassandra, and the plan is to use the Cosmos DB Cassandra API in Azure. In Cassandra, the general rule of thumb is that a query using an index should be restricted to a single partition; otherwise it is better to use Materialized Views or secondary tables.

Does the same hold true for Cosmos DB? If I have a query that would return ~20 rows of data that come from 20 different partitions, can I accomplish this by using an index (without incurring significant performance or cost penalties), or should I create a secondary table?

As an aside, I am aware that the Cosmos DB Cassandra API has recently introduced Materialized Views, but since this feature is still in preview, we are not going to use it.


Solution

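  • This rule of thumb generally holds for any distributed database (i.e. one that supports transparent sharding/partitioning), including Azure Cosmos DB. A query that cannot be routed to a single partition has to fan out to many of them, so its latency and request cost grow with the number of partitions it consults. With that said, cross-partition queries are not necessarily a problem if they are infrequent and the latency is tolerable for your users.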

    By the way, if you are planning a migration from on-premises, it is worth considering Azure Managed Instance for Apache Cassandra. This is a managed hosting service for pure open-source Apache Cassandra, built by the Azure Cosmos DB team. Most notably, it supports hybrid clusters: you can deploy a Cassandra data center with this service in Azure and have it join your existing on-premises Cassandra ring (as long as you have the required networking in place and are running open-source Apache Cassandra v3.11 or higher). This makes a zero-downtime migration to the Azure cloud much more straightforward.
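To make the rule of thumb from the first paragraph concrete, here is a minimal toy model in Python (not Cosmos DB or Cassandra code). It counts how many partitions a read has to touch when the ~20 matching rows are looked up by a non-key column, compared with a denormalized secondary table partitioned by that column. The schema (`device_id`, `region`) and the `PartitionedTable` class are hypothetical, and the model is a simplification: real systems fan out to nodes or physical partitions rather than to every logical partition, so the absolute numbers are illustrative only.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy model: rows are grouped into partitions by one key column (hypothetical, not a real driver API)."""

    def __init__(self, partition_key):
        self.partition_key = partition_key
        self.partitions = defaultdict(list)

    def insert(self, row):
        # Route the row to the partition named by its partition-key value.
        self.partitions[row[self.partition_key]].append(row)

    def read_partition(self, key):
        """Single-partition read: returns (rows, partitions_touched)."""
        return list(self.partitions.get(key, [])), 1

    def scan_by_column(self, column, value):
        """Index-style lookup on a non-key column, modelled as a fan-out over every partition."""
        rows = [r for part in self.partitions.values() for r in part if r[column] == value]
        return rows, len(self.partitions)


# Base table: partitioned by device_id (hypothetical schema).
devices = PartitionedTable("device_id")
# Secondary "query table": the same rows, partitioned by region instead.
devices_by_region = PartitionedTable("region")

for i in range(100):
    row = {"device_id": f"dev-{i}", "region": "eu" if i % 5 == 0 else "us"}
    # The application writes every row to both tables (denormalization).
    devices.insert(row)
    devices_by_region.insert(row)

via_index, touched_index = devices.scan_by_column("region", "eu")
via_table, touched_table = devices_by_region.read_partition("eu")

print(len(via_index), touched_index)  # 20 100
print(len(via_table), touched_table)  # 20 1
```

Both reads return the same 20 rows, but the index-style lookup has to consult all 100 partitions while the secondary table answers from one. The price of the secondary table is paid on the write path: every row is written twice, and the application is responsible for keeping the two tables consistent. If the cross-partition query is rare, accepting the fan-out may be the simpler trade-off, as noted above.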