I have a Cosmos DB (Cassandra API) instance set up and I'd like to manage its throughput from a Python application. I'm able to create an azure.cosmos.cosmos_client using the Cassandra endpoint and primary password listed in Azure without errors, but every interaction I attempt with the client results in "azure.cosmos.errors.HTTPFailure: Status code: 404".
I am already successfully interacting with this database through cassandra-driver in Python, but I'd like access to the cosmos-client to manage throughput provisioning via code. I want to autoscale throughput as database use fluctuates between high levels of utilization and almost no activity.
Creating a cosmos_client requires a valid URI with the scheme (https/http/ftp, etc.) included. The endpoint listed in Azure, which I successfully used to connect via both cqlsh and the Python cassandra-driver, did not specify a scheme. I added "https://" to the beginning of the provided endpoint and was able to create the client in Python ("http://" results in errors, and I verified that incorrect addresses result in errors even with "https://"). Now that I have a client object created, any interaction I attempt with it gives me 404 errors.
```python
client = cosmos_client.CosmosClient(f'https://{COSMOS_CASSANDRA_ENDPOINT}',
                                    {'masterKey': COSMOS_CASSANDRA_PASSWORD})

client.ReadEndpoint
# 'https://COSMOS_CASSANDRA_ENDPOINT'

client.GetDatabaseAccount(COSMOS_CASSANDRA_ENDPOINT)
# azure.cosmos.errors.HTTPFailure: Status code: 404

client.ReadDatabase(EXISTING_KEYSPACE_NAME)
# azure.cosmos.errors.HTTPFailure: Status code: 404
```
I'm wondering whether cosmos_client is the correct way to interact with the Cosmos Cassandra instance to modify throughput from my Python application. If so, how should I set up the cosmos_client properly? Or perhaps there is a way to do this directly through database modifications using cassandra-driver.
I never got this to work via CosmosClient or DocumentClient, after toiling for a while in both Python and .NET. Ultimately I found two methods that do work, each unfortunately a bit hacky and presenting challenges that seem unnecessary.
My first approach was to change throughput by calling the Azure CLI from a subprocess. This is the command that is executed:
```python
f'az cosmosdb cassandra table throughput update --account-name {__cosmos_instance_name} --keyspace-name {__cassandra_keyspace} --name {table_name} --resource-group {__cosmos_resource_group} --throughput {new_throughput}'
```
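For illustration, here's a minimal sketch of how that subprocess call can be wrapped. All function and parameter names here are my own placeholders (not from our actual service), and it assumes the Azure CLI is installed and already authenticated via `az login`:

```python
import subprocess

def build_throughput_command(account_name, keyspace, table_name,
                             resource_group, new_throughput):
    # Assemble the argument list; passing a list to subprocess.run
    # (instead of one formatted string) avoids shell-quoting issues.
    return [
        "az", "cosmosdb", "cassandra", "table", "throughput", "update",
        "--account-name", account_name,
        "--keyspace-name", keyspace,
        "--name", table_name,
        "--resource-group", resource_group,
        "--throughput", str(new_throughput),
    ]

def update_table_throughput(account_name, keyspace, table_name,
                            resource_group, new_throughput):
    # Hypothetical wrapper: shell out to the Azure CLI and surface failures.
    cmd = build_throughput_command(account_name, keyspace, table_name,
                                   resource_group, new_throughput)
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"az throughput update failed: {result.stderr}")
    return result.stdout
```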
What is very unfortunate about both methods I found is that neither works while the target database is already being throttled due to rate limiting. This meant we also had to implement logic to throttle our own service's interactions with the database before calling the code that performs the scaling.
Some other notes about our solution: the service is hosted in Kubernetes, so metric evaluation and scaling execution were added to the lifecycle hooks on the pod. The auto-scaler is also executed when we encounter suspected rate limiting during Cassandra interactions, while handling cassandra.cluster.NoHostAvailable exceptions.
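A rough sketch of that retry-and-scale pattern is below. This is not our actual code: in our service the rate-limit symptom is cassandra.cluster.NoHostAvailable, but the exception type is injected here as a parameter so the sketch stays self-contained, and `scale_up` stands in for whichever throughput-update call you use:

```python
def execute_with_autoscale(execute, scale_up, rate_limit_exc, retries=1):
    """Run execute(); on a suspected rate-limit exception, scale up and retry.

    execute        -- zero-arg callable performing the database operation
    scale_up       -- zero-arg callable that raises provisioned throughput
                      (e.g. the az CLI or CQL throughput update)
    rate_limit_exc -- exception type that signals suspected rate limiting
                      (cassandra.cluster.NoHostAvailable in our case)
    """
    for attempt in range(retries + 1):
        try:
            return execute()
        except rate_limit_exc:
            if attempt == retries:
                raise  # out of retries; let the caller handle it
            scale_up()
```

Note that because throughput updates themselves fail under throttling, the `scale_up` call may still need its own back-off before it succeeds.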
...
The other way I was able to set the provisioned throughput from code was by executing CQL directly through cassandra-driver, as follows (in Python):
```python
from cassandra.cqlengine import connection

# Set up the connection and grab a raw session.
connection.setup(<CONNECTION_SETUP_ARGS>)
session = connection.get_session()

# Switch to the keyspace, then alter the table's provisioned throughput.
session.execute("use <CASSANDRA_KEYSPACE>")
session.execute("alter table <CASSANDRA_TABLE_NAME> with cosmosdb_provisioned_throughput=<DESIRED_THROUGHPUT>")
```
When I get a chance I'll switch to this approach, since it doesn't require an Azure CLI installation or subprocess calls.
I think I got this idea originally from here.