Tags: azure, azure-databricks, azure-cosmosdb-gremlinapi

Compatible libraries for Azure Cosmos GraphDB and Azure Databricks


I am trying to offload data from Azure Databricks into Azure Cosmos DB (Gremlin/graph API) as the vertices and edges I need.

I keep getting a java.lang.ClassNotFoundException. I have tried almost every combination of library versions and matching Databricks Runtime versions, with no luck, including most of the compatible library versions listed at https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos-spark_3-2_2-12/README.md#download

I will be using DBR 10.4 LTS (Apache Spark 3.2.1, Scala 2.12). Can anyone point me to the right Maven libraries for Azure Cosmos DB Gremlin (graph)?

java.lang.ClassNotFoundException: Failed to find data source: com.microsoft.azure.cosmosdb.spark.
Please find packages at http://spark.apache.org/third-party-projects.html
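The data source named in the error, com.microsoft.azure.cosmosdb.spark, is the format name of the older azure-cosmosdb-spark connector, which targets Spark 2.x. The newer connector linked above encodes the Spark and Scala versions in its artifact name (for example azure-cosmos-spark_3-2_2-12). The minimal sketch below builds the Maven coordinate from a runtime's Spark and Scala versions. The helper name cosmos_spark_coordinate is hypothetical, and it assumes the artifact naming pattern seen in the linked README; check that a given connector version is actually published for your Spark line.

```python
def cosmos_spark_coordinate(spark_version: str, scala_version: str, connector_version: str) -> str:
    """Hypothetical helper: build the Maven coordinate for the azure-cosmos-spark 4.x connector.

    Assumes the artifact pattern azure-cosmos-spark_<spark major-minor>_<scala major-minor>.
    """
    # "3.2.1" -> "3-2", "2.12" -> "2-12"
    spark_mm = "-".join(spark_version.split(".")[:2])
    scala_mm = "-".join(scala_version.split(".")[:2])
    return f"com.azure.cosmos.spark:azure-cosmos-spark_{spark_mm}_{scala_mm}:{connector_version}"


# DBR 10.4 LTS ships Spark 3.2.1 with Scala 2.12.
print(cosmos_spark_coordinate("3.2.1", "2.12", "4.11.1"))
```

For DBR 10.4 LTS this yields com.azure.cosmos.spark:azure-cosmos-spark_3-2_2-12:4.11.1, the coordinate used in the solution below.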

Solution

  • The library below, together with GraphFrames, did the trick. I can now ingest data into Azure Cosmos DB, even faster than with gremlin-python.

    com.azure.cosmos.spark:azure-cosmos-spark_3-2_2-12:4.11.1
    

    I had to use the cosmos.oltp data source (the Cosmos DB SQL API) together with the library above.

    cosmos_edges.write.format("cosmos.oltp").options(**cfg).mode("APPEND").save()
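The write above depends on a cfg dict and on edge rows shaped the way the Gremlin API stores edges. Below is a plain-Python sketch of both, under stated assumptions. The spark.cosmos.* keys are documented options of the azure-cosmos-spark 4.x connector. The edge fields (_isEdge, _vertexId, _vertexLabel, _sink, _sinkLabel, _sinkPartition), the edge living in its source vertex's partition, and the partition-key property name "pk" are assumptions taken from community examples of Gremlin containers, not from this post. The helper names are hypothetical. In Databricks you would produce the same columns on the GraphFrames edge DataFrame (e.g. with withColumn) rather than per-row Python.

```python
from typing import Any, Dict


def build_cosmos_cfg(endpoint: str, key: str, database: str, container: str) -> Dict[str, str]:
    """Hypothetical helper: options for df.write.format("cosmos.oltp").options(**cfg)."""
    return {
        "spark.cosmos.accountEndpoint": endpoint,
        "spark.cosmos.accountKey": key,
        "spark.cosmos.database": database,
        "spark.cosmos.container": container,  # the graph's container
        # Upsert semantics: re-running the job overwrites existing items.
        "spark.cosmos.write.strategy": "ItemOverwrite",
        "spark.cosmos.write.bulk.enabled": "true",
    }


def to_edge_document(
    edge_id: str, label: str,
    src_id: str, src_label: str, src_pk: str,
    dst_id: str, dst_label: str, dst_pk: str,
    pk_field: str = "pk",  # hypothetical partition-key property name
) -> Dict[str, Any]:
    """Hypothetical helper: one edge in the document layout assumed for Gremlin containers."""
    return {
        "id": edge_id,
        "label": label,
        pk_field: src_pk,        # assumption: edge is stored in the source vertex's partition
        "_isEdge": True,
        "_vertexId": src_id,     # out-vertex (GraphFrames "src")
        "_vertexLabel": src_label,
        "_sink": dst_id,         # in-vertex (GraphFrames "dst")
        "_sinkLabel": dst_label,
        "_sinkPartition": dst_pk,
    }


cfg = build_cosmos_cfg("https://<account>.documents.azure.com:443/", "<key>", "<database>", "<graph>")
edge = to_edge_document("e1", "knows", "v1", "person", "p1", "v2", "person", "p2")
# In Databricks: cosmos_edges.write.format("cosmos.oltp").options(**cfg).mode("APPEND").save()
```

Verify the edge layout against documents your container already holds before bulk-loading.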