neo4j · azure-databricks · apache-spark-connector

Unable to run neo4j create constraint cypher query from Databricks using pyspark connector


I am using a Databricks notebook with the Neo4j Spark connector to run a Cypher query that creates a constraint. The write fails with the error below. I tried changing the Databricks runtime version and the Spark connector version, but with no luck.

Error message: IllegalArgumentException: Please provide a valid WRITE query.

Here is the code.

from pyspark.sql import SparkSession

# Initialize Spark session
spark = SparkSession.builder \
    .appName("Neo4j Integration") \
    .getOrCreate()

# Neo4j connection config
neo4j_config = {
 "url": "bolt://localhost:7687",
 "authentication.type": "basic",
 "authentication.basic.username": "<username>",
 "authentication.basic.password": "<password>"
}

# Sample data
data = [
    {"id": "1", "name": "Alice"}
]

# Create DataFrame
df = spark.createDataFrame(data)

Constraints_Query = """
CREATE CONSTRAINT unique_person IF NOT EXISTS FOR (p:Person) REQUIRE p.name IS UNIQUE
"""
# Execute Cypher query to create constraint
df.write.format("org.neo4j.spark.DataSource") \
    .options(**neo4j_config) \
    .option("query", Constraints_Query) \
    .option("database", "neo4j") \
    .mode("Overwrite") \
    .save()

Solution

  • If you want to initialize constraints with the Spark Connector for Neo4j, you have two choices:

    1. configure the right schema optimization strategy, so that the constraints are inferred from your DataFrame schema

    2. use the "script" option (where the value would be CREATE CONSTRAINT unique_person IF NOT EXISTS FOR (p:Person) REQUIRE p.name IS UNIQUE)

    In either case, the "query" option must contain a data-import query; it is not meant for schema initialization, which is why the connector rejects your CREATE CONSTRAINT statement as not being a valid WRITE query.
    The official documentation should help in that regard.
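    The second approach can be sketched as follows. This is a minimal sketch based on the connector's documented "script", "labels", and "node.keys" options; the URL, credentials, label, and the helper function name are placeholders I chose for illustration, and the actual write call is commented out because it needs a running Spark session and a reachable Neo4j instance:

    ```python
    def build_write_options(base_config):
        """Return connector options that create the constraint, then import nodes.

        "script" runs arbitrary Cypher (here, the constraint creation) once,
        before the data is written; "labels" tells the connector to write the
        DataFrame rows as :Person nodes; "node.keys" names the property used
        to match nodes, which should line up with the unique constraint.
        """
        options = dict(base_config)  # don't mutate the caller's config
        options["script"] = (
            "CREATE CONSTRAINT unique_person IF NOT EXISTS "
            "FOR (p:Person) REQUIRE p.name IS UNIQUE"
        )
        options["labels"] = ":Person"
        options["node.keys"] = "name"
        return options

    # Placeholder connection config, as in the question
    neo4j_config = {
        "url": "bolt://localhost:7687",
        "authentication.type": "basic",
        "authentication.basic.username": "<username>",
        "authentication.basic.password": "<password>",
    }

    write_options = build_write_options(neo4j_config)

    # The actual write (requires a SparkSession `spark` and a DataFrame `df`):
    # df.write.format("org.neo4j.spark.DataSource") \
    #     .options(**write_options) \
    #     .mode("Overwrite") \
    #     .save()
    ```

    Note the difference from the failing code: the constraint lives in the "script" option, while the write itself is expressed through "labels"/"node.keys" rather than a "query" option.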