Tags: pyspark, apache-spark-sql, azure-synapse

How to resolve access issue while creating table from Azure Synapse notebook (PySpark) in specific database schema?


I have an Azure Synapse notebook (written in PySpark) that needs to create a table in our database, in a specific schema. The Synapse service principal has read/write access on the DB and full CONTROL permission on the schema, so in principle it should be able to create the table. The code executing this is as follows:

# Retrieve connection details from the linked service using Token Library
server = "{sql_server_name}.database.windows.net"
Port = 1433
Database = "sqldb"
jdbcUrl = f"jdbc:sqlserver://{server}:{Port};databaseName={Database};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30"
token = TokenLibrary.getConnectionString("LS_ASQL_DB")

conn_Prop = {
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "accessToken": token
}

table_name = "raw.floodrisk"  # Specify the schema and table name
selected_df.write.jdbc(url=jdbcUrl, table=table_name, mode="overwrite", properties=conn_Prop)

While running this, I get the following error: CREATE TABLE permission denied in database 'sqldb'.

It seems to me that the code is trying to execute the CREATE TABLE statement on the scope of the DB (for which it doesn't have permission) and only then route it to the schema (for which it does have permission), instead of executing it on the scope of the schema (if that makes sense...). When I give Synapse db_owner access, the statement works, so in principle the connection is okay.

I have tried to find a way to change the operating scope to the schema: I looked at possible properties for the conn_Prop object, at the arguments passed to df.write.jdbc(), and at connection-string options that would scope the connection to just the schema, but none of these turned up anything that worked.


Solution

  • The goal is to avoid granting the service principal owner rights on the full database.

    If you don't want to give the service principal full access to the database, then instead of adding it to the db_owner role, make it the owner of the required schema. (In SQL Server, CREATE TABLE is a database-level permission, and creating a table in a schema additionally requires ALTER permission on, or ownership of, that schema; this is why the statement fails even though the principal has CONTROL on the schema's objects.) Transfer the schema's authorization with the command below:

    ALTER AUTHORIZATION ON SCHEMA::<schemaName> TO <synapseSPNUser>;
    

    After that, grant the required permissions using the commands below (SELECT is granted at schema scope; the CREATE permissions are database-scoped):

    GRANT SELECT ON SCHEMA::<schemaName> TO <synapseSPNUser>;
    GRANT CREATE TABLE, CREATE VIEW, CREATE FUNCTION, CREATE PROCEDURE TO <synapseSPNUser>;
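    If you run this setup from a notebook or script rather than by hand, the statements above can be generated from the schema and principal names. A minimal sketch (the helper name and the hard-coded permission list are illustrative, not part of any library):

    ```python
    # Illustrative helper: builds the T-SQL statements from this answer
    # for a given schema and service-principal user. Names are placeholders.
    def build_grant_statements(schema: str, principal: str) -> list[str]:
        return [
            # Transfer schema ownership to the service principal
            f"ALTER AUTHORIZATION ON SCHEMA::{schema} TO [{principal}];",
            # Schema-scoped permission
            f"GRANT SELECT ON SCHEMA::{schema} TO [{principal}];",
            # Database-scoped CREATE permissions (no ON clause)
            f"GRANT CREATE TABLE, CREATE VIEW, CREATE FUNCTION, "
            f"CREATE PROCEDURE TO [{principal}];",
        ]

    for stmt in build_grant_statements("raw", "synapse-spn"):
        print(stmt)
    ```

    Each statement would still need to be executed against the database by a principal with sufficient rights (for example, via an admin connection).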
    

    Then you will be able to create the table in the Azure SQL database using code like the following:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType
    schema = StructType([
        StructField("ID", IntegerType(), True),
        StructField("Name", StringType(), True),
        StructField("RiskLevel", StringType(), True)
    ])
    data = [
        (1, "Location A", "High"),
        (2, "Location B", "Medium"),
        (3, "Location C", "Low")
    ]
    selected_df = spark.createDataFrame(data, schema)
    server = "<serverName>.database.windows.net"
    Port = 1433
    Database = "<dbName>"
    jdbcUrl = f"jdbc:sqlserver://{server}:{Port};databaseName={Database};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30"
    token = TokenLibrary.getConnectionString("AzureSqlDatabase1")

    conn_Prop = {
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "accessToken": token
    }

    table_name = "raw.floodrisk"  # Specify the schema and table name
    selected_df.write.jdbc(url=jdbcUrl, table=table_name, mode="overwrite", properties=conn_Prop)
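    The JDBC URL above follows a fixed `key=value;` format; if a run fails with a connection error rather than a permission error, a quick way to rule out a malformed URL is to assemble and inspect it separately. A small sketch in plain Python, no Spark required (the helper name and the server/database values are illustrative):

    ```python
    # Illustrative sanity check of the JDBC URL format used above.
    # Server and database names are placeholders.
    def build_jdbc_url(server: str, database: str, port: int = 1433) -> str:
        return (
            f"jdbc:sqlserver://{server}:{port};"
            f"databaseName={database};"
            "encrypt=true;trustServerCertificate=false;"
            "hostNameInCertificate=*.database.windows.net;"
            "loginTimeout=30"
        )

    url = build_jdbc_url("myserver.database.windows.net", "sqldb")
    print(url)
    ```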
    

