I am new to synapse analytics and I want to create a notebook, which could be used further as a databrick in the pipeline in order to process the data in one of the tables from the DB.
I want to do everything in PySpark, so I am wondering how to even read the data from existing DB, using the notebook. NOTE: I dont want to use the method, where I am mentioning my password etc.
spark = SparkSession.builder.appName("SynapseAnalyticsDemo").getOrCreate()
# Set up Synapse Analytics credentials
synapse_servername = "<synapse_servername>.sql.azuresynapse.net"
synapse_database = "<synapse_database>"
synapse_username = "<synapse_username>"
synapse_password = "<synapse_password>"
synapse_jdbc_url = f"jdbc:sqlserver://{synapse_servername}:1433;database={synapse_database};user={synapse_username};password={synapse_password}"
# Define the SQL query and table name
table_name = "<sql_pool_table>"
query = f"SELECT * FROM {table_name} WHERE some_column = 'some_value'"
This is the only method I have found, but taking in account that many people will be able to see the databrick (notebook), I dont want to use this method, where I have to mention the password.
I have also tried this way, but it does not work:
%%pyspark
df=spark.sql ("select * from TableName")
df.show()
In the second method I have written Table Name exactly as it is in SQL Server Management Studio, then I tried to use Azure Synapse dedicated SQL pool name and none of them worked.
How can I access the Azure Synapse dedicated SQL pool using the notebook (PySpark)?
You can access dedicated sql pool data in azure synapse notebook without mentioning password by using store the password in azure key vault. For that create Azure key vault and create secret with the password of database.
Retrieve the secret vallue using below code:
from notebookutils import mssparkutils
mssparkutils.credentials.getSecret('AzureKeyvaultName', 'secretName')
Access dedicated sql database using below code:
from notebookutils import mssparkutils
jdbcHostname = "synpool.database.windows.net"
jdbcPort = 1433
jdbcDatabase = "dedsql"
jdbcUsername = "sqladminuser"
jdbcPassword = mssparkutils.credentials.getSecret('sqlservercred', 'sqlpassword')
jdbcDriver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
#url = s"jdbc:sqlserver://${database_host}:${database_port}/${database_name}"
table = "students"
jdbcUrl = f"jdbc:sqlserver://{jdbcHostname}:{jdbcPort};databaseName={jdbcDatabase}"
df1 = spark.read.format("jdbc").option("driver", jdbcDriver).option("url", jdbcUrl).option("dbtable", table).option("user", jdbcUsername).option("password", jdbcPassword).load()
df1.show()