[SOLVED] Need Syntax help in passing CTE in SQL query in Azure databricks

Need Syntax help in passing CTE in SQL query in Azure databricks

I am reading the employee table from SQL Server, but I get this error while using WITH clause in query and code is working fine if I remove WITH clause in query.

Note: WITH clause is mandatory to use in my query to optimize the query performance

from pyspark.sql import Sparksession
from pyspark.sql import functions as F

query =f"""(
WITH cte_emp AS
(Select ename,eid, eaddress from 
employee)
select * from 
cte_emp)"""

source =spark.read.jdbc(url=jdbcurl, table=query,properties=connection


jdbc_df = spark.read.format("jdbc").option("url", "jdbc:sqlserver://localhost:3306/my_db").option("dbtable", "query").option("user", "my_username").option("password", "my_password").load()

This is the error:

com.microsoft.sqlserver.jdbc.SQLserverException: Incorrect syntax near the keyword 'with'

Solution

SQL server does not accept WITH clauses in subqueries, so you need to bypass that.

Split the CTE query like this:

prepare_query = f"""
WITH cte_emp AS (
Select 
* 
from [dbo].[Employees]
) """
query = f"""
select * 
from cte_emp
"""

And then use "prepareQuery" option for JDBC:

jdbc_df = spark.read.format("jdbc"). \
    option("url", "jdbc:sqlserver://localhost:3306/my_db"). \
    option("prepareQuery",prepare_query). \
    option("query", query). \
    option("user", "my_username"). \
    option("password", "my_password").load()

Data is returned correctly.

Source - Spark documentation - https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html