sqldatabricksazure-databricksdatabricks-sqlaws-databricks

Need Syntax help in passing CTE in SQL query in Azure databricks


I am reading the employee table from SQL Server, but I get this error while using WITH clause in query and code is working fine if I remove WITH clause in query.

Note: WITH clause is mandatory to use in my query to optimize the query performance

from pyspark.sql import Sparksession
from pyspark.sql import functions as F

query =f"""(
WITH cte_emp AS
(Select ename,eid, eaddress from 
employee)
select * from 
cte_emp)"""

source =spark.read.jdbc(url=jdbcurl, table=query,properties=connection


jdbc_df = spark.read.format("jdbc").option("url", "jdbc:sqlserver://localhost:3306/my_db").option("dbtable", "query").option("user", "my_username").option("password", "my_password").load()

This is the error:

com.microsoft.sqlserver.jdbc.SQLserverException: Incorrect syntax near the keyword 'with'


Solution

  • SQL server does not accept WITH clauses in subqueries, so you need to bypass that.

    Split the CTE query like this:

    prepare_query = f"""
    WITH cte_emp AS (
    Select 
    * 
    from [dbo].[Employees]
    ) """
    query = f"""
    select * 
    from cte_emp
    """
    

    And then use "prepareQuery" option for JDBC:

    jdbc_df = spark.read.format("jdbc"). \
        option("url", "jdbc:sqlserver://localhost:3306/my_db"). \
        option("prepareQuery",prepare_query). \
        option("query", query). \
        option("user", "my_username"). \
        option("password", "my_password").load()
    

    Data is returned correctly.

    Source - Spark documentation - https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html