I am reading the employee table from SQL Server, but I get the error below when I use a WITH clause in the query; the code works fine if I remove the WITH clause.
Note: the WITH clause is mandatory in my query, because I need it to optimize query performance.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

query = f"""(
    WITH cte_emp AS
        (SELECT ename, eid, eaddress FROM employee)
    SELECT * FROM cte_emp
)"""

source = spark.read.jdbc(url=jdbcurl, table=query, properties=connection)

# same read expressed with DataFrameReader options
jdbc_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://localhost:3306/my_db")
    .option("dbtable", query)
    .option("user", "my_username")
    .option("password", "my_password")
    .load()
)
This is the error:
com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near the keyword 'with'
Spark wraps whatever you pass through the table/dbtable (or query) option in a derived table, roughly SELECT * FROM (<your query>) alias, and SQL Server does not allow a WITH clause inside a subquery, so you need to get the CTE out of that wrapper.
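For illustration only, the statement that actually reaches SQL Server looks roughly like the following (the alias name here is made up, not the exact one Spark generates), with the CTE trapped inside the derived table:

effective_sql = """
SELECT * FROM (
    WITH cte_emp AS (SELECT ename, eid, eaddress FROM employee)
    SELECT * FROM cte_emp
) spark_gen_alias
"""
# T-SQL only accepts WITH at the start of a statement, never inside a
# subquery or derived table, hence the "Incorrect syntax near the
# keyword 'with'" error.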
Split the CTE query like this:
prepare_query = f"""
WITH cte_emp AS (
    SELECT *
    FROM [dbo].[Employees]
)"""
query = f"""
SELECT *
FROM cte_emp
"""
And then use the "prepareQuery" option for JDBC, which Spark puts in front of the query it generates, so the CTE definition ends up outside the subquery:
jdbc_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://localhost:3306/my_db")
    .option("prepareQuery", prepare_query)
    .option("query", query)
    .option("user", "my_username")
    .option("password", "my_password")
    .load()
)
Data is returned correctly.
Source - Spark documentation - https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html