sql-server, azure, apache-spark, databricks

Use a SQL query to extract data from an external SQL Server database in Azure Databricks


I am connecting to an on-premises SQL Server database using the code below, and it works as expected. I have wrapped the query I want to use in a view, and I call the view to read the data into the DataFrame. How do I explicitly specify a SELECT query in the spark.read options instead of reading a full table or wrapping my SELECT in a view? e.g. option("sqlcommand", "select col1, col2 from table1;")

df = (spark.read 
        .format("jdbc") 
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") 
        .option("url", "jdbc:sqlserver://our connection string")     
        .option("dbtable", "dbo.databrickstest") 
        .option("user",user) 
        .option("password", pwd) 
        .option("encrypt", "true") 
        .option("trustServerCertificate", "true") 
        .load()
)

Solution

  • You should use the query option instead of the dbtable option (the two are mutually exclusive; specifying both raises an error). Use .option("query", "SELECT col1, col2 FROM table1") to specify your SQL query. Here is sample code for your reference:

    jdbcHostname = "<serverName>"
    jdbcPort = 1433
    jdbcDatabase = "<dbName>"
    jdbcUsername = "<userName>"
    jdbcPassword = "<password>"
    jdbcDriver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

    jdbcUrl = f"jdbc:sqlserver://{jdbcHostname}:{jdbcPort};databaseName={jdbcDatabase}"

    df = (spark.read
            .format("jdbc")
            .option("driver", jdbcDriver)
            .option("url", jdbcUrl)
            .option("query", "select EmployeeID, FirstName from Employees")
            .option("user", jdbcUsername)
            .option("password", jdbcPassword)
            .load()
    )
    df.show()
    
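As a side note, the JDBC URL above follows the Microsoft SQL Server driver's semicolon-separated key=value property syntax. A minimal sketch of assembling it (the helper name is hypothetical, not part of any library):

```python
def build_sqlserver_jdbc_url(host: str, port: int, database: str) -> str:
    """Assemble a SQL Server JDBC URL using the Microsoft driver's
    semicolon-separated key=value property syntax."""
    return f"jdbc:sqlserver://{host}:{port};databaseName={database}"

# Placeholders match the sample code above; substitute real values.
jdbcUrl = build_sqlserver_jdbc_url("<serverName>", 1433, "<dbName>")
```
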

    The query runs successfully and produces the output below:

    +----------+---------+
    |EmployeeID|FirstName|
    +----------+---------+
    |         1|     John|
    |         2|     Jane|
    |         3|     Mike|
    |         4|    Emily|
    +----------+---------+
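An alternative that also works on older Spark versions: the JDBC source accepts a derived table in the dbtable option, i.e. the SELECT wrapped in parentheses and followed by a table alias. A minimal sketch (the alias emp is arbitrary):

```python
# Wrap the SELECT as a derived table for use with .option("dbtable", ...).
# SQL Server requires an alias on a parenthesized subquery.
query = "select EmployeeID, FirstName from Employees"
derived_table = f"({query}) AS emp"  # alias name is arbitrary

# Then read with the dbtable option instead of the query option:
# df = (spark.read.format("jdbc")
#         .option("driver", jdbcDriver)
#         .option("url", jdbcUrl)
#         .option("dbtable", derived_table)
#         .option("user", jdbcUsername)
#         .option("password", jdbcPassword)
#         .load())
```

One practical difference: the query option cannot be combined with partitionColumn for parallel reads, while the dbtable form can, so the derived-table approach is useful when you need partitioned reads of a query's result.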