python, pyspark, jdbc, mssql-jdbc

Pyspark SQL Server via Linux with IntegratedSecurity using JDBC


I am trying to connect to a SQL Server database from a Linux server. They are in the same domain, and I am able to access it via the ODBC driver.

When I try to connect using the Microsoft SQL Server JDBC driver, I have issues loading the auth DLL. I understand that .dll files are mainly for Windows, but there are no .so files provided.

I also tried the jTDS driver, but it does not support integrated security from a Linux system.

Is there any alternative way to connect to SQL Server using Windows integrated security from a Linux system?

I have an active Kerberos ticket obtained via

kinit
JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
LD_LIBRARY_PATH="/home/rapha022/ckd/sqljdbc_12.10/enu/auth/x64/"
JAVA VERSION: jre-1.8.0-openjdk
Python Version: 3.6.8
Spark Version: 3.2.4
from pyspark.sql import SparkSession

jdbc_driver = "./sqljdbc_12.10/enu/jars/mssql-jdbc-12.10.0. jre8.jar"
dll_directory = "./sqljdbc_12.10/enu/auth/x64/"
dll_path = "./sqljdbc_12.10/enu/auth/x64/mssql-jdbc_auth-12.10.0.x64.dll"

spark = SparkSession.builder \
.appName("SQLServerKerberos") \
.config("spark.driver.extraClassPath", jdbc_driver) \
.config("spark.executor.extraClassPath", jdbc_driver) \
.config("spark.executor. extraJavaOptions", f"-Djava.library. path={dll_directory}") \
.config("spark.driver.extraJavaOptions", f"-Djava.library.path={dll_directory}") \
.getOrCreate()

port_number = 1433
database_name = "dbname"
principal = 'uname@domain'
server_name = "server.domain"

jdbc_url = f'jdbc:sqlserver://{server_name}:{port_number};' \
+ f'databaseName={database_name};' \
+ 'Encrypt=false;' \
+ 'Trusted_Connection=true;' \
+ 'TrustServerCertificate=true;' \
+ 'ServerSPN=RestrictedKrbHost/RSP-HR-DB3;' \
+ 'AuthScheme=JavaKerberos;' \
+ 'integratedSecurity=true;' \
+ f'principal={principal}'

connection_properties = {
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

df = spark.read \
.jdbc(url=jdbc_url, table="person", properties=connection_properties)
df.show()
spark.stop()

I run via

spark-submit --driver-class-path "./sqljdbc_12.10/enu/jars/mssql-jdbc-12.10.0.jre8.jar" --jars "./sqljdbc_12.10/enu/jars/mssql-jdbc-12.10.0.jre8.jar" --driver-library-path "./sqljdbc_12.10/enu/auth/x64/" src/extra.py

and I get the following stack trace:

py4j.protocol.Py4JJavaError: An error occurred while calling o44.jdbc.
: com.microsoft.sqlserver.jdbc.SQLServerException: This driver is not configured for integrated authentication. ClientConnectionId:991210da-e610-4f45-8c8a-b6b665b88b18
at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:4580)
at com.microsoft.sqlserver.jdbc.AuthenticationJNI.<init>(AuthenticationJNI.java:78)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:5686)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.access$300(SQLServerConnection.java:94)
at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:5675)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7745)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:4700)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:4133)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:3690)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:3499)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:2207)
at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1320)
at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:49)
at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider$.create(ConnectionProvider.scala:77)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$createConnectionFactory$1(JdbcUtils.scala:64)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:62)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:57)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:239)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:245)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:294)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.UnsatisfiedLinkError: Unable to load authentication DLL mssql-jdbc_auth-12.10.0.x64
at com.microsoft.sqlserver.jdbc.AuthenticationJNI.<clinit>(AuthenticationJNI.java:67)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:5685)
... 35 more

Solution

  • The MSSQL JDBC driver supports three different ways to do "Windows" authentication. (I'm not even counting Azure Active Directory/Entra ID, which provides even more ways to authenticate.)

    https://learn.microsoft.com/en-us/sql/connect/jdbc/setting-the-connection-properties?view=sql-server-ver16

    Kerberos:

    Specify "integratedSecurity=true;authenticationScheme=Javakerberos" in your connection string. Run kinit to generate your TGT token. Don't shorten authenticationScheme to authScheme.

    Windows integrated security:

    Only works on Windows and uses a native ".DLL".

    specify "integratedSecurity=true;authenticationScheme=NativeAuthentication".

    NTLM

    specify "integratedSecurity=true;authenticationScheme=NTLM". Supply your windows credentials as the userName and password in the connection string. Mostly works, but not recommended. Use Kerberos instead.

    authenticationScheme=NativeAuthentication is the default, so if you don't specify the scheme properly, you'll get the "This driver is not configured for integrated authentication" error message.
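
    Here is a minimal PySpark sketch of the Kerberos approach, reusing the jar path, server, and database placeholders from the question (those values are specific to the asker's environment, so adjust them for yours). The key change is authenticationScheme=JavaKerberos in the URL, with no auth DLL or java.library.path settings:

    from pyspark.sql import SparkSession

    # Same driver jar as in the question; the native auth DLL and
    # java.library.path are not needed for JavaKerberos.
    jdbc_driver = "./sqljdbc_12.10/enu/jars/mssql-jdbc-12.10.0.jre8.jar"

    spark = (
        SparkSession.builder
        .appName("SQLServerKerberos")
        .config("spark.driver.extraClassPath", jdbc_driver)
        .config("spark.executor.extraClassPath", jdbc_driver)
        .getOrCreate()
    )

    # Placeholder values taken from the question; replace with real ones.
    server_name = "server.domain"
    port_number = 1433
    database_name = "dbname"

    # Use the full property name authenticationScheme, not AuthScheme.
    # A serverSpn=... entry (as in the question) may still be needed in some environments.
    jdbc_url = (
        f"jdbc:sqlserver://{server_name}:{port_number};"
        f"databaseName={database_name};"
        "encrypt=false;"
        "trustServerCertificate=true;"
        "integratedSecurity=true;"
        "authenticationScheme=JavaKerberos;"
    )

    connection_properties = {
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }

    # Relies on a valid TGT in the Kerberos ticket cache (run kinit first).
    df = spark.read.jdbc(url=jdbc_url, table="person", properties=connection_properties)
    df.show()
    spark.stop()

    With this URL the driver authenticates through Kerberos inside the JVM, so no platform-specific library is loaded and the UnsatisfiedLinkError goes away.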