azure, pyspark, error-handling, databricks, azure-databricks

How to output the 'Underlying SQLException' from Azure Databricks instead of the generic exception message?


We're calling Azure Databricks notebooks from an Azure Data Factory pipeline that performs ingestion into Azure Synapse. But whenever a notebook run fails, it only shows the generic error message below:

com.databricks.spark.sqldw.SqlDWSideException: Azure Synapse Analytics failed to execute the JDBC query produced by the connector.

But when we open the run log and scroll down to this exception message, just below it there is something like:

Underlying SQLException(s):
  - com.microsoft.sqlserver.jdbc.SQLServerException: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopExecutionException: The column [4] is not nullable and also USE_DEFAULT_VALUE is false, thus empty input is not allowed. [ErrorCode = 107090] [SQLState = S0001]

or sometimes:

Underlying SQLException(s):
  - com.microsoft.sqlserver.jdbc.SQLServerException: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopExecutionException: String or Binary would be truncated

The code we're using to ingest the data is:

try:
    data.write.format('com.databricks.spark.sqldw') \
        .option("url", connection_string) \
        .option("dbTable", table) \
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
        .option("tempDir", Connection.storageaccount_path + 'store/dataload') \
        .save(mode="append")
except Exception as e:
    raise Exception("error took place for table: " + table + " : " + str(e))

So this Underlying SQLException(s): is the actual error message that tells us what went wrong, but it never appears in the runError output we see on the ADF pipeline. Because of this, it's impossible for us to identify errors in bulk using Azure Log Analytics; we always have to manually scroll through the error log of one failure after another.

We have thousands of runs happening in Production daily, and many pipelines fail regularly. Because we can't see the exact error messages, we're unable to monitor these failures efficiently.

Is there a way to make Databricks output the Underlying SQLException(s) instead of the generic message com.databricks.spark.sqldw.SqlDWSideException: Azure Synapse Analytics failed to execute the JDBC query produced by the connector?


Solution

  • To get the actual error message instead of the generic one, split the exception text on an appropriate delimiter and pick out the relevant lines by index, as shown below.

    try:
        df1.write.format('com.databricks.spark.sqldw') \
            .option("url", "URL") \
            .option("dbTable", "demo4") \
            .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
            .option("tempDir", "tempdir") \
            .option("forwardSparkAzureStorageCredentials", "true") \
            .save(mode="append")
    except Exception as e:
        error_message = str(e)
        # split the exception text on newlines; the underlying
        # SQLException lines sit a few lines into the message
        error_parts = error_message.split('\n')
        print("Error occurred:", error_parts[2], error_parts[3], error_parts[4])
    

    Here I split the error message on \n (newline) into an array, and to get the expected result I printed the elements at indexes 2, 3, and 4 of the split array.

    My execution:

    (screenshot of the notebook run output showing the printed error lines)
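    Printing by fixed index is brittle, since the position of the underlying error can shift between failures. As a variation (a minimal sketch, not tested against every message layout), you can search the exception text for the Underlying SQLException(s): marker with a regex and re-raise just that part, so the real cause also reaches the runError field that ADF forwards to Azure Log Analytics:

    import re

    try:
        df1.write.format('com.databricks.spark.sqldw') \
            .option("url", "URL") \
            .option("dbTable", "demo4") \
            .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
            .option("tempDir", "tempdir") \
            .option("forwardSparkAzureStorageCredentials", "true") \
            .save(mode="append")
    except Exception as e:
        message = str(e)
        # Capture everything after the marker up to the next blank line
        # (or the end of the message). This assumes the connector keeps
        # the "Underlying SQLException(s):" wording shown in the question.
        match = re.search(r"Underlying SQLException\(s\):\s*(.+?)(?:\n\s*\n|\Z)",
                          message, re.DOTALL)
        underlying = match.group(1).strip() if match else message
        # Re-raising (instead of printing) fails the notebook run with the
        # underlying message, which ADF then surfaces in runError.
        raise Exception("error took place for table demo4: " + underlying)

    If the marker is absent, this falls back to the full exception text, so nothing is lost on unexpected error shapes.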