Whenever I try to save a specific DataFrame to the DW, I get the following error:
ERROR: An error occurred while calling o692.save. : com.databricks.spark.sqldw.SqlDWSideException: SQL DW failed to execute the JDBC query produced by the connector. Underlying SQLException(s): - com.microsoft.sqlserver.jdbc.SQLServerException: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopSqlException: String or binary data would be truncated. [ErrorCode = 107090] [SQLState = S0001]
I've checked the size of the strings in my csv file. The longest one is 38 characters.
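For reference, a quick way to double-check the maximum string length per column (a minimal sketch, assuming `df` is the DataFrame loaded from the csv) would be something like:

from pyspark.sql import functions as F

# Longest value per string column, to confirm nothing comes close to maxStrLength
string_cols = [f.name for f in df.schema.fields if f.dataType.simpleString() == 'string']
df.select([F.max(F.length(F.col(c))).alias(c) for c in string_cols]).show()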
This is my save/write method (it worked for other DataFrames):
df.write \
    .format('com.databricks.spark.sqldw') \
    .option('url', conn_string_dw) \
    .option('maxStrLength', '4000') \
    .option('forwardSparkAzureStorageCredentials', 'true') \
    .option('dbTable', db_table_name) \
    .option('tempDir', dw_temporary_path_url) \
    .option('truncate', 'false') \
    .mode('append') \
    .save()
What could be happening here?
The problem was in the source file itself: one specific cell contained multiple lines (embedded line breaks), and that is what caused the truncation error.
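For anyone hitting the same thing, here is a minimal sketch of how the offending cell can be found and cleaned up before the write. The csv path is a placeholder, and stripping the line breaks with `regexp_replace` is just one way to handle it; adjust to your own data.

from pyspark.sql import functions as F

# Read the csv with multiLine enabled so quoted cells containing line breaks
# are parsed as a single value instead of spilling into extra rows
df = (spark.read
      .option('header', 'true')
      .option('multiLine', 'true')
      .option('escape', '"')
      .csv('/path/to/file.csv'))

# Find rows whose string columns contain embedded line breaks
string_cols = [f.name for f in df.schema.fields if f.dataType.simpleString() == 'string']
has_newline = None
for c in string_cols:
    cond = F.col(c).contains('\n') | F.col(c).contains('\r')
    has_newline = cond if has_newline is None else (has_newline | cond)
df.filter(has_newline).show(truncate=False)

# Replace the line breaks with spaces before writing to the DW
for c in string_cols:
    df = df.withColumn(c, F.regexp_replace(F.col(c), '[\\r\\n]+', ' '))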