We are migrating SQL Server tables and views to the Databricks platform, using Unity Catalog tables/views. During the migration we are using the PySpark SQL engine to create the views and tables.
The query results below differ between SQL Server and Spark SQL (Unity Catalog).
Azure Databricks (Spark SQL view):
select cast(sum(ActivityValue) as numeric(22,7))
from catalog.schema.viewname
Output value: 18536603617272.0999252
A similar SQL Server query produces the output below:
select sum(ActivityValue)
from dbo.[viewname]
SQL server output: 18536603617272.0958340
The values diverge from the third decimal digit onward; the Spark SQL result does not match the SQL Server output.
How can we handle this?
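For context on why the totals can drift at exactly that digit: the integer part 18536603617272 already has 14 significant digits, and a DOUBLE (IEEE 754 binary64) holds only about 15-16. If ActivityValue, or any intermediate result in the Spark view, passes through DOUBLE rather than DECIMAL, everything beyond roughly the second decimal place is rounding noise. This is an assumption about the cause, not something confirmed by the post; a minimal sketch with Python's decimal module, using the SQL Server total from the question:

```python
from decimal import Decimal

# The exact SQL Server total from the question (21 significant digits).
exact = Decimal("18536603617272.0958340")

# Round-tripping through float (a binary64 DOUBLE, ~15-16 significant
# digits) changes the low decimal digits.
as_double = Decimal(float(exact))

print(exact)
print(as_double)
print(exact - as_double)  # the drift appears a couple of decimal places in
```

If a DESCRIBE of the Spark view shows the column (or the expression feeding it) as DOUBLE or FLOAT anywhere, this kind of drift is expected.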
Databricks Spark SQL Column Compatibility Problem
I have tested creating a SQL Server database with tables and views using the NUMERIC(22,7) data type in my environment. I have also successfully created equivalent tables and views in Databricks using PySpark SQL with the DECIMAL(22,7) data type.
Follow the steps below, which I have tried:
Step 1
Create a table and a view in SQL Server using NUMERIC(22,7).
Step 2
Create a cluster and a notebook in Databricks, and connect to SQL Server by following the documentation below:
Referral link: Query SQL Server with Databricks | Databricks Documentation (https://docs.databricks.com/aws/en/connect/external-systems/sql-server)
Create a Permanent Table in Databricks (Unity Catalog):
df_sql.write.mode("overwrite").saveAsTable(f"{catalog_name}.{schema_name}.{table_name}")
Create a View on Top of That Table:
spark.sql(f"""
CREATE OR REPLACE VIEW {catalog_name}.{schema_name}.vw_sample_data AS
SELECT *
FROM {catalog_name}.{schema_name}.{table_name}
WHERE status = 'active'
""")
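One way to sanity-check the migrated column: if ActivityValue is a true DECIMAL(22,7) on both sides, SUM is exact fixed-point arithmetic and must produce identical totals regardless of row order; only a floating-point stage can make the totals drift. A sketch with Python's decimal module (the values here are fabricated for illustration):

```python
import random
from decimal import Decimal

# Fabricated sample values with 7 decimal places, like DECIMAL(22,7).
random.seed(42)
vals = [Decimal(random.randint(-10**12, 10**12)).scaleb(-7) for _ in range(1000)]

total = sum(vals)
shuffled = vals[:]
random.shuffle(shuffled)

# Decimal sums are exact: any ordering of the rows gives the same total.
assert sum(shuffled) == total

# Doubles are not: summing the same data as floats accumulates
# order-dependent rounding error, one plausible source of the
# mismatch described in the question.
float_total = sum(float(v) for v in vals)
print(total, float_total)
```

So if both environments really store and aggregate the column as DECIMAL(22,7), the totals should agree to all seven decimal places; a mismatch points at a DOUBLE somewhere in the Spark pipeline (for example, in the view definition or during ingestion).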
Step 3
Outputs from both SQL Server and the Azure Databricks Spark SQL view:
SQL Server output: (screenshot)
Azure Databricks (Spark SQL view) output: (screenshot)