I am looking for information around data type differences between Synapse Tables and Delta Lake Tables. Essentially, I want to know if Synapse Tables can be converted to Delta Lake Tables, without encountering any data type related issues. Thanks in advance!
Yes, Synapse tables can be converted to Delta Lake tables. The main thing to verify up front is that every column's data type has a supported equivalent in Delta Lake.
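For the conversion step itself, here is a minimal sketch. It assumes a Synapse Spark pool where the built-in Dedicated SQL pool connector exposes spark.read.synapsesql (a plain JDBC read works as well), and the database, schema, and table names are placeholders.

# Minimal sketch: read a dedicated SQL pool table and re-write it as a Delta Lake table.
# "spark" is the SparkSession pre-created in a Synapse notebook; names below are placeholders.
df = spark.read.synapsesql("MyDatabase.dbo.MyTable")

(df.write
   .format("delta")
   .mode("overwrite")
   .saveAsTable("lakehouse_db.mytable"))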
I have used the approach below to list the supported data types on both sides: the Synapse dedicated SQL pool and Delta Lake.
In the dedicated SQL pool, you can run the following query to get data type information:
-- List the data types available in the dedicated SQL pool and bucket them
-- into rough categories by their legacy type code (systypes.xtype).
SELECT
    name   AS DataType,
    xtype  AS TypeCode,
    length AS MaxLength,
    CASE
        WHEN xtype IN (99, 35, 167, 175, 231, 239)          THEN 'Character String'
        WHEN xtype IN (48, 52, 56, 104, 106, 108, 122, 127) THEN 'Numeric'
        WHEN xtype IN (61, 58, 40)                          THEN 'Date and Time'
        ELSE 'Other'  -- datetime2 (42), datetimeoffset (43) and time (41) land here because their codes are not listed above
    END AS DataTypeCategory
FROM
    sys.systypes
ORDER BY
    name;
Results:
DataType          TypeCode  MaxLength  DataTypeCategory
bigint            127       8          Numeric
binary            173       8000       Other
bit               104       1          Numeric
char              175       8000       Character String
date              40        3          Date and Time
datetime          61        8          Date and Time
datetime2         42        8          Other
datetimeoffset    43        10         Other
decimal           106       17         Numeric
float             62        8          Other
image             34        16         Other
int               56        4          Numeric
json              244       -1         Other
money             60        8          Other
nchar             239       8000       Character String
numeric           108       17         Numeric
nvarchar          231       8000       Character String
real              59        4          Other
smalldatetime     58        4          Date and Time
smallint          52        2          Numeric
smallmoney        122       4          Numeric
sql_variant       98        8016       Other
sysname           231       256        Character String
time              41        5          Other
tinyint           48        1          Numeric
uniqueidentifier  36        16         Other
varbinary         165       8000       Other
varchar           167       8000       Character String
To list the data types supported by Delta tables, using PySpark:
from pyspark.sql.types import (
    ByteType, ShortType, IntegerType, LongType,
    FloatType, DoubleType, DecimalType,
    StringType, BinaryType, BooleanType,
    DateType, TimestampType,
    ArrayType, MapType, StructType, StructField
)

# Primitive and complex Spark SQL types that Delta Lake tables support.
delta_supported_data_types = [
    ByteType(), ShortType(), IntegerType(), LongType(),
    FloatType(), DoubleType(), DecimalType(),
    StringType(), BinaryType(), BooleanType(),
    DateType(), TimestampType(),
    ArrayType(StringType()),
    MapType(StringType(), IntegerType()),
    StructType([StructField("field1", StringType()), StructField("field2", IntegerType())])
]

print("Supported data types for Delta tables in PySpark:")
for dtype in delta_supported_data_types:
    print(dtype.simpleString())
Supported data types for Delta tables in PySpark:
tinyint
smallint
int
bigint
float
double
decimal(10,0)
string
binary
boolean
date
timestamp
array<string>
map<string,int>
struct<field1:string,field2:int>
So, map your Synapse data types to the corresponding Delta Lake data types accordingly before converting.
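To make that mapping concrete, below is a rough, non-exhaustive lookup from the dedicated pool types above to their closest Spark/Delta types, plus an example of casting the columns that have no direct counterpart before writing. The dictionary name, the column names (created_offset, row_id, amount), and the target table name are placeholders, and df is the DataFrame read earlier; verify the mapping against your own schema.

from pyspark.sql import functions as F

# Rough Synapse (dedicated SQL pool) -> Spark/Delta type mapping (non-exhaustive reference).
synapse_to_delta = {
    "bigint": "bigint", "int": "int", "smallint": "smallint", "tinyint": "smallint",
    "bit": "boolean",
    "decimal": "decimal", "numeric": "decimal",      # keep the source precision/scale
    "money": "decimal(19,4)", "smallmoney": "decimal(10,4)",
    "float": "double", "real": "float",
    "char": "string", "varchar": "string", "nchar": "string", "nvarchar": "string",
    "date": "date", "datetime": "timestamp", "datetime2": "timestamp",
    "smalldatetime": "timestamp",
    "datetimeoffset": "timestamp",                   # the offset is lost; keep as string if you need it
    "time": "string",                                # Spark has no TIME type
    "uniqueidentifier": "string",
    "binary": "binary", "varbinary": "binary",
}

# Example: cast columns that have no direct Delta counterpart before writing.
# "created_offset" (datetimeoffset), "row_id" (uniqueidentifier) and "amount" (money)
# are hypothetical column names.
df_converted = (
    df.withColumn("created_offset", F.col("created_offset").cast("timestamp"))
      .withColumn("row_id", F.col("row_id").cast("string"))
      .withColumn("amount", F.col("amount").cast("decimal(19,4)"))
)

df_converted.write.format("delta").mode("overwrite").saveAsTable("lakehouse_db.mytable_converted")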