azure-synapsedelta-lake

Data Type differences between Synapse Tables and Delta Lake Tables


I am looking for information around data type differences between Synapse Tables and Delta Lake Tables. Essentially, I want to know if Synapse Tables can be converted to Delta Lake Tables, without encountering any data type related issues. Thanks in advance!


Solution

  • As you want to know if Synapse Tables can be converted to Delta Lake Tables Yes, you can convert Synapse tables to Delta Lake tables.

    I have tried the below approach to get supported datatypes in Delta Lake & Synapse Dedicated pool.

    In dedicated pool you try the below query to get info for datatypes:

    SELECT
        name AS DataType,
        xtype AS TypeCode,
        length AS MaxLength,
        CASE
            WHEN xtype IN (99, 35, 167, 175, 231, 239) THEN 'Character String'
            WHEN xtype IN (48, 52, 56, 104, 106, 108, 122, 127) THEN 'Numeric'
            WHEN xtype IN (61, 58, 40) THEN 'Date and Time'
            ELSE 'Other'
        END AS DataTypeCategory
    FROM
        sys.systypes
    ORDER BY
        name;
    

    Results:

    DataType    TypeCode    MaxLength   DataTypeCategory
    bigint  127 8   Numeric
    binary  173 8000    Other
    bit 104 1   Numeric
    char    175 8000    Character String
    date    40  3   Date and Time
    datetime    61  8   Date and Time
    datetime2   42  8   Other
    datetimeoffset  43  10  Other
    decimal 106 17  Numeric
    float   62  8   Other
    image   34  16  Other
    int 56  4   Numeric
    json    244 -1  Other
    money   60  8   Other
    nchar   239 8000    Character String
    numeric 108 17  Numeric
    nvarchar    231 8000    Character String
    real    59  4   Other
    smalldatetime   58  4   Date and Time
    smallint    52  2   Numeric
    smallmoney  122 4   Numeric
    sql_variant 98  8016    Other
    sysname 231 256 Character String
    time    41  5   Other
    tinyint 48  1   Numeric
    uniqueidentifier    36  16  Other
    varbinary   165 8000    Other
    varchar 167 8000    Character String
    

    To list all the data types that support Delta tables using PySpark:

    from pyspark.sql.types import (
        ByteType, ShortType, IntegerType, LongType,
        FloatType, DoubleType, DecimalType,
        StringType, BinaryType, BooleanType,
        DateType, TimestampType, 
        ArrayType, MapType, StructType, StructField
    )
    delta_supported_data_types = [
        ByteType(), ShortType(), IntegerType(), LongType(),
        FloatType(), DoubleType(), DecimalType(),
        StringType(), BinaryType(), BooleanType(),
        DateType(), TimestampType(),
        ArrayType(StringType()),  
        MapType(StringType(), IntegerType()), 
        StructType([StructField("field1", StringType()), StructField("field2", IntegerType())])  
    ]
    print("Supported data types for Delta tables in PySpark:")
    for dtype in delta_supported_data_types:
        print(dtype.simpleString())
    

    Supported data types for Delta tables in PySpark:

    tinyint
    smallint
    int
    bigint
    float
    double
    decimal(10,0)
    string
    binary
    boolean
    date
    timestamp
    array<string>
    map<string,int>
    struct<field1:string,field2:int>
    

    So Map your Synapse Data Types to Delta Lake Data Types accordingly.