azure apache-spark azure-synapse linked-server delta-lake

Linked Server to Synapse Spark Tables: Queries hang if join on a STRING column is present


Setup:

Situation:

More Info:

Not getting a lot out of Live Query Statistics

What I've tried:


Solution

  • Mostly got it (via MS Support)

    The External Tables, pointed at the Delta tables, were using varchar() for the string columns. Switching the columns to nvarchar() seems to fix it.

    Why this is the case is not clear, though.

    It also goes against the documentation, per [this link on Best Practices, from MS][1]:

    > Use the varchar type with some UTF8 collation if you're reading data from Parquet, Azure Cosmos DB, Delta Lake, or CSV with UTF-8 encoding.

    But [shrug].

    [1]: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-serverless-sql-pool
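
    For anyone wanting to try the same change, here is a minimal sketch of what it looks like on a serverless SQL pool external table. All names (`dbo.Customers`, the columns, `DeltaLakeStorage`, `DeltaLakeFormat`, the `customers/` folder) and the column length are placeholders for illustration, not the real objects from my setup, and it assumes the external data source and a Delta file format already exist.

    ```sql
    -- Hypothetical names throughout; adjust to your own objects.
    -- Assumes DeltaLakeStorage (external data source) and DeltaLakeFormat
    -- (external file format with FORMAT_TYPE = DELTA) were created earlier.

    -- External tables cannot be altered in place, so drop and recreate.
    DROP EXTERNAL TABLE dbo.Customers;

    CREATE EXTERNAL TABLE dbo.Customers
    (
        CustomerId   INT,
        -- Previously declared as VARCHAR(200); switching the string
        -- column to NVARCHAR is the change that stopped the hang for me.
        CustomerName NVARCHAR(200)
    )
    WITH
    (
        LOCATION    = 'customers/',      -- folder of the Delta table (placeholder)
        DATA_SOURCE = DeltaLakeStorage,  -- placeholder data source name
        FILE_FORMAT = DeltaLakeFormat    -- placeholder Delta file format name
    );
    ```

    Dropping an external table only removes the metadata, not the underlying Delta files, so this should be safe to repeat while experimenting.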