Tags: microsoft-dynamics, azure-synapse, azure-data-lake, azure-data-lake-gen2

Microsoft Dynamics Export to Datalake - how does your org work around queries failing on files currently being updated?


We're using Microsoft Dynamics Finance & Operations (F&O) Export to Datalake to have near real-time access to F&O data in a Synapse Datalake.

The issue is that anything currently being updated throws an error when queried. This is pretty bad for the user experience. Per the Microsoft documentation, this is a known limitation:

Q: When I read data files by using Synapse serverless, I notice an intermittent error: "Unexpected end-of-input within record at...."

A: When table records are frequently updated or deleted in finance and operations apps, you might notice a read/write contention on CSV files, and this error might occur. This issue occurs when the Synapse serverless SQL service queries a CSV file as it's being updated. In many cases, you might be able to fix this issue by retrying the query.

How have you dealt with this?

The only two things I can think of are:

1.) In the Synapse Serverless database that hits the datalake, one can enable ALLOW_INCONSISTENT_READS (Microsoft Documentation); a sketch of enabling it follows this list.

We've used that from the start - I couldn't say how much it helps, but errors are still fairly frequent when queries hit files that are being updated.

2.) Use CETAS, then query the CETAS table instead (a sketch follows this list as well). But a CETAS table can only be dropped & re-created, so that's probably best done at night - and at that point, what is the point of all this again? We're back to data only as fresh as last night, behind a fairly complicated ETL transformation.
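
For reference, option 1 is a one-time setting on the serverless database. A minimal sketch, assuming a database named FnODataLake (the name is a placeholder) - confirm the exact syntax against the Microsoft documentation linked above:

```sql
-- Run against the Synapse serverless SQL endpoint.
-- Lets the serverless pool read files even while they are being modified,
-- at the cost of potentially inconsistent results.
-- [FnODataLake] is a placeholder for the database that sits over the datalake.
ALTER DATABASE [FnODataLake] SET ALLOW_INCONSISTENT_READS ON;
```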

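And for option 2, a minimal CETAS sketch. All object names here are placeholders, and it assumes an external data source and a Parquet file format have already been created on the serverless database:

```sql
-- Nightly refresh: CETAS can only be dropped & re-created, not updated in place,
-- and it won't overwrite existing files, so the snapshot folder in the lake
-- also has to be cleared before re-running.
DROP EXTERNAL TABLE dbo.CustTrans_Snapshot;     -- skip on the very first run

CREATE EXTERNAL TABLE dbo.CustTrans_Snapshot
WITH (
    LOCATION    = 'snapshots/CustTrans/',       -- placeholder output folder in the lake
    DATA_SOURCE = MyDataLake,                   -- assumed existing external data source
    FILE_FORMAT = ParquetFileFormat             -- assumed existing Parquet file format
)
AS
SELECT * FROM dbo.CustTrans;                    -- placeholder source over the live CSV files
```

Queries then hit the Parquet snapshot instead of the CSV files F&O is actively writing, which avoids the contention - but also freezes the data at the last refresh, which is the whole problem with this option.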

Solution

  • We had a ticket open with Microsoft. As far as anyone can answer, this is just how it is / there's no good workaround.

    They've also surprise-deprecated [Export to Datalake] in favor of the in-development [Synapse Link], which supposedly does not have this issue.