azure-cosmosdbazure-synapse

How to specify CosmosDb Synapse Link types when parquet type is incorrect?


I have a CosmosDb and a Synapse workspace linked. Everything almost works using Synapse to create SQL views to the Cosmos data.

In Cosmos I have one data set with a property that is always a zero. I know it is actually a decimal because it is a price and future data is likely to contain decimal prices.

In Synapse I need to project this data into an SQL view where that column is correctly a decimal(19,4).

When I run an OpenRowSet query into the Cosmos data and attempt to specify the type for this property I get the following error.

select *
from OPENROWSET(
    'CosmosDb',
    'account=myaccount;database=myDatabase;region=theRegion;key=xxxxxxxxxxxxxxx',
    [myCollection])
    with (
    [salesPrice] float '$.salesPrice')
as testQuery

I get the error:

Column 'salesPrice' of type 'FLOAT' is not compatible with external data type 'Parquet physical type: INT64', please try with 'BIGINT'.

Obviously a BIGINT here is going to fail as soon as I get a true decimal price.

I think the parquet type is getting set to BIGINT because in Cosmos all the values for this column are zero. I guess more generally it would be the same problem if the Cosmos property was all non-zero integers.

How can I force the type of salesPrice to be a decimal or float?

(I don't want to get side tracked here on float vs decimal for monetary values, I understand the difference; this error happens either way)

UPDATE

This problem manifests itself also in another way without specifying a schema with OPENROWSET.

In a new CosmosDb collection insert a document such as:

{
   "myid" : 1,
   "price" : 0
}

If I wait a minute or so I can query this document from Synapse with:

select *
from OPENROWSET(
    'myCosmosDb',
    'account=myAccount;database=myDatabase;region=myRegion;key=xxxxxxxxxxxxxxxxxxx',
    [myCollection])
    as testQuery;

and I get the expected results.

Now add a second document:

{
   "myid" : 1,
   "price" : 1.1
}

and re-run the query and I get the same error:

Column 'price' of type 'FLOAT' is not compatible with external data type 'Parquet physical type: INT64', please try with 'BIGINT'

Is there any way to work around or prevent these kinds of errors?


Solution

  • How about set the document like

    {
       "myid" : "1",
       "price" : "1.1"
    }