sqlazure-synapseopenrowset

OPENROWSET only accepting VARCHAR/NVARCHAR for a Date column in a CSV, not DATE (Synapse Serverless)


A CSV file is being dumped into an Azure Datalake. When I try to make an OPENROWSET view in Synapse Serverless, the date column is only working as a VARCHAR/NVARCHAR, not any sort of DATE.

Here's the CSV file with the offending column:

The Date Column of the CSV file, if I download it.

My Synapse Serverless VIEW/SELECT statements works, if I put the column as a VARCHAR/NVARCHAR:

CREATE TestView AS

SELECT  * FROM OPENROWSET(BULK 'Orders Forecast.csv'
 ,DATA_SOURCE = 'workday'
 ,FORMAT = 'CSV'
 ,PARSER_VERSION = '2.0'
 ,HEADER_ROW = TRUE

) 

WITH (
     [Period] varchar(50)
    )
AS t

But then it's not easily usable as a date when querying the VIEW.

If I try to assign it as DateTime2, it errors:

CREATE VIEW TestView AS

SELECT  * FROM OPENROWSET(BULK 'Orders Forecast.csv'
, DATA_SOURCE = 'workday'
, FORMAT = 'CSV'
, PARSER_VERSION = '2.0'
--, ROWSET_OPTIONS = '{"READ_OPTIONS":["ALLOW_INCONSISTENT_READS"]}'
, HEADER_ROW = TRUE

) 

WITH (
     [Period] datetime2
    )

ERROR: Msg 13812, Level 16, State 1, Line 25 Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row starting at byte offset 138, column 8 (Period) in Orders Forecast.csv.

I've also tried going down to PARSER_VERSION 1.0, and doing the then-compatible DATETIME. And DATE. Same results.


Solution

  • You can also try the code below for getting it in datetime format.

    When using *openrowset, you should explicitly specify the collation when declaring the column type inside the WITH clause.

        WITH (
            [C1] VARCHAR(20) COLLATE Latin1_General_100_BIN2_UTF8
        ) 
    

    Next, use the TRY_PARSE function.

    SELECT TRY_PARSE(c1 AS datetime USING 'en-US') AS result
    FROM cte;
    

    Here is the initial data.

    Enter image description here

    Output:

    Enter image description here