parsingazure-data-factoryazure-synapse

Azure Synapse CSV Parsing Unexpected Token


I am working on an end to end data engineering solution using Azure Data Factory, Azure Data Lake, and Azure Synapse Analytics. I have a CSV file, "empl.csv" that is a copy of an on-prem SQL server table, stored in ADLS. I am trying to create a view in my Synapse serverless SQL pool of this CSV, but when I try to use OPENROWSET to read the CSV, I get the following error:

Error encountered while parsing data: 'Unexpected token '' at [byte: 60]. Expecting tokens ',', ' ', or '"'. '. Underlying data description: file 'https://*******/empl.csv'.

My query looks like this:

SELECT
    TOP 5 *
FROM
    OPENROWSET(
        BULK 'https://*******/empl.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        ESCAPECHAR = '\\'
    ) AS data

I have tried '\' as the escape character and '\' as the escape character, as well as leaving the escape character blank altogether. I get the same error each time.

The CSV looks like this:

empl_code,name,u_version
123,"JOE SMITH", "!"
124,"DAVE SMITH", "M"
125,"BRIAN SMITH", "\""

I'm expecting it to parse like this:

empl_code name u_version
123 JOE SMITH !
124 DAVE SMITH M
125 BRIAN SMITH "

The unexpected token in the error appears to be an empty string- I'm not sure how that's possible. How can I get it to properly interpret the escape character and pass the double quote through as a string?

Long time reader, first time question asker. Thanks in advance for your help.


Solution

  • I have tried to read your csv file, and I got the same error. This issue might be with your data or synapse reading in serverless. You can modify the \ in your data to " so that synapse serverless can read it. Use Synapse pipeline copy activity to do that.

    Create a source delimited text dataset with your file and give below configurations.

    enter image description here

    Create a sink dataset with below configurations.

    enter image description here

    Give these datasets to a copy activity in the synapse pipeline and the file will be generated like below.

    empl_code,name,u_version
    "123","JOE SMITH","!"
    "124","DAVE SMITH","M"
    "125","BRIAN SMITH",""""
    

    Now, use the below SQL script by giving the " as Quote and escape characters in the synapse serverless and this will give the expected results.

    SELECT
        TOP 100 *
    FROM
        OPENROWSET(
            BULK 'https://****/targetdelimiter.csv',
            FORMAT = 'CSV',
            FIELDTERMINATOR = ',',
            ESCAPECHAR = '"',
            FIELDQUOTE = '"',
            FIRSTROW = 2
        ) 
        WITH (
            [empl_code] int,
            [name] VARCHAR(100),
            [u_version] varchar(32)
        )
        AS [result];
    

    Result:

    enter image description here