I am working on an end to end data engineering solution using Azure Data Factory, Azure Data Lake, and Azure Synapse Analytics. I have a CSV file, "empl.csv" that is a copy of an on-prem SQL server table, stored in ADLS. I am trying to create a view in my Synapse serverless SQL pool of this CSV, but when I try to use OPENROWSET to read the CSV, I get the following error:
Error encountered while parsing data: 'Unexpected token '' at [byte: 60]. Expecting tokens ',', ' ', or '"'. '. Underlying data description: file 'https://*******/empl.csv'.
My query looks like this:
SELECT
TOP 5 *
FROM
OPENROWSET(
BULK 'https://*******/empl.csv',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
ESCAPECHAR = '\\'
) AS data
I have tried '\' as the escape character and '\' as the escape character, as well as leaving the escape character blank altogether. I get the same error each time.
The CSV looks like this:
empl_code,name,u_version
123,"JOE SMITH", "!"
124,"DAVE SMITH", "M"
125,"BRIAN SMITH", "\""
I'm expecting it to parse like this:
empl_code | name | u_version |
---|---|---|
123 | JOE SMITH | ! |
124 | DAVE SMITH | M |
125 | BRIAN SMITH | " |
The unexpected token in the error appears to be an empty string- I'm not sure how that's possible. How can I get it to properly interpret the escape character and pass the double quote through as a string?
Long time reader, first time question asker. Thanks in advance for your help.
I have tried to read your csv file, and I got the same error. This issue might be with your data or synapse reading in serverless. You can modify the \
in your data to "
so that synapse serverless can read it. Use Synapse pipeline copy activity to do that.
Create a source delimited text dataset with your file and give below configurations.
Create a sink dataset with below configurations.
Give these datasets to a copy activity in the synapse pipeline and the file will be generated like below.
empl_code,name,u_version
"123","JOE SMITH","!"
"124","DAVE SMITH","M"
"125","BRIAN SMITH",""""
Now, use the below SQL script by giving the "
as Quote and escape characters in the synapse serverless and this will give the expected results.
SELECT
TOP 100 *
FROM
OPENROWSET(
BULK 'https://****/targetdelimiter.csv',
FORMAT = 'CSV',
FIELDTERMINATOR = ',',
ESCAPECHAR = '"',
FIELDQUOTE = '"',
FIRSTROW = 2
)
WITH (
[empl_code] int,
[name] VARCHAR(100),
[u_version] varchar(32)
)
AS [result];
Result: