azurecsvazure-synapseopenrowset

Escape characters in .csv for Azure Synapse seems to dissapear?


I have .csv file that looks like this:

"ID", "Name", "Extra Info"
"1", "John", "{\"Event\": \"Click\", \"Button Name\": \"Accept\"}
"2", "Adam", "{\"Event\": \"Click\", \"Button Name\": \"Accept\"}

I'm trying to load this file using this code in Synapse:

SELECT
    TOP 2 *
FROM
    OPENROWSET(
        BULK 'https://[MY STORAGE ACCOUNT].dfs.core.windows.net/[FILE PATH]/[...]/*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0'
    ) 
AS [result]

Expecting this result:

ID Name Extra Info
1 John {"Event": "Click", "Button Name": "Accept"}
2 Adam {"Event": "Click", "Button Name": "Accept"}

But I keep getting this error:

Error handling external file: 'Unexpected token 'Event\' at [byte: XXX]. 
Expecting tokens ',', ' ', or '"'. '. 
File/External table name: 'https://[MY STORAGE ACCOUNT].dfs.core.windows.net/[FILE PATH]/[...]/[SPECIFIC FILE NAME].csv'.

It looks like it's ignoring the first quote (") and Escape character in the Extra Info column? Leading to it think that \Event\ is some special token?

I just don't understand why or what I can do to fix this?


Solution

  • I think I found the answer based on this post and some of the Azure documentation:

    It seems that the only valid way to escape Quotes is by using double quotes.

    This means my .csv should be formatted like this:

    "ID", "Name", "Extra Info"
    "1", "John", "{""Event"": ""Click"", ""Button Name"": ""Accept""}
    "2", "Adam", "{""Event"": ""Click"", ""Button Name"": ""Accept""}
    

    Instead of the original (which uses ):

    "ID", "Name", "Extra Info"
    "1", "John", "{\"Event\": \"Click\", \"Button Name\": \"Accept\"}
    "2", "Adam", "{\"Event\": \"Click\", \"Button Name\": \"Accept\"}
    

    Unfortunately I don't see a way around this other than BULK editing all my .csv files...