This is my first time using PolyBase, and I'm trying to load a sample CSV file (with the first record as a header) from ADLS Gen2 into Synapse. I had already created a master key earlier, so I didn't create it again. The remaining steps I've implemented are as follows:
-- Step 1
CREATE DATABASE SCOPED CREDENTIAL access_cred
WITH
IDENTITY = 'my_name',
SECRET = '12345678910****==';
-- Step 2
CREATE EXTERNAL DATA SOURCE CreditCards
WITH
(
TYPE = HADOOP,
LOCATION = 'abfss://container01@freesandbox.dfs.core.windows.net',
CREDENTIAL = access_cred
);
-- Step 3
CREATE EXTERNAL FILE FORMAT CC_FileFormat
WITH
(
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS
(
FIELD_TERMINATOR = ',',
FIRST_ROW = 2,
USE_TYPE_DEFAULT = FALSE
)
);
-- Step 4
CREATE SCHEMA ext;
CREATE SCHEMA cc;
-- Step 5: Create External Table
CREATE EXTERNAL TABLE ext.creditcards (
Card_Type_Full_Name varchar(50),
Issuing_Bank varchar(50),
Card_Number varchar(50),
Card_Holder_Name varchar(50),
CVV_CVV2 varchar(50),
Issue_Date varchar(50),
Expiry_Date varchar(50),
Billing_Date varchar(50),
Card_PIN varchar(50),
Credit_Limit varchar(50)
)
WITH (LOCATION='/CreditCards/', -- I have a folder 'CreditCards' that contains the 'Creditcards.csv' file
DATA_SOURCE = CreditCards,
FILE_FORMAT = CC_FileFormat,
REJECT_TYPE = VALUE,
REJECT_VALUE = 0
);
-- Step 6
CREATE TABLE cc.creditcards
WITH
(
DISTRIBUTION = REPLICATE,
CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT * FROM ext.creditcards
OPTION (LABEL = 'CTAS : Load cc.creditcards');
I don't know what I'm doing wrong. I went through multiple posts online, but none of them addresses the issue I'm facing. I get the error below when I run SELECT * FROM ext.creditcards. (I get the same error in Step 6, since the CTAS command that loads my final table reads from the same external table.)
Msg 107090, Level 16, State 1, Line 74
HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: MalformedInputException: Input length = 1
I don't understand what else is missing. I've granted access permissions at the container level as well. Can someone please help me solve this issue?
Solved: It turns out the error was caused by the file encoding. The source flat file I placed on ADLS Gen2 was ANSI-encoded; I just had to convert it to UTF-8 and re-upload it. After that, the load worked perfectly.
I found this tip in a post on the Microsoft forums: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/6a431c73-4575-4729-b7b5-9767e2a16c0e/external-table-error?forum=AzureSQLDataWarehouse
To convert an ANSI flat file to UTF-8 (using Notepad): https://superuser.com/a/911373
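If you would rather script the conversion than open the file in Notepad, here is a minimal Python sketch of the same fix. It assumes "ANSI" means Windows-1252 (cp1252), which is typical for Western-locale Windows machines but may not match your file, so adjust source_encoding if needed. The function names and the file names in the usage comment are hypothetical, not part of my original setup.

```python
from pathlib import Path


def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-encode a text file to UTF-8 (written without a BOM).

    source_encoding is an assumption: cp1252 is the usual meaning of
    "ANSI" on Western-locale Windows. Working on raw bytes keeps the
    original line endings untouched.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


# Hypothetical usage before re-uploading the file to ADLS Gen2:
#   if not is_valid_utf8("Creditcards.csv"):
#       convert_to_utf8("Creditcards.csv", "Creditcards_utf8.csv")
```

Running is_valid_utf8 on the file before uploading is a quick way to catch this problem early. A file that fails the check will likely trigger the same MalformedInputException in PolyBase.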