I am having an external stage on S3
where the parquet files are stored.
Now, I have created a table as T1_D1
:
CREATE TABLE T1_D1(ID NUMBER(38,0),
TRANSACTION_DATE TIMESTAMP_NTZ(9),
PRODUCT VARCHAR(16777216))
Now, I am running the below script, to test whether the data I am loading is correct or not.
select
$1:ID::number,
$1:PRODUCT::VARCHAR(16777216)
from @my_s3/T1/day_2_sales.parquet
(file_format => my_parq)
It successfully returns the ID
numbers, whereas the product is returned as NULL
, in all the rows.
I have even tried to run with the transaction date column but the same problem persists. (i.e) ID
matches and returns the data correctly but timestamp values throw me NULL
.
$1:ID::NUMBER, $1:TRANSACTION_DATE::TIMESTAMP_NTZ(9)
1,
2,
3,
6,
7,
8,
9,
6,
7,
8,
9,
10,
11,
As an additional step i have used AWS crawler, on the source parquet files which gives me a schema of the following table as follows.
1 id : bigint
2 transaction_date : string
3 product : string
Things i have Experimented out are as follows.
select $1:ID::number, $1:PRODUCT::VARCHAR(16777216) from @my_s3/T1/day_2_sales.parquet (file_format => my_parq)
It returned ID's correctly but the product values are null.
So i have used an AWS crawler, in order to find the data types of the source schema, and have written the snowflake query accordingly. by changing the data types in the snowflake. But still it returns NULL.
Then i found out that, the Naming Convention of the columns in the source parquet file are not appropriate. i.e the destination column "ID" is named as "ID" itself in source system, but where as "PRODUCT" is named as "Product" in the source parquet file. and TRANSACTION_DATE is named as Transaction_Date.
The below Query worked successfully.
select $1:ID::number, $1:Transaction_date::timestamp, $1:Product::varchar from @my_s3/T1/day_1_sales.parquet (file_format => my_parq)