I'm trying to bulk load 28 parquet files into Snowflake from an S3 bucket using the COPY command and regex pattern matching. But each time I run the command in my worksheet, I'm getting the following bad response:
Copy executed with 0 files processed.
Inside a folder in my S3 bucket, the files I need to load into Snowflake are named as follows:
...
Using the Snowflake worksheet, I'm trying to load data into a pre-existing table, using the following commands:
CREATE or REPLACE file format myparquetformat type = 'parquet';
COPY INTO [Database].[Schema].[Table] FROM (
SELECT $1:field1::VARCHAR(512), $1:field2::INTEGER, $1:field3::VARCHAR(512),
$1:field4::DOUBLE, $1:field5::VARCHAR(512), $1:field6::DOUBLE
FROM @AWS_Snowflake_Stage/foldername/
(FILE_FORMAT => 'myparquetformat', PATTERN =>
'filename00[0-9]+_part_00.parquet')
)
on_error = 'continue';
I'm not sure why these commands fail to run.
In every example I've seen in the Snowflake documentation, "PATTERN" is only used within the COPY command outside of a SELECT query. I'm not sure if it's possible to use PATTERN inside a SELECT query.
In this case, I think it's necessary to use the SELECT query within the COPY command, since I'm loading in parquet data that would first need to be cast from a single column ($1) into multiple columns with appropriate data types for the table (varchar, integer, double). The SELECT query is what enables the importing of the parquet file into the existing table -- is it possible to find a way around this using a separate staging table?
It's a huge pain to load the parquet files one at a time. Is there any way to bulk load these 28 parquet files using the Snowflake worksheet? Or is it better to try to do this using a Python script and the Snowflake API?
For me the below worked, I agree my pattern is quite simple to select all parquet file in the location, but you can probably verify if the regex pattern is valid.
COPY INTO <TABLE_NAME> FROM (
SELECT
$1:col_name_1,
$1:col_name_2
FROM @STAGE_NAME/<PATH_TO_FILES>
)
PATTERN = '.*.parquet'
FORCE = TRUE
FILE_FORMAT = (
TYPE = 'parquet'
);
Side note, Keep in mind that Snowflake has a safety check to skip files if it has already been Staged and loaded once successfully.