I've managed to unload my data into partitions, but each partition is also being split into multiple files. Is there a way to force Snowflake to generate a single file per partition?
It would also be great if I could zip all the files.
This is what I have so far:
COPY INTO 'gcs_bucket'
FROM test
PARTITION BY TRUNC(number_of_rows/500000)
STORAGE_INTEGRATION = gcs_int
FILE_FORMAT = (TYPE = CSV, COMPRESSION = gzip, NULL_IF = ('NULL','null'), RECORD_DELIMITER = '\r\n', FIELD_OPTIONALLY_ENCLOSED_BY = '''')
HEADER = TRUE
P.S. I'm using CSV format (can't change that).
The upper size limit of each file can be raised with the MAX_FILE_SIZE copy option. The default is 16 MB.
COPY INTO 'gcs_bucket'
FROM test
PARTITION BY TRUNC(number_of_rows/500000)
STORAGE_INTEGRATION = gcs_int
...
MAX_FILE_SIZE = 167772160 -- (160MB)
Definition
Number (> 0) that specifies the upper size limit (in bytes) of each file to be generated in parallel per thread. Note that the actual file size and number of files unloaded are determined by the total amount of data and number of nodes available for parallel processing.
Snowflake utilizes parallel execution to optimize performance. The number of threads cannot be modified.
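As the quoted note says, the file count also depends on how many nodes unload in parallel, so a large MAX_FILE_SIZE is not a hard guarantee of one file per partition; running the unload on a smaller warehouse (fewer nodes) tends to reduce the count further. Here is a sketch of your statement with MAX_FILE_SIZE pushed to 5368709120 bytes (5 GB, which I believe is the documented maximum when unloading to cloud storage), assuming each ~500,000-row partition compresses below that:

COPY INTO 'gcs_bucket'
FROM test
PARTITION BY TRUNC(number_of_rows/500000)
STORAGE_INTEGRATION = gcs_int
FILE_FORMAT = (TYPE = CSV, COMPRESSION = gzip, NULL_IF = ('NULL','null'), RECORD_DELIMITER = '\r\n', FIELD_OPTIONALLY_ENCLOSED_BY = '''')
HEADER = TRUE
MAX_FILE_SIZE = 5368709120 -- 5 GB: large enough that a ~500k-row partition should fit in a single file

As for zipping: as far as I know, COPY INTO can't bundle the output into a single zip archive, but your FILE_FORMAT already sets COMPRESSION = gzip, so each output file is individually gzip-compressed (.csv.gz).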