I am trying to get the number of files and sum of their sizes in the external stage in snowflake so that I can assign appropriate warehouse while doing copy operation. I could get number of files using
count(distinct METADATA$FILENAME)
select count(distinct METADATA$FILENAME) from @STAGE_NAME
where metadata$file_last_modified > (select max(end_time) from <table_name>);
But I am not able to get the size of the files by querying the stage files. I see that LIST command gives me the size of the files, but I am not sure how to use it in conjunction with the above query. Can someone help me?
Snowflake external stage metadata does not directly provide the size information.
Alternative approach:
You can use LIST to get the size information like you mentioned.
LIST @mystage
returns
Note: last_modified is in Varchar, but you want to do date comparison. so here is a UDF to convert the last_modified to a timestamp which can be used to compare date or timestamp.Sample SO answer for the UDF.
create or replace function LAST_MODIFIED_TO_TIMESTAMP(LAST_MODIFIED string)
returns timestamp_tz
as
$$
to_timestamp_tz(left(LAST_MODIFIED, len(LAST_MODIFIED) - 4) || ' ' || '00:00', 'DY, DD MON YYYY HH:MI:SS TZH:TZM')
$$;
Sample query:
LIST @my_internal_stage;
SELECT $1 AS file_name,count(DISTINCT $1) AS count_files, SUM(DISTINCT $2 ) AS size
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE LAST_MODIFIED_TO_TIMESTAMP("last_modified") > '2025-01-01'
GROUP BY file_name ;
Make sure to run SELECT just after the LIST, since LAST_QUERY_ID uses the last executed query and result_scan returns its resultset.
And you can use distinct on count and sum for the files
Above returns