insertsnowflake-cloud-data-platformsimultaneous

Snowflake - Simultaneous inserts into the same table - Will they be in multiple independent partitions OR 1 common partition?


In Snowflake, I am planning to run multiple INSERT statements into the same table simultaneously. An example could be batch INSERT 10 million records for 1 date, and batch INSERT 10 million records for another date.

  1. Would such simultaneous inserts into the same table result in multiple independent partitions (2 independent sets of partitions for each date) OR 1 common partition for both dates?
    • Reason I ask is because I want subsequent SELECT/read operations on the table to read data for 1 date from their independent partition(s), instead of 1 big common partition containing data for just 1 date. This would greatly improve query performance in future

Solution

  • As "Each micro-partition contains between 50 MB and 500 MB of uncompressed data" if you are loading millions of records (in a single process) then that's going to create more than one micro-partition. Running a similar process in parallel is not going to write to the same micro-partitions (though I suppose if one process finishes with a very small micro-partition then other processes might re-write that one partition - but that's an edge case that won't have any measurable effect).

    There won't be one micro-partition for one date, so I don't believe your concerns about how subsequent SELECTs will query the micro-partitions are relevant.

    If your table is big enough, and you are querying it via a "key" other than one that corresponds to the insert sequence, then you may benefit from clustering the table.

    If the natural insert order does not match the way you most commonly query the data then inserting the data using an ORDER BY column that matches your query key may be cheaper overall