azure-synapserange-partitions

Should I create a partition on a dedicated sql pool table only when there are more than 1 million rows per distribution?


I'm trying to create a new table in a dedicated SQL pool which will have approx. 43800 records each month. My questions are -

  1. Should I create month-wise partitions?
  2. When should I create a partition, my understanding as per the Microsoft documentation is 'A partition is necessary when there are at least 1 Million rows per distribution'. But in my case table will be new and data grows approx. 43800 each month

Hence I believe a partition is not necessary right now, but it can be created in future based on the data volumes. Is my understanding right?


Solution

  • “when there are at least 1 Million rows per distribution” means the table is at least 60 million rows since there are 60 distributions. Please don’t over-partition the table. Each partition is divided into 60 pieces called distributions. So in your case each distribution inside each partition would have 730 rows. When you over-partition then you never get enough rows in a row group to achieve good columnstore compression. So the 1 million rows per distribution advice is saying that partitioning a table which is less than 60 million rows is counterproductive. However you don’t need to partition a table just because you are over 60 million rows.

    I personally wouldn’t bother with partitioning until I was closer to half a billion rows. Let say I have 10 years of data and 500 million rows. Partitioning by month yields me 69,444 rows per distribution per partition which is way too much partitioning. But partitioning that table by year yields 833,333 rows per distribution per partition which is acceptable but on the edge of over-partitioning. But if you can use partition swapping to improve load performance you could consider partitioning for tables that aren’t quite half a billion rows.