We are starting a project with PostgreSQL and will need to use Citus in the near future for a multi-tenant application, so we want to design our PostgreSQL schema now in a way that makes upgrading to Citus easy. The following page https://learn.microsoft.com/en-us/azure/postgresql/hyperscale/concepts-choose-distribution-column#best-practices states:
"Partition distributed tables by a common tenant_id column. For instance, in a SaaS application where tenants are companies, the tenant_id is likely to be the company_id."
My question is whether the term "partition" in the statement above refers to PostgreSQL table partitioning (https://www.postgresql.org/docs/14/ddl-partitioning.html) or to Citus sharding by the distribution key. Does partitioning a PostgreSQL table by tenant_id make any sense, or provide any benefit, if the same table will later be sharded in Citus by the same key (tenant_id)?
Disclaimer: ex-Citus team member here (but no longer affiliated with Citus or Microsoft).
I'm fairly certain that document uses "partition" to mean shards in a Citus cluster, i.e. distributing your tables by tenant_id. Most Citus setups I have seen rely primarily on Citus sharding rather than on Postgres table partitioning.
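To illustrate what "partition by tenant_id" means in the sharding sense, here is a minimal Python sketch of hash-based routing. It is not Citus code: tenant_hash uses zlib.crc32 as a stand-in for the PostgreSQL hash function Citus actually applies to the distribution column, and shard_for, route and the sample rows are hypothetical. The idea is the same, though: the hash space is split into a fixed number of ranges (citus.shard_count defaults to 32), and every row with the same tenant_id lands in the same shard, whichever table it belongs to.

```python
import zlib
from collections import defaultdict

SHARD_COUNT = 32  # Citus's default value for citus.shard_count


def tenant_hash(tenant_id: int) -> int:
    # Stand-in hash (CRC32), NOT the hash Citus uses; Citus applies
    # PostgreSQL's hash function for the column type. Range: [0, 2**32).
    return zlib.crc32(str(tenant_id).encode())


def shard_for(tenant_id: int, shard_count: int = SHARD_COUNT) -> int:
    # Split the 32-bit hash space into shard_count equal ranges and
    # return the index of the range this tenant's hash falls into.
    return tenant_hash(tenant_id) * shard_count // 2**32


def route(rows, shard_count: int = SHARD_COUNT) -> dict:
    # Group rows by shard index, looking only at the tenant_id column.
    placement = defaultdict(list)
    for row in rows:
        placement[shard_for(row["tenant_id"], shard_count)].append(row)
    return placement


# Hypothetical rows from two different tables.
rows = [
    {"table": "orders", "tenant_id": 1, "id": 100},
    {"table": "orders", "tenant_id": 2, "id": 101},
    {"table": "invoices", "tenant_id": 1, "id": 500},
    {"table": "invoices", "tenant_id": 3, "id": 501},
]

for shard, shard_rows in sorted(route(rows).items()):
    print(shard, [(r["table"], r["tenant_id"]) for r in shard_rows])
```

Colocated distributed tables in Citus share the same hash ranges, so a tenant's orders and invoices end up on the same node. That is what lets tenant-scoped joins and transactions run on a single node, and it is why no extra Postgres partitioning by tenant_id is needed to group a tenant's data.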
You can use Postgres table partitioning in combination with Citus, for example if you have time-based partitions that you want to drop once the retention period has expired. Each time-based partition can be a separate distributed table in the Citus cluster, so you get the best of both worlds: Citus to distribute data across nodes, and Postgres partitioning for efficient deletion and for avoiding autovacuum issues. Note that you would not partition on the same column as the one you distribute on. Citus sharding by tenant_id already groups each tenant's rows into a single shard, so partitioning on tenant_id as well doesn't really make sense in my experience.
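A minimal sketch of that combination follows. It is Python that only generates SQL text; nothing is executed against a database. It assumes a hypothetical events table with a created_at column, monthly range partitions, and a retention window expressed in months. The table, column and helper names are illustrative. create_distributed_table is the real Citus function for distributing a table, but check the statements against the Citus documentation for your version before relying on them.

```python
from datetime import date


def add_months(first_of_month: date, n: int) -> date:
    # Shift a first-of-month date by n months (n may be negative).
    total = first_of_month.year * 12 + (first_of_month.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


def partition_name(table: str, start: date) -> str:
    return f"{table}_{start:%Y_%m}"


def setup_ddl(table: str) -> list[str]:
    # Postgres range-partitions by time; the primary key has to include both
    # the partition column (created_at) and the distribution column (tenant_id).
    return [
        f"CREATE TABLE {table} ("
        "tenant_id bigint NOT NULL, "
        "event_id bigint NOT NULL, "
        "created_at timestamptz NOT NULL, "
        "payload jsonb, "
        "PRIMARY KEY (tenant_id, event_id, created_at)"
        ") PARTITION BY RANGE (created_at);",
        # Citus then shards the partitioned table (and its partitions) by tenant_id.
        f"SELECT create_distributed_table('{table}', 'tenant_id');",
    ]


def create_partition_ddl(table: str, start: date) -> str:
    end = add_months(start, 1)
    return (
        f"CREATE TABLE {partition_name(table, start)} PARTITION OF {table} "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )


def maintenance_ddl(table: str, existing: list[date], today: date,
                    retention_months: int, premake: int = 1) -> list[str]:
    current = today.replace(day=1)
    # Keep the current month plus retention_months earlier months.
    cutoff = add_months(current, -retention_months)
    stmts = []
    # Dropping an expired partition removes its rows in one step, instead of a
    # large DELETE that leaves dead tuples behind for autovacuum to clean up.
    for start in sorted(existing):
        if start < cutoff:
            stmts.append(f"DROP TABLE {partition_name(table, start)};")
    # Make sure the current month and the next `premake` months have partitions.
    for i in range(premake + 1):
        start = add_months(current, i)
        if start not in existing:
            stmts.append(create_partition_ddl(table, start))
    return stmts


for stmt in setup_ddl("events"):
    print(stmt)

# A later maintenance run, assuming January to March 2024 partitions exist.
for stmt in maintenance_ddl(
    "events",
    existing=[date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 1)],
    today=date(2024, 4, 15),
    retention_months=2,
):
    print(stmt)
```

The design choice is the one described above: tenant_id decides which node a row lives on, created_at decides which partition it lives in, and expiring old data becomes a DROP TABLE rather than a DELETE.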