I am looking at ways to improve deletion of data in my Postgres (partitioned) tables, not worsening the access performance.
Using: Postgres 10.2
Ignoring some unrelated columns, I have the table transactions
with these columns (omitting some unrelated columns):
transaction_id PK
location
type
user_id
transaction_date
Some important points about the current table:
user_id
(modulo 100). This means that the transactions for say user with user_id
3 will go to transactions_3
and with user_id
2356 will go to transactions_56
.What works well: Insertions and retrieval, since we already know the user - we know which table to look at and hence doesn't have to go through the 100 partitions to find it.
What does not: We have a process which deletes the old data quite often - based on users subscription. But this leads to problems (space issues) often as the data deleted doesn't get freed up immediately. As plain VACUUM may not be enough when a table contains large numbers of dead row versions as a result of massive update or delete activity (as is our case here)
How we would like to improve this is to be able to store the the data in partitions based on the transaction date - and then be able to drop the tables when the subscription is over. This will make sure that the space is usable again immediately.
In short, our main goal is to improve the deletion process so that the space is recovered immediately - at the same time making sure the access performance is not worsened
I have a few questions regarding this:
Hoping to get some guidance here on the way forward.
First: upgrading PostgreSQL would be a very good idea, not only because hash partitioning was introduced after v10, but also because of the numerous improvements in performance and features for partitioning since v10.
I have the feeling that the partitioning scheme you are using right now (home-grown hash partitioning) does not help you much. You cannot get rid of a customer with a simple DROP TABLE
(which would be nice), and deleting 10 million rows in a partition is not any more fun than deleting them in a single large table. On the contrary – the relative bloat once autovacuum is done will be more. The only saving grace is that autovacuum will work more efficiently, since it can treat each partition on its own.
To answer your questions:
Yes, partitioning makes most queries slower; hopefully not much slower. That is the price you are paying.
No, your queries will get somewhat slower (proportional to the numbr of partitions, so keep that moderate).
You can partition on both criteria, since a partition can again be a partitioned table. But I question if that is really a good idea, since I doubt that your current partitioning scheme is really beneficial.
Yes, use at least v12, ideally v14.