I'm experimenting with the Citus extension for PostgreSQL to parallelize my analytics queries (single-node setup). I have a very simple UPDATE statement on a distributed table which runs perfectly in parallel when executed from an IDE (DBeaver) or from psql:
UPDATE my_distributed_table a1
SET
    col1 = b.col1,
    col2 = b.col2
FROM my_distributed_table a
LEFT JOIN my_local_table b
    ON b.id = a.report_id
WHERE
    a.id = a1.id AND
    a.report_id = a1.report_id; -- sharding key
The table is quite big (> 300 million records), so I can see 32 separate sessions (one per shard, I guess) during query execution. But when I put this UPDATE into a BEGIN/COMMIT transaction, or into a stored procedure, or even into a LANGUAGE SQL function (which is not supposed to start its own transaction), and execute it with either CALL or SELECT, there is no parallelism. There is only one session, and it takes forever.
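For reference, the procedure version looked roughly like this (the procedure name refresh_cols is just a placeholder; the tables are the ones from the query above):

```sql
-- Sketch of the wrapper that showed the problem: the same UPDATE,
-- but executed via CALL it ran on a single session only.
CREATE OR REPLACE PROCEDURE refresh_cols()
LANGUAGE plpgsql
AS $$
BEGIN
    UPDATE my_distributed_table a1
    SET col1 = b.col1,
        col2 = b.col2
    FROM my_distributed_table a
    LEFT JOIN my_local_table b
        ON b.id = a.report_id
    WHERE a.id = a1.id
      AND a.report_id = a1.report_id; -- sharding key
END;
$$;

CALL refresh_cols(); -- only one session, no parallelism
```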
I tried several tweaks like these, but no luck:
SET citus.max_adaptive_executor_pool_size = 32; -- default 16
SET citus.force_max_query_parallelization TO on; -- default off
What I need is to put this UPDATE into either a stored procedure or a function and have it execute in parallel.
I could not find any clear statement, either by googling or in the Citus docs, about support for parallelism inside transactions, functions, or stored procedures, but I believe it must be supported. Maybe there is some setting I'm missing?
Thank you
Edit: Adding ANALYZE my_distributed_table; before the UPDATE statement solved the problem. Now, no matter whether it is executed standalone, from a transaction, or from a procedure/function, multiple sessions are created as expected.
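In case it helps anyone, the working version looks roughly like this (refresh_cols is still a placeholder name; ANALYZE is allowed inside a transaction block, unlike VACUUM, so it can sit inside the procedure body):

```sql
-- Sketch of the fixed procedure: refresh statistics first so the
-- planner/executor picks the parallel per-shard plan for the UPDATE.
CREATE OR REPLACE PROCEDURE refresh_cols()
LANGUAGE plpgsql
AS $$
BEGIN
    ANALYZE my_distributed_table; -- refresh stats before the big UPDATE

    UPDATE my_distributed_table a1
    SET col1 = b.col1,
        col2 = b.col2
    FROM my_distributed_table a
    LEFT JOIN my_local_table b
        ON b.id = a.report_id
    WHERE a.id = a1.id
      AND a.report_id = a1.report_id; -- sharding key
END;
$$;

CALL refresh_cols(); -- now fans out across the shards as expected
```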