I would like to run pgr_dijkstra
for each pair of vertices in a routable graph. I have a simple script that takes each pair of vertices and launches a pgr_dijkstra
against the database.
It would be nice to use PostgreSQL parallel query capability to speed it up.
I would like to achieve an "easy" speed up without a need to implement advanced code like contractions.
Although I can see that for some of my tables explain analyze
shows a plan that involves Gather
and Parallel Seq Scan
, trying to run pgr_dijkstra
with "parallel" parameters enabled and with the PARALLEL SAFE
keywords actually makes it slower.
I have a PostGIS-enabled database with some OSM
data imported with osm2pgsql
:
Data | Size |
---|---|
Nodes (vertices) table | 730 vertices |
Nodes (vertices) table | 160 kB |
Edges table | 1100 vertices |
Edges table | 656 kB |
CPU: 11th Gen Intel Core i7-1165G7
PostgreSQL parameter | Value |
---|---|
min_parallel_table_scan_size |
8kB |
force_parallel_mode |
off |
parallel_setup_cost |
10 |
parallel_tuple_cost |
0.001 |
max_parallel_workers_per_gather |
8 |
max_parallel_workers |
8 |
max_worker_processes |
8 |
I have set force_parallel_mode
to off
as per this recommendation, but even when on
it does not make any difference.
Changing parallel_tuple_cost
from the default value to 0.001
does not seem to help.
I have set up a very simple pgr_dijkstra
call with and without SAFE
, with origin/destination vertices' id
s hardcoded:
CREATE OR REPLACE
FUNCTION pgr_parallel()
RETURNS VOID
AS $$
BEGIN
PERFORM pgr_dijkstra(
'
SELECT gid AS id,
source,
target,
length_m AS cost
FROM edges_table
WHERE
NOT source IS NULL
AND
NOT target IS NULL
AND
NOT the_geom IS NULL
AND NOT length_m IS NULL
',
ARRAY[161]
,
ARRAY[639]
,
directed := false
);
END;
$$ LANGUAGE 'plpgsql'
PARALLEL SAFE;
I have a simple script that calls this function N times, and it does not seem that it runs in a parallel way:
Type of function | Number (N) of pairs of vertices | Execution time |
---|---|---|
undefined (not parallel) | 50,000 | 67 seconds |
PARALLEL SAFE |
50,000 | 89 seconds |
As an additional observation, I run htop
and see that in the beginning, when the (parallel) script runs, all 8 physical cores are being used. But when the script finishes firing "tasks" to the database, htop
shows that only one core gets used by the postgres
process.
If I understand correctly, a query can be 'parallelized' if there is some significant independent work to be done by each worker. In case of pgr_dijkstra
and any other shortest path-finding algorithm, isn't this the case? I would suppose that pgr_dijkstra
fetches vertices and edges and then runs an independent computation on them.
Is it possible to use the parallel query
capability of PostgreSQL
to make at least some parts of pgr_dijkstra
to run in parallel? If yes, how?
Note: In this post by Crunchy Data and Paul Ramsey, PARALLEL SAFE
is used for a function that contains pgr_dijkstra
, but there is no detail if it runs faster thanks to that.
Running \df+ pgr_dijkstra;
, we see that the function is parallel unsafe
, so no, you won't be able to parallelize it within a single query.
However you could try to run multiple queries in parallel (i.e. between different pairs of vertices), using multiple connections. Bear in mind that they will be competing for the same resources though.