postgresqlparallel-processingpostgispgrouting

Can at least some parts of pgRouting run in parallel with parallel query?


Objectives

I would like to run pgr_dijkstra for each pair of vertices in a routable graph. I have a simple script that takes each pair of vertices and launches a pgr_dijkstra against the database.

It would be nice to use PostgreSQL parallel query capability to speed it up.

I would like to achieve an "easy" speed up without a need to implement advanced code like contractions.

Problem

Although I can see that for some of my tables explain analyze shows a plan that involves Gather and Parallel Seq Scan, trying to run pgr_dijkstra with "parallel" parameters enabled and with the PARALLEL SAFE keywords actually makes it slower.

Input data

I have a PostGIS-enabled database with some OSM data imported with osm2pgsql:

Data Size
Nodes (vertices) table 730 vertices
Nodes (vertices) table 160 kB
Edges table 1100 vertices
Edges table 656 kB

PostgreSQL and system configuration

CPU: 11th Gen Intel Core i7-1165G7

PostgreSQL parameter Value
min_parallel_table_scan_size 8kB
force_parallel_mode off
parallel_setup_cost 10
parallel_tuple_cost 0.001
max_parallel_workers_per_gather 8
max_parallel_workers 8
max_worker_processes 8

I have set force_parallel_mode to off as per this recommendation, but even when on it does not make any difference.

Changing parallel_tuple_cost from the default value to 0.001 does not seem to help.

Queries and results

I have set up a very simple pgr_dijkstra call with and without SAFE, with origin/destination vertices' ids hardcoded:

CREATE OR REPLACE 
FUNCTION pgr_parallel() 
RETURNS VOID 
AS $$
    BEGIN
    PERFORM pgr_dijkstra(
    '
      SELECT gid AS id,
        source,
        target,
        length_m AS cost
      FROM edges_table 
      WHERE
                        NOT source IS NULL
                        AND
                        NOT target IS NULL
                        AND
                        NOT the_geom IS NULL
                        AND NOT length_m IS NULL
    ',
    ARRAY[161]
    ,
    ARRAY[639]
   ,
    directed := false
    );
    END;
$$ LANGUAGE 'plpgsql'
PARALLEL SAFE;

I have a simple script that calls this function N times, and it does not seem that it runs in a parallel way:

Type of function Number (N) of pairs of vertices Execution time
undefined (not parallel) 50,000 67 seconds
PARALLEL SAFE 50,000 89 seconds

As an additional observation, I run htop and see that in the beginning, when the (parallel) script runs, all 8 physical cores are being used. But when the script finishes firing "tasks" to the database, htop shows that only one core gets used by the postgres process.

Questions

If I understand correctly, a query can be 'parallelized' if there is some significant independent work to be done by each worker. In case of pgr_dijkstra and any other shortest path-finding algorithm, isn't this the case? I would suppose that pgr_dijkstra fetches vertices and edges and then runs an independent computation on them.

Is it possible to use the parallel query capability of PostgreSQL to make at least some parts of pgr_dijkstra to run in parallel? If yes, how?

Note: In this post by Crunchy Data and Paul Ramsey, PARALLEL SAFE is used for a function that contains pgr_dijkstra, but there is no detail if it runs faster thanks to that.


Solution

  • Running \df+ pgr_dijkstra;, we see that the function is parallel unsafe, so no, you won't be able to parallelize it within a single query.

    However you could try to run multiple queries in parallel (i.e. between different pairs of vertices), using multiple connections. Bear in mind that they will be competing for the same resources though.