I have a PostgreSQL query that selects up to 10,000 rows from tables that hold about 16,000,000 records. This takes about 3 seconds. The problem is that if I put the exact same query into a function, it is much slower, taking about 1 minute 20 seconds.
(Note that in reality the set-up is a bit more complex, but the whole problem can be expressed in this simplified way.)
I have a relational PostgreSQL database that contains the cells and nodes of numerical grids. A grid consists of many cells, which in turn consist of one or several nodes, and a node is basically a point in space. So there are three tables that are relevant for my problem: GridCells, GridNodes and GridCellNodeLinks. The tables look like this:
CREATE TABLE public."GridNodes" (
"id" serial NOT NULL,
PRIMARY KEY ("id")
);
CREATE TABLE public."GridCells" (
"id" serial NOT NULL,
"grid" integer NOT NULL,
PRIMARY KEY ("id")
);
CREATE TABLE public."GridCellNodeLinks" (
"gridCellId" int NOT NULL,
"gridNodeId" int NOT NULL,
FOREIGN KEY ("gridCellId") REFERENCES public."GridCells"("id"),
FOREIGN KEY ("gridNodeId") REFERENCES public."GridNodes"("id")
);
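(Not shown above: the join and filter columns are indexed; as noted further down, the plain query does only index scans. In this simplified schema the supporting indexes would look roughly like this, with made-up names:)

-- Sketch of the supporting indexes (names are made up):
CREATE INDEX "idxGridCellsGrid" ON public."GridCells" ("grid");
CREATE INDEX "idxLinksCellId" ON public."GridCellNodeLinks" ("gridCellId");
CREATE INDEX "idxLinksNodeId" ON public."GridCellNodeLinks" ("gridNodeId");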
I want to select all the nodes that belong to a given grid using pagination with 10,000 nodes per page. For higher performance, I use WHERE id >= firstIdOnPage instead of OFFSET pageNumber * 10000 (this is known as keyset pagination).
For grid 1 (which has 1,038,240 nodes), the query looks like this:
SELECT
    N."id"
FROM public."GridCells" G
LEFT JOIN public."GridCellNodeLinks" L
    ON G."id" = L."gridCellId"
LEFT JOIN public."GridNodes" N
    ON N."id" = L."gridNodeId"
WHERE "grid" = 1
    AND N."id" >= 1030001
ORDER BY "id" ASC
LIMIT 10000;
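(For contrast: assuming node ids for this grid are dense and start at 1, the OFFSET variant of the same page would look like the sketch below; it has to produce and discard the first 1,030,000 rows on every call:)

-- OFFSET variant of the same page (avoided for performance):
SELECT
    N."id"
FROM public."GridCells" G
LEFT JOIN public."GridCellNodeLinks" L
    ON G."id" = L."gridCellId"
LEFT JOIN public."GridNodes" N
    ON N."id" = L."gridNodeId"
WHERE "grid" = 1
ORDER BY "id" ASC
LIMIT 10000
OFFSET 1030000;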
As mentioned, the keyset query takes a few seconds to return, regardless of the parameter values. However, I then put the exact same query into a function:
(EDIT: In reality, I want to pass the parameters in the WHERE and LIMIT parts to this function instead of hardcoding them; a parameterized sketch follows the function below.)
CREATE OR REPLACE FUNCTION GetNodes()
RETURNS TABLE (
    "id" INT
)
AS
$BODY$
BEGIN
    RETURN QUERY
    SELECT
        N."id"
    FROM public."GridCells" G
    LEFT JOIN public."GridCellNodeLinks" L
        ON G."id" = L."gridCellId"
    LEFT JOIN public."GridNodes" N
        ON N."id" = L."gridNodeId"
    WHERE "grid" = 1
        AND N."id" >= 1030001
    ORDER BY "id" ASC
    LIMIT 10000;
END;
$BODY$
LANGUAGE plpgsql
PARALLEL SAFE;
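For the EDIT above, the parameterized version would look roughly like this (a sketch; the parameter names are mine):

CREATE OR REPLACE FUNCTION GetNodes(_grid int, _firstIdOnPage int, _pageSize int)
RETURNS TABLE (
    "id" INT
)
AS
$BODY$
BEGIN
    RETURN QUERY
    SELECT
        N."id"
    FROM public."GridCells" G
    LEFT JOIN public."GridCellNodeLinks" L
        ON G."id" = L."gridCellId"
    LEFT JOIN public."GridNodes" N
        ON N."id" = L."gridNodeId"
    WHERE G."grid" = _grid
        AND N."id" >= _firstIdOnPage
    ORDER BY "id" ASC
    LIMIT _pageSize;
END;
$BODY$
LANGUAGE plpgsql
PARALLEL SAFE;

All timings below refer to the hardcoded version.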
Calling that function with SELECT * FROM GetNodes(); takes nearly 30 times as long, but only when the number of nodes it finds is smaller than the LIMIT. If I use a smaller first ID for the page, e.g. N."id" >= 1020001, or if I shorten the page, e.g. LIMIT 8240, it still finishes within a few seconds. Again, this only happens with the function, not with the plain query.
Any idea why the function behaves so drastically differently from the 'pure' query?
The query only does index scans, so that should be fine. I tried running EXPLAIN ANALYZE on the function, but as explained here, that only yields an inconclusive "function scan". Using auto_explain is not an option because I'm not a superuser.
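A superuser-free way to approximate what the parameterized function would see is a prepared statement, since PL/pgSQL plans its statements like prepared statements and EXPLAIN works on EXECUTE (a sketch; the statement name is mine, and plan_cache_mode needs PostgreSQL 12 or newer):

-- Emulate the parameterized query the function would run:
PREPARE get_nodes (int, int, int) AS
SELECT N."id"
FROM public."GridCells" G
LEFT JOIN public."GridCellNodeLinks" L
    ON G."id" = L."gridCellId"
LEFT JOIN public."GridNodes" N
    ON N."id" = L."gridNodeId"
WHERE G."grid" = $1
    AND N."id" >= $2
ORDER BY N."id" ASC
LIMIT $3;

-- Force the generic plan that repeated executions would settle on:
SET plan_cache_mode = force_generic_plan;
EXPLAIN (ANALYZE, BUFFERS) EXECUTE get_nodes(1, 1030001, 10000);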
There are ways to make the query more efficient (e.g. using sub-queries) and get it to finish in as little as 500 ms, but that has no effect inside the function, so I'd say that's beyond the scope of this question.
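One such rewrite would be along these lines (a sketch, not necessarily the exact variant I tried): scan the nodes in id order and keep those that are linked to the grid via a sub-query:

-- Scan nodes in id order; keep only those linked to grid 1 (sketch):
SELECT N."id"
FROM public."GridNodes" N
WHERE N."id" >= 1030001
    AND EXISTS (
        SELECT 1
        FROM public."GridCellNodeLinks" L
        JOIN public."GridCells" G
            ON G."id" = L."gridCellId"
        WHERE L."gridNodeId" = N."id"
            AND G."grid" = 1
    )
ORDER BY N."id" ASC
LIMIT 10000;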
I added the PARALLEL SAFE, but that had no effect at all.
(EDIT) Changing to language sql has no effect either.
It turned out that the issue was related to the fact that DBeaver only fetches a limited number of rows from a query result (200 rows by default). When the query is inside a function, the entire function has to run to completion before the first 200 rows can be fetched. With the plain query, however, DBeaver silently stops after fetching the first 200 rows, so most of the work is never done and the query looks faster than it really is.
The setting can be found under Properties > Editors > Data editor and is called ResultSet fetch size.
Changing that to 10000 makes the plain query just as slow as the function. So the function was actually a faithful representation of how the query really performs. I'll have to go back and see how I can improve the performance, but that's a different issue.
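This can also be reproduced without DBeaver by using a server-side cursor, which allows the same batched fetching (a sketch; the cursor name is mine, and note that cursors can get a slightly different plan):

BEGIN;

DECLARE nodes_cur CURSOR FOR
SELECT
    N."id"
FROM public."GridCells" G
LEFT JOIN public."GridCellNodeLinks" L
    ON G."id" = L."gridCellId"
LEFT JOIN public."GridNodes" N
    ON N."id" = L."gridNodeId"
WHERE "grid" = 1
    AND N."id" >= 1030001
ORDER BY "id" ASC
LIMIT 10000;

-- Fast: only the first 200 rows are produced (DBeaver's default fetch size).
FETCH 200 FROM nodes_cur;

-- Slow: the rest of the result has to be produced, as in the function.
FETCH ALL FROM nodes_cur;

COMMIT;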