sqlarrayspostgresqlplpgsqlsql-function

Passing array to WHERE IN on multiple columns in a function


I have a table as following:

create table test
(
  id varchar(20),
  text varchar(20)
);
    
insert into test values(1, 'a')
insert into test values(2, 'b')
insert into test values(3, 'c')

To check multi-column in where condition, I can do this:

SELECT * FROM test 
WHERE (id, text) = ANY (VALUES('1', 'a')
                            , ('2', 'b')
                            , ('3', 'c'));

I have a function that takes 2 arrays as input:

create or replace function test_function
(
    p_ids     in varchar[],
    p_texts   in varchar[]
)
returns table (p_id   varchar,
               p_text varchar) as $$
begin
    return query 
    SELECT id, text 
    FROM test 
    WHERE (id, text) = ANY (VALUES('1', 'a')
                                , ('2', 'b')
                                , ('3', 'c'));
end;
$$ LANGUAGE plpgsql;

My question is, what is the most efficient (performance-wise) way to to replace ANY (VALUES('1', 'a'), ('2', 'b'), ('3', 'c')) with the actual array input (p_ids and p_texts)?


Solution

  • "Most efficient" in terms of syntax

    Some of the choices are:

    1. INNER JOIN with variadic unnest() instead of filtering in WHERE
    2. Single-argument unnest() in a row() constructor, inside an array() constructor
    3. Variadic unnest() in that or an IN subquery

    The first one is not only the shortest, but also the quickest, as demonstrated lower in this post.

    demo at db<>fiddle

    create or replace function test_function
    (   p_ids     in varchar[],
        p_texts   in varchar[]
    )
    returns table (p_id   varchar,
                   p_text varchar) as $f$
    begin
        return query 
        SELECT id, text 
        FROM test 
        JOIN unnest(p_ids,p_texts)AS u(id,text) USING(id,text);
        --NATURAL JOIN unnest(p_ids,p_texts)AS u(id,text); --discouraged syntax
        --WHERE (id, text) = ANY(ARRAY(select (unnest(p_ids),unnest(p_texts))));
        --WHERE (id, text) = ANY(ARRAY(select u from unnest(p_ids,p_texts)u));
        --WHERE (id, text) IN (select*from unnest(p_ids,p_texts));
    end;
    $f$ LANGUAGE plpgsql;
    
    select*from test_function('{1,2,3,8,9}','{a,b,c,x}');
    
    p_id p_text
    1 a
    2 b
    3 c

    Watch out for null-padding in case they are of different length

    select row( unnest(array['a','b','c','d'])
               ,unnest(array[1,2])
              ); 
    
    row
    (a,1)
    (b,2)
    (c,)
    (d,)

    "Most efficient" in terms of performance

    The join with variadic unnest() seems to perform best here (execution times in seconds, lower is better):
    demo2 at db<>fiddle

    variant avg a min mi max mx sum sm stddev st mode md
    join_variadic_unnest 0.000297 1 0.000240 2 0.000469 1 0.006236 1 0.000080 1 0.000242 2
    in_variadic_unnest 0.001278 2 0.000162 1 0.022200 2 0.026829 2 0.004795 2 0.000170 1
    array_row_2unnest 0.469599 3 0.459486 4 0.492036 4 0.861575 3 0.009940 3 0.459486 4
    array_variadic_unnest 0.469757 4 0.442742 3 0.490769 3 0.864888 4 0.013638 4 0.442742 3

    Above are timings of each method searching a few times for 11 rows that match two 20-element arrays in a 50k sample.

    Here's the two winners searching 100 times for 262 rows based on two 50-element arrays in a 500k sample:

    variant avg a min mi max mx sum sm stddev st mode md
    join_variadic_unnest 0.000497 1 0.000314 1 0.003766 1 0.050234 1 0.000565 1 0.000316 1
    in_variadic_unnest 0.000558 2 0.000343 2 0.006439 2 0.056383 2 0.000793 2 0.000345 2

    Note that you can use your target table as if it was a type:
    demo3 at db<>fiddle

    create or replace function test_function
    (   tests   in test[] 
    )returns setof test as 
    $f$ SELECT id
             , text 
        FROM test 
        NATURAL JOIN unnest(tests);
    $f$ language SQL;
    

    Then

    select*from test_function(array[ (1,'a')::test
                                    ,(2,'b')::test
                                    ,(3,'c')::test])
    

    Or

    select*from test_function('{"(1,a)","(2,b)","(3,c)"}'::test[])