sqlarrayspostgresqlpostgresql-9.6

How to use ARRAY contains operator with ANY


I have a table where one column is array:

CREATE TABLE inherited_tags (
   id serial,
   tags text[]
);

Sample values:

INSERT INTO inherited_tags (tags) VALUES 
    (ARRAY['A','B','C']),  -- id: 1
    (ARRAY['D','E']),      -- id: 2
    (ARRAY['A','B']),      -- id: 3
    (ARRAY['C','D']),      -- id: 4
    (ARRAY['D','F']),      -- id: 5
    (ARRAY['A']);          -- id: 6

I want to find rows which tags column contains some subset of words inside array. For example for input:

ARRAY[ARRAY['A','C'], ARRAY['F'], ARRAY['E']]::text[][]

I want to find all rows that contain ('A' and 'C') OR ('F') OR ('E'). So for example above I should get rows with ids: 1, 2, 5.

I was hoping that I could use syntax like this:

SELECT * FROM inherited_tags WHERE
   tags @> ANY(ARRAY[ARRAY['A','C'], ARRAY['F'], ARRAY['E']]::text[][])

but I get error:

ERROR:  operator does not exist: text[] @> text
LINE 1: SELECT * FROM inherited_tags where tags <@ ANY(ARRAY[ARRAY['...

Postgres 9.6

plpgsql solution is acceptable but SQL is preferred.

DB-FIDDLE: https://www.db-fiddle.com/f/cKCr7Sfab6u8rqaCHhJvPk/0


Solution

  • The problem comes from the fact that the text[] and text[][] data types are internally the same data type. An array has a base type and dimensions, and the ANY operator will always extract the base type to compare, which will always be text and not text[]. It doesn't help that multidimensional arrays require that each subelement has the same length as every other. You can have ARRAY[ARRAY['A','C'],ARRAY['B','N']], but not ARRAY[ARRAY[2,3],ARRAY[1]].

    In short, there is no direct way to make that particular query work. I tried to create a function and an operator for this as well, and that doesn't work, either, for different reasons. See how that went:

    CREATE OR REPLACE FUNCTION check_tag_matches(
        IN leftside text[],
        IN rightside text)
      RETURNS BOOLEAN AS
    $BODY$
     DECLARE rightarr text[];
    BEGIN
      SELECT CAST(rightside as text[]) INTO rightarr;
      RETURN SELECT leftside @> rightarr;
    END;
    $BODY$
    LANGUAGE plpgsql STABLE;
    
    CREATE OPERATOR public.>>(
      PROCEDURE = check_tag_matches,
      LEFTARG = text[],
      RIGHTARG = text,
      COMMUTATOR = >>);
    

    Then when testing it:

    test=# SELECT * FROM inherited_tags WHERE
       tags >> ANY(ARRAY[ARRAY['A','M'], ARRAY['F','E'], ARRAY['E','R']]::text[][]);
    ERROR:  malformed array literal: "A"
    DETAIL:  Array value must start with "{" or dimension information.
    CONTEXT:  SQL statement "SELECT CAST(rightside as text[])"
    PL/pgSQL function check_tag_matches(text[],text) line 4 at SQL statement
    

    It seems that when you try using a multidimensional array like ARRAY[ARRAY['A','M'], ARRAY['F','E'], ARRAY['E','R']]::text[][] in ANY(), it iterates not over ARRAY['A','M'], then ARRAY['F','E'], then ARRAY['E','R'], but over 'A','M','F','E','E','R'. The same thing happens when with unnest.

    test=# SELECT unnest(ARRAY[ARRAY['A','M'], ARRAY['F','E'], ARRAY['E','R']]::text[][]);
     unnest 
    --------
     A
     M
     F
     E
     E
     R
    (6 rows)
    

    Your remaining optiona are to define a function that will read array_length(rightside,1) and array_length(rightside,2) and use nested loops to check it all, or you can send multiple queries to get the inherited tags for each tag, or restructure your data somehow. And you can't even access the ARRAY['A','M'] element using rightside[1] to iterate over it, you're forced to go to the deepest level.