regexpostgresqlpostgresql-9.0set-returning-functions

CASE with regexp: "set-valued function called in context that cannot accept a set"


I'm trying to do a slightly complicated string conversion in PostgreSQL 9.0.1. The values in my_col are long strings in the format of:

'12345_sometext_X12B_1'
'12345_sometext_optionaltext_Y09B_1'
'12345_sometext_optionaltext_X12A_1'

I need to translate the 'X12' portion to a known numeric value, there are a few different known values (up to 5).

I expect to be able to determine this within one query without needing a subquery. However, the following isn't working for me. The last column is the one throwing the exception. It seems I cannot do the CASE statement using the output of these functions combined for some reason. I've included the proceeding columns for demonstrational purposes only.

select
          regexp_matches(my_col, E'^.*_([^_]*)[A-Z]{1}_\\d*$'), -- returns {'X12'}
         (regexp_matches(my_col, E'^.*_([^_]*)[A-Z]{1}_\\d*$'))[1], -- returns 'X12'
    case (regexp_matches(my_col, E'^.*_([^_]*)[A-Z]{1}_\\d*$'))[1]
        when 'X12' then '1200'
        when 'Y09' then '950'
        else '?' end -- should return '1200' but throws error
from my_table;

Instead I get the error:

ERROR: set-valued function called in context that cannot accept a set
SQL state: 0A000

Can someone advise me?


Solution

  • Given data:

    create table my_table(my_col text);
    insert into my_table(my_col) values
    ('12345_sometext_X12B_1'),
    ('12345_sometext_optionaltext_Y09B_1'),
    ('12345_sometext_optionaltext_X12A_1'),
    ('nomatch');
    

    the above query does produce the error you report. Very odd, since:

    SELECT pg_typeof((regexp_matches(my_col, E'^.*_([^_]*)[A-Z]{1}_\\d*$'))[1]);
    

    returns 'text'. It should really say setof text though, and that's the trap: regex_matches is a set returning function. Those have ... interesting ... behaviour when called outside the FROM clause in PostgreSQL.

    From pattern matching:

    The regexp_matches function returns a text array of all of the captured substrings resulting from matching a POSIX regular expression pattern. It has the syntax regexp_matches(string, pattern [, flags ]). The function can return no rows, one row, or multiple rows

    Try reformulating your query to use a subquery to call the SRF. This'll fail if more than one row is returned by the matcher, though:

    SELECT 
      CASE (SELECT x[1] FROM regexp_matches(my_col, E'^.*_([^_]*)[A-Z]{1}_\\d*$') x)
        WHEN 'X12' THEN '1200'
        WHEN 'Y09' THEN '950'
        ELSE '?'
      END
    FROM my_table;
    

    Want to see how weird SRFs in SELECT are in Pg? Compare the results of these queries:

    SELECT generate_series(1,10), generate_series(1,15);
    

    and:

    SELECT generate_series(1,10), generate_series(1,20);
    

    The 1st produces 30 rows. The 2nd produces 20. Have fun explaining why. More than one SRF in a SELECT list in Pg produces crazy if occasionally useful results.

    PostgreSQL 9.3 supports the SQL-standard LATERAL clause thanks to Tom Lane, which provides a sane and well defined alternative to the current behaviour.