I'm trying to do a slightly complicated string conversion in PostgreSQL 9.0.1. The values in my_col
are long strings in the format of:
'12345_sometext_X12B_1'
'12345_sometext_optionaltext_Y09B_1'
'12345_sometext_optionaltext_X12A_1'
I need to translate the 'X12' portion to a known numeric value, there are a few different known values (up to 5).
I expect to be able to determine this within one query without needing a subquery. However, the following isn't working for me. The last column is the one throwing the exception. It seems I cannot do the CASE
statement using the output of these functions combined for some reason. I've included the proceeding columns for demonstrational purposes only.
select
regexp_matches(my_col, E'^.*_([^_]*)[A-Z]{1}_\\d*$'), -- returns {'X12'}
(regexp_matches(my_col, E'^.*_([^_]*)[A-Z]{1}_\\d*$'))[1], -- returns 'X12'
case (regexp_matches(my_col, E'^.*_([^_]*)[A-Z]{1}_\\d*$'))[1]
when 'X12' then '1200'
when 'Y09' then '950'
else '?' end -- should return '1200' but throws error
from my_table;
Instead I get the error:
ERROR: set-valued function called in context that cannot accept a set
SQL state: 0A000
Can someone advise me?
Given data:
create table my_table(my_col text);
insert into my_table(my_col) values
('12345_sometext_X12B_1'),
('12345_sometext_optionaltext_Y09B_1'),
('12345_sometext_optionaltext_X12A_1'),
('nomatch');
the above query does produce the error you report. Very odd, since:
SELECT pg_typeof((regexp_matches(my_col, E'^.*_([^_]*)[A-Z]{1}_\\d*$'))[1]);
returns 'text'. It should really say setof text
though, and that's the trap: regex_matches
is a set returning function. Those have ... interesting ... behaviour when called outside the FROM clause in PostgreSQL.
From pattern matching:
The regexp_matches function returns a text array of all of the captured substrings resulting from matching a POSIX regular expression pattern. It has the syntax regexp_matches(string, pattern [, flags ]). The function can return no rows, one row, or multiple rows
Try reformulating your query to use a subquery to call the SRF. This'll fail if more than one row is returned by the matcher, though:
SELECT
CASE (SELECT x[1] FROM regexp_matches(my_col, E'^.*_([^_]*)[A-Z]{1}_\\d*$') x)
WHEN 'X12' THEN '1200'
WHEN 'Y09' THEN '950'
ELSE '?'
END
FROM my_table;
Want to see how weird SRFs in SELECT are in Pg? Compare the results of these queries:
SELECT generate_series(1,10), generate_series(1,15);
and:
SELECT generate_series(1,10), generate_series(1,20);
The 1st produces 30 rows. The 2nd produces 20. Have fun explaining why. More than one SRF in a SELECT list in Pg produces crazy if occasionally useful results.
PostgreSQL 9.3 supports the SQL-standard LATERAL
clause thanks to Tom Lane, which provides a sane and well defined alternative to the current behaviour.