sqlpostgresqlplpgsqlwindow-functionsplpython

select a specific row for each group according to a custom logic


I have the following table:

create table  test_table (
    id serial primary key,
    store text,
    price int,
    id_2 text);
insert into test_table (store, id_2, price) values
('a', '11', 100),
('b', '22', 120),
('c', '33', 150),
('d', '33', 120),
('b', '11', 90),
('d', '44', 240),
('a', '22', 80),
('b', '55', 140),
('c', '11', 150) returning *;

select * from test_table tt group by id_2, id order by id_2;

I want to create a view/sub-table where for each group by id_2 (grouped_rows), I pick one row according to a custom logic, something like:

if 'a' in grouped_rows.store:
  select row where a is present
else if 'c' in grouped_rows.store:
  select row where c is present
...

As I understand, window functions work with the values from a partition (same grouping as the previous group-by)

select store, id_2, count(id_2)  over (partition by id_2) as counts
from test_table tt  order by counts desc;

So can I use a custom window function to apply the previous logic, or are any other way to achieve this? until now, i haven't found a way to use a window function to achieve something like this.

As an extra, is it possible to build the selected row, for example, adding price, where the price would be the min() of the group (which doesn't have to be from the same row selected by store, but it's from the same grouped_rows).

if 'a' in grouped_rows.store:
  select row where 'a' is present, select min(grouped_rows.price)
else if 'c' in grouped_rows.store:
  select row where 'c' is present, select min(grouped_rows.price)
...

Solution

  • You may try to use first_value() window function with proper custom order by in the window definition. Yet using distinct on with custom ordering solves your problem easier and better. You can still use min() for the price.

    select distinct on (id_2) * 
    from test_table 
    order by id_2, 
      case store when 'a' then 1 when 'c' then 2 end nulls last;
    

    Please note the second (case) expression in the order by clause. It is modeling the rule "If there is a row with store = 'a' in the same id_2 group of rows then take it, else if there is a row with store = 'c' in it then take it, else take any row of the group".

    id store price id_2
    1 a 100 11
    7 a 80 22
    3 c 150 33
    6 d 240 44
    8 b 140 55

    With min_price extra:

    select distinct on (id_2) *,
      min(price) over (partition by id_2) as min_price 
    from test_table 
    order by id_2,
      case store when 'a' then 1 when 'c' then 2 end nulls last;
    
    id store price id_2 min_price
    1 a 100 11 90
    7 a 80 22 80
    3 c 150 33 120
    6 d 240 44 240
    8 b 140 55 140