I have table like
create table vals (k, v, z, y, m) AS VALUES
(0, 0, 1, 8, 0),
(1, 2, 1, 1, 6),
(1, 0, 0, 2, 2),
(5, 4, 6, 8, 9),
(0, 0, 0, 6, 6);
I want to calculate modal value for each row. If there are multiple modal values, return such value, that occurs first.
k | v | z | y | m | modal |
---|---|---|---|---|---|
0 | 0 | 1 | 8 | 0 | 0 |
1 | 2 | 1 | 1 | 6 | 1 |
1 | 0 | 0 | 2 | 2 | 0 |
5 | 4 | 6 | 8 | 9 | 5 |
0 | 0 | 0 | 6 | 6 | 0 |
0
and 2
are equally frequent but 0
occurs first so it wins.5
occurs first so it wins.How can I do it in PostreSQL?
PostgreSQL has a built-in mode()within group(order by e)
function:
mode () WITHIN GROUP ( ORDER BY anyelement ) → anyelement
Computes the mode, the most frequent value of the aggregated argument (arbitrarily choosing the first one if there are multiple equally-frequent values). The aggregated argument must be of a sortable type.
Which matches the Wikipedia definition of a modal value:
Type Description Example Result Arithmetic mean Sum of values of a data set divided by number of values (1+2+2+3+4+7+9) / 7 4 Median Middle value separating the greater and lesser halves of a data set 1, 2, 2, 3, 4, 7, 9 3 Mode Most frequent value in a data set 1, 2, 2, 3, 4, 7, 9 2
The function is an ordered-set aggregate, not a variadic, so you'd have to pretend values in each row are a column. You can pack them into an array[]
, then unnest()
it, then process it and return that, all inside a correlated scalar subquery:
demo at db-fiddle
WITH arrs as(select *,array[k, v, z, y, m] as arr
from vals)
select k, v, z, y, m
,(select mode()within group(order by e)
from unnest(arr) as e) as modal
from arrs;
Or try unpivoting, then re-aggregating:
WITH one_column as(select ctid,k as c from vals
union all
select ctid,v from vals
union all
select ctid,z from vals
union all
select ctid,y from vals
union all
select ctid,m from vals)
select min(vals.k)
,min(vals.v)
,min(vals.z)
,min(vals.y)
,min(vals.m)
,mode()within group(order by c)
from one_column
join vals on one_column.ctid=vals.ctid
group by ctid;
Your updated example breaks ties in a different way than mode()
does:
arbitrarily choosing the first one if there are multiple equally-frequent values
It looks like what you want is for the one earlier in the unordered list to win.
To achieve that behaviour, you can just count(*)
values in each row (packed into array[]
, then unnest()
ed), sort this by placing most frequent first, and break ties by promoting the one that appeared earlier. unnest()with ordinality
adds the information about the original position of the unpacked element.
demo at db-fiddle
WITH arrs as(select *,array[k, v, z, y, m] as arr
from vals)
select k, v, z, y, m
,(select e
from unnest(arr)with ordinality as _(e,ordinality)
group by e
order by count(*) desc
,min(ordinality) asc--earlier in the undordered list wins
--,e desc--larger one wins
limit 1)
from arrs;
To get all modal value candidates, you can use fetch first 1 rows with ties
. This also uses a special array()
constructor that accepts a subquery directly:
demo at db-fiddle
WITH arrs as(select *,array[k, v, z, y, m] as arr
from vals)
select k, v, z, y, m
,array(select unnest(arr)e
group by e
order by count(*) desc
fetch first 1 rows with ties
) as all_modal_value_candidates
from arrs;
In older versions of PostgreSQL, you can emulate that using rank()over()
:
WITH arrs as(select *,array[k, v, z, y, m] as arr
from vals)
select k, v, z, y, m
,(select array_agg(e)
from(select unnest(arr)e, rank()over(order by count(*) desc)
group by e)_
where rank=1
) as all_modal_value_candidates
from arrs;