I need to find the most frequent text value in a table but I am having a difficult time wrapping my head around applying the solutions I have found to my dataset.
My table is structured as follows:
Column A | Column B |
---|---|
Val A | Val a |
Val A | Val b |
Val A | Val a |
Val B | Val a |
Val B | Val b |
Val B | Val b |
I have created gotten the basic concept of calculating mode using count and max functions with CTEs but I really can't figure out how to return the table I need.
What I am expecting is a table like:
Column A | Column B |
---|---|
Val A | Val a |
Val B | Val b |
Val a is the most frequent for Val A and b for B.
There are hundreds of unique values in Col A and I'm trying to solve this so that I can then return the 2nd most frequent value in Col B if the most frequent is null.
Thanks for any guidance!
WITH counts AS(
SELECT
ColA,
COUNT(ColB) AS ColB_count
FROM table
GROUP BY ColA
)
SELECT ColA, ColB_count
FROM counts
WHERE ColB_count = (
SELECT MAX(ColB_count)
FROM counts
);
I'd say this is what you are trying to do with your attempted sql. Loosely speaking, you need the counts, and then you need the max counts, and then you need to put these two back together to match the max count with the value (from ColB) that it goes with. But be sure to test with ties and with nulls (if applicable) to ensure you understand how these cases will behave.
with cte1 as (
select
ColA,
ColB,
count(ColB) as CountOfColB
from Table
group by
ColA,
ColB
),
cte2 as (
select
ColA,
Max(CountOfColB) as total
from cte1
group by ColA
)
select
cte1.ColA,
cte1.ColB
--,cte1.CountOfColB
from
cte1
inner join cte2
on cte1.ColA = cte2.ColA
and cte1.CountOfColB = cte2.total
order by
cte1.ColA,
cte1.ColB
This is another approach using a window function with rank in the second cte (again, check your results when/if you have ties or nulls to be sure you are handling these as desired):
with cte1 as (
select
ColA,
ColB,
count(ColB) as CountOfColB
from Table
group by
ColA,
ColB
),
cte2 as (
select
ColA,
ColB,
rank() over (partition by ColA order by CountOfColB) as RankedValue
from cte1
)
select
cte2.ColA,
cte2.ColB
from
cte2
where cte2.RankedValue = 1
order by
cte2.ColA,
cte2.ColB