I have an Oracle Table that contains data similar to the following basic example:
+--------+----------+
| SERIES | CATEGORY |
+--------+----------+
| green | apple |
| green | pear |
| green | pear |
| yellow | apple |
| yellow | apple |
| yellow | pear |
| yellow | pear |
| yellow | pear |
| yellow | banana |
| yellow | banana |
| yellow | banana |
| red | apple |
+--------+----------+
I would like to generate a Pareto-like Graph of this data that should look as like Stacked Pareto Chart,
To create this graph I would like to run a SQL query and get the following output:
+----------+--------+-------+
| CATEGORY | SERIES | COUNT |
+----------+--------+-------+
| pear | green | 2 |
| pear | yellow | 3 |
| apple | green | 1 |
| apple | yellow | 2 |
| apple | red | 1 |
| banana | yellow | 3 |
+----------+--------+-------+
The actual table has millions of entries and it currently takes a significant amount of time to query the database as the current procedure I am using is not very efficient:
Order the categories by the amount of entries in each category:
SELECT CATEGORY, COUNT(CATEGORY) FROM FRUIT GROUP BY CATEGORY ORDER BY COUNT(CATEGORY);
Then for each category I list the relevant series in order of the series:
SELECT SERIES, COUNT(SERIES) FROM FRUIT WHERE CATEGORY = [current category] GROUP BY SERIES ORDER BY SERIES;
What would be the most efficient way to query the database (Preferably a single SQL statement) in order to get the desired output?
Some shorter version:
select category, series, CntS
from (
select distinct count(category) over (partition by category) cntC,
count(series) over (partition by category, series ) cntS,
category, series
from fruit ) Tab
order by CntC desc, cntS desc;