sqloracle-databasegroup-bypareto-chart

Oracle SQL Query for Stacked Pareto Chart


I have an Oracle Table that contains data similar to the following basic example:

+--------+----------+
| SERIES | CATEGORY |
+--------+----------+
| green  | apple    |
| green  | pear     |
| green  | pear     |
| yellow | apple    |
| yellow | apple    |
| yellow | pear     |
| yellow | pear     |
| yellow | pear     |
| yellow | banana   |
| yellow | banana   |
| yellow | banana   |
| red    | apple    |
+--------+----------+

I would like to generate a Pareto-like Graph of this data that should look as like Stacked Pareto Chart,

enter image description here

To create this graph I would like to run a SQL query and get the following output:

+----------+--------+-------+
| CATEGORY | SERIES | COUNT |
+----------+--------+-------+
| pear     | green  |     2 |
| pear     | yellow |     3 |
| apple    | green  |     1 |
| apple    | yellow |     2 |
| apple    | red    |     1 |
| banana   | yellow |     3 |
+----------+--------+-------+

The actual table has millions of entries and it currently takes a significant amount of time to query the database as the current procedure I am using is not very efficient:

Order the categories by the amount of entries in each category:

SELECT CATEGORY, COUNT(CATEGORY) FROM FRUIT GROUP BY CATEGORY ORDER BY COUNT(CATEGORY);

Then for each category I list the relevant series in order of the series:

SELECT SERIES, COUNT(SERIES) FROM FRUIT WHERE CATEGORY = [current category] GROUP BY SERIES ORDER BY SERIES;

What would be the most efficient way to query the database (Preferably a single SQL statement) in order to get the desired output?


Solution

  • Some shorter version:

    select category, series, CntS 
    from (
      select  distinct count(category) over (partition by category) cntC,
       count(series)  over (partition by category, series ) cntS,
       category, series 
    from fruit   ) Tab
    order by CntC desc, cntS desc;