sqlsnowflake-cloud-data-platform

Creating a condition based on an aggregated rank within a CASE WHEN expression


I have a query that aggregates a couple of client sales figures by summing them by groupings of status, region and class categories. The query fixes some issues with the client names using CASE WHEN expressions, preferring the column big_name when available, med_name in big_name is not available, and small_name when both big_name and med_name are not available. Please see below.

SELECT

month_key,
status,
region,
class,

CASE
   WHEN big_name IN ('UNKNOWN', '-', '') AND med_name IN ('UNKNOWN', '-', '') THEN small_name
   WHEN big_name IN ('UNKNOWN', '-', '') AND med_name NOT IN ('UNKNOWN', '-', '') THEN med_name
   WHEN big_name NOT IN ('UNKNOWN', '-', '') THEN big_name
END AS new_name,

SUM(os.net_sls_amnt) AS net_sales,
SUM(os.gross_sls_amnt) AS gross_sales,
SUM(os.dscnt_amnt) AS discount,

FROM SCHEMA.ORIGINAL.SOURCE As os

WHERE month_key >= 202001

GROUP BY month_key, status, region, class, new_name
ORDER BY month_key, status, region, class, new_name

The source data deals with millions of clients. I'm not suppose to aggregate the sales figures for all clients at all. I'm suppose to differentiate the top 300 clients, which we do care for individually, from the clients not in the top 300 rank. Thus, I need to somehow rank the clients by net_sales while at the same time use the logic that fixes their name (as in the new_name in CASE expression). Then, I have to combine the clients not in the top 300 rank as one single 'SMALL CLIENT', while still aggregate the top 300 ranked clients individually by new_name. The ranking of top clients is not suppose to differentiate their status, region and class, I'm only suppose to care for fixed client name when ranking them, which adds to the difficulty.

Finally, I don't have editing permissions for this table, and given how massive it is, I can't copy it over either.

Is there anyway to do combine all small merchants under a single name? Thank you so much for your time and for your help!

I'm including here an EXAMPLE with mock up data for 10 clients, and the desired output if we were to care only about the top 3 clients and combined the others under the name 'SMALL CLIENT'.

Any thoughts? I super appreciate any help at all!


Solution

  • There are different ways to do this with unions and by grouping sets. It's probably just easiest to aggregate a second time after computing the ranks and collapsing everything past the cutoff into a single bucket.

    Presumably if there are ties that overflow past 300 you would still want to see the individual breakout (or note ties in the earlier rankings.) You'll use rank() as below, rather than row_number(), to accommodate that requirement.

    The sample data left me unclear about the proper order of coalescing the name values but that's a trivial adjustment.

    There's also a minor assumption that no legitimate companies will be called NO NAME or SMALL CLIENT. That is easy to tweak if necessary.

    https://dbfiddle.uk/QB9BUF6C

    WITH mydata AS (
        SELECT *,
            COALESCE(
                CASE WHEN big_name   NOT IN ('UNKNOWN', '-', '') THEN big_name END,
                CASE WHEN med_name   NOT IN ('UNKNOWN', '-', '') THEN med_name END,
                CASE WHEN small_name NOT IN ('UNKNOWN', '-', '') THEN small_name END,
                'NO NAME'
            ) AS new_name
        FROM SCHEMA.ORIGINAL.SOURCE
    ), aggregated AS (
        SELECT
            new_name, month_key, status, region, class,
            SUM(net_sls_amnt) AS net_sales,
            SUM(gross_sls_amnt) AS gross_sales,
            SUM(dscnt_amnt) AS discount,
            SUM(SUM(net_sls_amnt)) OVER (PARTITION BY new_name) as combined_sales
        FROM mydata
        WHERE month_key >= 202001
        GROUP BY new_name, month_key, status, region, class
    ), ranked AS (
        SELECT *, RANK() OVER (ORDER BY combined_sales) as rnk
        FROM aggregated
    )
    select
        min(rnk) as ranking,
        case when rnk < 301 then new_name else 'SMALL CLIENT' end as name,
        month_key, status, region, class,
        sum(net_sales) as net_sales,
        sum(gross_sales) as gross_sales,
        sum(discount) as discount
    from ranked
    group by
        case when rnk < 301 then new_name else 'SMALL CLIENT' end,
        month_key, status, region, class
    order by ranking, name, month_key, status, region, class;