sqlgroup-byamazon-redshifttableau-apiaginity

How to group data by the month and year


Before I get into the issue, here's a 2 second background: I've been working on this RFM analysis, and thanks to our peers, was finally able to output an RFM score for each customer_id in my data set, along with each of their individual R, F, and M scores. Here it is, if you're curious or would like to use it for yourself:

SELECT *,
    SUBSTRING(rfm_combined,1,1) AS recency_score,
    SUBSTRING(rfm_combined,2,1) AS frequency_score,
    SUBSTRING(rfm_combined,3,1) AS monetary_score
FROM (

SELECT
    customer_id,
    rfm_recency*100 + rfm_frequency*10 + rfm_monetary AS rfm_combined
FROM
    (SELECT
    customer_id,
    ntile(5) over (order by last_order_date) AS rfm_recency,
    ntile(5) over (order by count_order) AS rfm_frequency,
    ntile(5) over (order by total_spent) AS rfm_monetary
FROM
    (SELECT
    customer_id,
    MAX(oms_order_date) AS last_order_date,
    COUNT(*) AS count_order,
    SUM(quantity_ordered * unit_price_amount) AS total_spent
FROM 
    l_dmw_order_report
WHERE
    order_type NOT IN ('Sales Return', 'Sales Price Adjustment')
    AND item_description_1 NOT IN ('freight', 'FREIGHT', 'Freight')
    AND line_status NOT IN ('CANCELLED', 'HOLD')
    AND oms_order_date BETWEEN '2018-01-01' AND '2018-12-31'

GROUP BY customer_id))

ORDER BY customer_id desc)

Here's an image: enter image description here

Now, my issue is that I need to keep my output in this kind of format, but to group the data by the Month and Year as well. I initially had grouped this data by customer_id, because I want the RFM and the individual scores to only appear by unique customer_id, but now I need it by the Month+Year and the customer_id (i.e. first column would be Jan 2018, then list all the unique customer_id rows for that month/year combo. Then Feb 2018, and so on). Anyone have any suggestions?

Thank you very much and let me know if you have any questions!!

Best, Z


Solution

  • If you want to group by year-month and customer_id, in that order, change your GROUP BY:

    SELECT *,
        SUBSTRING(rfm_combined,1,1) AS recency_score,
        SUBSTRING(rfm_combined,2,1) AS frequency_score,
        SUBSTRING(rfm_combined,3,1) AS monetary_score
    FROM (
    
    SELECT
        YearMonth,
        customer_id,
        rfm_recency*100 + rfm_frequency*10 + rfm_monetary AS rfm_combined
    FROM
        (SELECT
        YearMonth,
        customer_id,
        ntile(5) over (order by last_order_date) AS rfm_recency,
        ntile(5) over (order by count_order) AS rfm_frequency,
        ntile(5) over (order by total_spent) AS rfm_monetary
    FROM
        (SELECT
        to_char(oms_order_date, 'YYYY-MM') AS YearMonth,
        customer_id,
        MAX(oms_order_date) AS last_order_date,
        COUNT(*) AS count_order,
        SUM(quantity_ordered * unit_price_amount) AS total_spent
    FROM 
        l_dmw_order_report
    WHERE
        order_type NOT IN ('Sales Return', 'Sales Price Adjustment')
        AND item_description_1 NOT IN ('freight', 'FREIGHT', 'Freight')
        AND line_status NOT IN ('CANCELLED', 'HOLD')
        AND oms_order_date BETWEEN '2018-01-01' AND '2018-12-31'
    
    GROUP BY to_char(oms_order_date, 'YYYY-MM'), customer_id))
    ORDER BY YearMonth, customer_id desc)