Got the following query going.
SELECT
customer_id,
NTILE(5) OVER (ORDER BY MAX(oms_order_date)) AS r_score
FROM
mdwh.us_raw.l_dmw_order_report
WHERE
quantity_ordered > 0
AND customer_id IS NOT NULL
AND customer_id != ('')
AND UPPER(line_status) NOT IN ('','RETURN', 'CANCELLED')
AND UPPER(item_description_1) NOT IN ('','FREIGHT', 'RETURN LABEL FEE', 'VISIBLE STITCH')
AND (quantity_ordered * unit_price_amount) > 0
AND extended_amount < 1000 --NO BULK ORDERS
AND oms_order_date BETWEEN '2020-01-01' AND '2020-01-01'
AND SUBSTRING(upc,1,6) IN (SELECT item_code FROM item_master_zs WHERE new_division BETWEEN '11' AND '39')
GROUP BY
customer_id
ORDER BY
customer_id
All I'm doing here is, given some conditions, to give me the unique customer ID, then cluster their latest purchase date into quintiles and provide me with a score in the second column. But everytime I run the query, the r_score value keeps changing? What am I doing wrong..? Here's a snippet of what the table looks like (again, r_score value keeps changing):
The problem with ntile()
is that it ensures that the groups are exactly the same size by putting the same value across different groups.
For this reason, I usually do the calculation manually, using rank()
:
ceil(rank() over (order by max(oms_order_date)) * 5.0 /
count(*) over ()
) as r_score
If you use row_number()
, you'll get the equivalent of ntile()
.
If you want to use ntile()
, you can use additional order by
keys so the sorting keys are unique.
===================
Here is the new code I'm using:
SELECT
customer_id,
CEIL(RANK() OVER (ORDER BY MAX(oms_order_date)) * 5 / COUNT(*) OVER ()) AS r_score,
CEIL(RANK() OVER (ORDER BY COUNT(client_web_order_number)) * 5 / COUNT(*) OVER ()) AS f_score,
CEIL(RANK() OVER (ORDER BY AVG(extended_amount)) * 5 / COUNT(*) OVER ()) AS m_score,
(r_score || f_score || m_score) AS rfm_score
FROM
mdwh.us_raw.l_dmw_order_report t1
WHERE
quantity_ordered > 0
AND customer_id IS NOT NULL
AND customer_id != ('')
AND oms_order_date IS NOT NULL
AND UPPER(line_status) NOT IN ('','RETURN', 'CANCELLED')
AND UPPER(item_description_1) NOT IN ('','FREIGHT', 'RETURN LABEL FEE', 'VISIBLE STITCH')
AND (quantity_ordered * unit_price_amount) > 0
AND extended_amount < 1000 --NO BULK ORDERS
AND oms_order_date BETWEEN '2020-01-01' AND '2020-01-10'
AND SUBSTRING(upc,1,6) IN (SELECT item_code FROM item_master_zs WHERE new_division BETWEEN '11' AND '39')
GROUP BY
customer_id
ORDER BY
customer_id
The issue now is that I'm getting some row's with a blank r_score, and the max value is 4 instead of 5..