I have a data set of all customers purchases made in the last 5 years with the below columns as an example.
CUSTOMER ID | ORDER NUMBER | ORDER DATE (DateTime) | ITEM ID
What I need to do is assign each individual customer order position (i.e 1,2,3,4) - So for each customer I am able to know which order was their 1st 2nd or 3rd using the above criteria
Things to consider:
Struggling to find a starting point of how to do this
Below is the data table:
import pandas as pd
df = pd.DataFrame({'Customer ID' : ['C100003','C100002','C100002','C100002','C100003','C100002'],
'Order Number' : ['RJSJ0053','RJSJ0060','RJSJ0085','RJSJ0085','RJSJ0089','RJSJ0092'],
'Order Date' : ['2023-05-08','2023-06-09','2023-06-13','2023-06-13','2023-06-13','2023-06-14'],
'Item ID' : ['Mouse','Keyboard','Computer','Monitor','Keyboard','Headset'],
'Order Position' : [1,1,2,2,2,3]})
First, remove all duplicate [Customer ID, Order Number]
. This will get you a table that looks like this:
Customer ID Order Number Order Date Item ID
0 C100003 RJSJ0053 2023-05-08 Mouse
1 C100002 RJSJ0060 2023-06-09 Keyboard
2 C100002 RJSJ0085 2023-06-13 Computer
4 C100003 RJSJ0089 2023-06-13 Keyboard
5 C100002 RJSJ0092 2023-06-14 Headset
Now we can sort this by [Customer ID, Order Date]
, create a cumulative count grouped by Customer ID
, then fill forward the missing values since those are multiple items in the same order.
df['Order Position'] = (df.drop_duplicates(['Customer ID', 'Order Number'])
.sort_values(['Customer ID', 'Order Date'])
.groupby('Customer ID')
.cumcount()+1
)
df['Order Position'] = (df.sort_values(['Customer ID', 'Order Date'])['Order Position']
.fillna(method='ffill')
.astype(int)
)
Customer ID Order Number Order Date Item ID Order Position
0 C100003 RJSJ0053 2023-05-08 Mouse 1
1 C100002 RJSJ0060 2023-06-09 Keyboard 1
2 C100002 RJSJ0085 2023-06-13 Computer 2
3 C100002 RJSJ0085 2023-06-13 Monitor 2
4 C100003 RJSJ0089 2023-06-13 Keyboard 2
5 C100002 RJSJ0092 2023-06-14 Headset 3