I have a large customer dataset, it has things like Customer ID, Service ID, Product, etc. So the two ways we can measure churn are at a Customer-ID level, if the entire customer leaves and at a Service-ID level where maybe they cancel 2 out of 5 services.
The data looks like this, and as we can see
I am trying to write some code that creates the 'Churn' column.. I have tried
Thank you!
data = {'CustomerName': ['Alligators','Aunties', 'Bricks', 'Bricks','Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks'],
'ServiceID': [1009, 1008, 1001, 1002, 1003, 1004, 1001, 1002, 1001, 1002],
'Product': ['Apples', 'Apples', 'Apples', 'Bananas', 'Oranges', 'Watermelon', 'Apples', 'Bananas', 'Apples', 'Bananas'],
'Month': ['Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
'Year': [2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021],
'Churn': ['CustomerChurn', 'CustomerChurn', 'ServiceContinue', 'ServiceContinue', 'ServiceChurn', 'ServiceChurn','ServiceContinue', 'ServiceContinue', 'NA', 'NA']}
df = pd.DataFrame(data)
df
I think this gets close to what you want, except for the NA in the last two rows, but if you really need those NA, then you can filter by date and change the values.
Because you are really testing two different groupings, I send the first customername grouping through a function and depending what I see, I send a more refined grouping through a second function. For this data set it seems to work.
I create an actual date column and make sure everything is sorted before grouping. The logic inside the functions is testing the max date of the group to see if it's less than a certain date. Looks like you are testing March as the current month
You should be able to adapt it for your needs
df['testdate'] = df.apply(lambda x: datetime.datetime.strptime('-'.join((x['Month'], str(x['Year']))),'%b-%Y'), axis=1)
df = df.sort_values('testdate')
df1 = df.drop('Churn',axis=1)
def get_customerchurn(x, tdate):
# print(x)
# print(tdate)
if x.testdate.max() < tdate:
x.loc[:, 'Churn'] = 'CustomerChurn'
return x
else:
x = x.groupby(['CustomerName', 'Product']).apply(lambda x: get_servicechurn(x, datetime.datetime(2021,3,1)))
return x
def get_servicechurn(x, tdate):
print(x)
# print(tdate)
if x.testdate.max() < tdate:
x.loc[:, 'Churn'] = 'ServiceChurn'
return x
else:
x.loc[:, 'Churn'] = 'ServiceContinue'
return x
df2 = df1.groupby(['CustomerName']).apply(lambda x: get_customerchurn(x, datetime.datetime(2021,3,1)))
df2
Output:
CustomerName ServiceID Product Month Year testdate Churn
0 Alligators 1009 Apples Jan 2021 2021-01-01 CustomerChurn
1 Aunties 1008 Apples Jan 2021 2021-01-01 CustomerChurn
2 Bricks 1001 Apples Jan 2021 2021-01-01 ServiceContinue
3 Bricks 1002 Bananas Jan 2021 2021-01-01 ServiceContinue
4 Bricks 1003 Oranges Jan 2021 2021-01-01 ServiceChurn
5 Bricks 1004 Watermelon Jan 2021 2021-01-01 ServiceChurn
6 Bricks 1001 Apples Feb 2021 2021-02-01 ServiceContinue
7 Bricks 1002 Bananas Feb 2021 2021-02-01 ServiceContinue
8 Bricks 1001 Apples Mar 2021 2021-03-01 ServiceContinue
9 Bricks 1002 Bananas Mar 2021 2021-03-01 ServiceContinue