pythonpandaschurn

How to find churned customers on a monthly basis? Python Pandas


I have a large customer dataset, it has things like Customer ID, Service ID, Product, etc. So the two ways we can measure churn are at a Customer-ID level, if the entire customer leaves and at a Service-ID level where maybe they cancel 2 out of 5 services.

The data looks like this, and as we can see

I am trying to write some code that creates the 'Churn' column.. I have tried

Thank you!

data = {'CustomerName': ['Alligators','Aunties', 'Bricks', 'Bricks','Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks'], 
        'ServiceID': [1009, 1008, 1001, 1002, 1003, 1004, 1001, 1002, 1001, 1002], 
        'Product': ['Apples', 'Apples', 'Apples', 'Bananas', 'Oranges', 'Watermelon', 'Apples', 'Bananas', 'Apples', 'Bananas'], 
        'Month': ['Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'], 
        'Year': [2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021],
        'Churn': ['CustomerChurn', 'CustomerChurn', 'ServiceContinue', 'ServiceContinue', 'ServiceChurn', 'ServiceChurn','ServiceContinue', 'ServiceContinue', 'NA', 'NA']}
df = pd.DataFrame(data)
df

enter image description here


Solution

  • I think this gets close to what you want, except for the NA in the last two rows, but if you really need those NA, then you can filter by date and change the values.

    Because you are really testing two different groupings, I send the first customername grouping through a function and depending what I see, I send a more refined grouping through a second function. For this data set it seems to work.

    I create an actual date column and make sure everything is sorted before grouping. The logic inside the functions is testing the max date of the group to see if it's less than a certain date. Looks like you are testing March as the current month

    You should be able to adapt it for your needs

    df['testdate'] = df.apply(lambda x: datetime.datetime.strptime('-'.join((x['Month'], str(x['Year']))),'%b-%Y'), axis=1)
    df = df.sort_values('testdate')
    df1 = df.drop('Churn',axis=1)
    
    def get_customerchurn(x, tdate):
        # print(x)
        # print(tdate)
        if x.testdate.max() < tdate:
            x.loc[:, 'Churn'] = 'CustomerChurn'
            return x
        else:
            x = x.groupby(['CustomerName', 'Product']).apply(lambda x: get_servicechurn(x, datetime.datetime(2021,3,1)))
            return x
    
    def get_servicechurn(x, tdate):
        print(x)
        # print(tdate)
        if x.testdate.max() < tdate:
            x.loc[:, 'Churn'] = 'ServiceChurn'
            return x
        else:
            x.loc[:, 'Churn'] = 'ServiceContinue'
            return x
    
    df2 = df1.groupby(['CustomerName']).apply(lambda x: get_customerchurn(x, datetime.datetime(2021,3,1)))
    df2
    

    Output:

      CustomerName  ServiceID     Product Month  Year   testdate            Churn
    0   Alligators       1009      Apples   Jan  2021 2021-01-01    CustomerChurn
    1      Aunties       1008      Apples   Jan  2021 2021-01-01    CustomerChurn
    2       Bricks       1001      Apples   Jan  2021 2021-01-01  ServiceContinue
    3       Bricks       1002     Bananas   Jan  2021 2021-01-01  ServiceContinue
    4       Bricks       1003     Oranges   Jan  2021 2021-01-01     ServiceChurn
    5       Bricks       1004  Watermelon   Jan  2021 2021-01-01     ServiceChurn
    6       Bricks       1001      Apples   Feb  2021 2021-02-01  ServiceContinue
    7       Bricks       1002     Bananas   Feb  2021 2021-02-01  ServiceContinue
    8       Bricks       1001      Apples   Mar  2021 2021-03-01  ServiceContinue
    9       Bricks       1002     Bananas   Mar  2021 2021-03-01  ServiceContinue