pythonpython-3.xpandasdataframepython-datetime

Using the Python math.prod() in a specified Pandas date range


I'm relatively new to Python but I have a complicated issue which I think requires a few different functions, but I don't know where to start.

It's kind of straightforward to do in Excel, by using the PRODUCT function, based on a starting point up to the most recent renewal date (in this example it would be 01/10/2020) and simply multiplying this Product function totals of each of the starting point elements, to give the current renewal date total, broken down by element values.

I want to write this in Python but besides the very basic info on Python 3.8 math.prod(), there doesn't seem to be anything more that I can find online to resolve/assist me with this.

I have the following 3 Pandas dataframes (I copied the data from Excel and pasted in the links below for ease of the explanation, but I've imported these as dataframes in my test code).

The data comes in this form which need to be collated together and a revaluation, broken down by the different elements of a member's entitlement, needs to go from the starting date ("StartDate") to the current “renewal” date (01/10/2020) - if possible, there also needs to be a pro-rata done in complete months between the StartDate to the 1st renewal date (equivalent to the Excel YEARFRAC function).

The 3 dataframes are:

  1. Identifiers of members with certain basic information. The most important piece of data is the ‘StartDate’ column, as it's the point to revalue member benefits from. It needs to have a function that calculates, I'm guessing here, the TimeDelta (in months) from the StartDate to the member's 1st renewal date, which needs to be a pro-rata of the difference e.g. 1st member started on 01/02/2016 so the 1st renewal needs to have a pro-rata of 8 months. I'd also like to set a boolean where if True, the pro-rata occurs otherwise it doesn't

List of MemberIDs and StartDates

  1. Increase factors. The elements will either increase or stay level (never decrease in value)

Increase rates by year (index used is the Renewal Date)

  1. Member assets - linked to (1) by the member identifier (MemberID). These different elements should escalate by the increase factors in (2) and these figures are the elements as at the StartDate.

StartDate elements

Essentially, the function(s) need to calculate when the first renewal date is after the StartDate, apply a pro-rata increase if the boolean is True, then work out the range of renewal dates and increase factors between the StartDate and the latest renewal date and finally apply the Product for this date range on the StartDate elements.

With this being my first ever post, my apologies if it's not quite in the format you might expect or with any Python code as I'm very new to Python and to StackOverflow (they wouldn't even allow me to post the tables directly, only as links to images). The data I've provided, due to data protection, are dummy dates but relatively based on the actual data.

Any questions you have, I'm happy to provide more information. Thanks for any help in advance!

EDIT: Please see this temporary solution that I'm using, but it's without the pro-rata of the 1st increase between StartDate to the 1st increase date: Solution (without pro-rata)


Solution

  • I've whipped something up here that should give you a primer. The first code block just recreates your data. The second block iterates over each member, gets the appropriate renewal dates, and then just multiplies the fetched rows across the DataFrames.

    It's not as efficient as it could be due to using the explicit for-loop over member ids instead of broadcasting, but it should get you started.

    import pandas as pd
    
    df_startDate = pd.DataFrame({
        'StartDate': pd.to_datetime(['01/02/2016', '10/04/2017', '29/09/2018', '05/11/2018']),
    }, index=['9000001', '9000023', '9004561', '9007910'])
    
    df_renewals = pd.DataFrame({
        'Element 1': [1.05, 1.04, 1.06, 1.10, 1.08, 1.06],
        'Element 2': [1.03, 1.02, 1.07, 1.05, 1.03, 1.02],
        'Element 3': [1.04, 1.04, 1.05, 1.03, 1.02, 1.05],
    }, index=pd.to_datetime(['01/10/'+str(i) for i in range(2015, 2021)]))
    
    df_assets = pd.DataFrame({
        'Element 1': [1000, 1500, 2000, 1750],
        'Element 2': [1500, 2000, 2500, 2000],
        'Element 3': [2000, 2500, 3000, 2250],
    }, index=['9000001', '9000023', '9004561', '9007910'])
    
    print(df_startDate)
    print(df_renewals)
    print(df_assets)
    

    My approach that you can study:

    results = pd.DataFrame(columns=['Element 1', 'Element 2', 'Element 3'])
    
    for member_id in df_startDate.index:
        print('******')
        print(member_id)
        
        # Get rows between the StartDate and the current date
        df_factors = df_renewals.loc[df_startDate['StartDate'].loc[member_id]: pd.Timestamp.now()]
        print(df_factors, end='\n\n')
        
        # Multiply rows together to get total factor
        prod_factors = df_factors.product(axis='index')
        print(prod_factors.to_frame().T, end='\n\n')
        
        # Multiply factor with base value
        results.loc[member_id] = df_assets.loc[member_id].mul(prod_factors)
        print(results.loc[member_id].to_frame().T, end='\n\n')
    
    print(results)
    

    Without comments/printing:

    results = pd.DataFrame(columns=['Element 1', 'Element 2', 'Element 3'])
    for member_id in df_startDate.index:
        df_factors = df_renewals.loc[df_startDate['StartDate'].loc[member_id]: pd.Timestamp.now()]
        results.loc[member_id] = df_assets.loc[member_id].mul( df_factors.product(axis='index') )
    

    Result:

               Element 1    Element 2   Element 3
    9000001  1388.230272  1805.934123  2409.23592
    9000023  1888.920000  2206.260000  2757.82500
    9004561  2289.600000  2626.500000  3213.00000
    9007910  2003.400000  2101.200000  2409.75000