I'm relatively new to Python but I have a complicated issue which I think requires a few different functions, but I don't know where to start.
It's kind of straightforward to do in Excel, by using the PRODUCT function, based on a starting point up to the most recent renewal date (in this example it would be 01/10/2020) and simply multiplying this Product function totals of each of the starting point elements, to give the current renewal date total, broken down by element values.
I want to write this in Python but besides the very basic info on Python 3.8 math.prod(), there doesn't seem to be anything more that I can find online to resolve/assist me with this.
I have the following 3 Pandas dataframes (I copied the data from Excel and pasted in the links below for ease of the explanation, but I've imported these as dataframes in my test code).
The data comes in this form which need to be collated together and a revaluation, broken down by the different elements of a member's entitlement, needs to go from the starting date ("StartDate") to the current “renewal” date (01/10/2020) - if possible, there also needs to be a pro-rata done in complete months between the StartDate to the 1st renewal date (equivalent to the Excel YEARFRAC function).
The 3 dataframes are:
List of MemberIDs and StartDates
Increase rates by year (index used is the Renewal Date)
Essentially, the function(s) need to calculate when the first renewal date is after the StartDate, apply a pro-rata increase if the boolean is True, then work out the range of renewal dates and increase factors between the StartDate and the latest renewal date and finally apply the Product for this date range on the StartDate elements.
With this being my first ever post, my apologies if it's not quite in the format you might expect or with any Python code as I'm very new to Python and to StackOverflow (they wouldn't even allow me to post the tables directly, only as links to images). The data I've provided, due to data protection, are dummy dates but relatively based on the actual data.
Any questions you have, I'm happy to provide more information. Thanks for any help in advance!
EDIT: Please see this temporary solution that I'm using, but it's without the pro-rata of the 1st increase between StartDate to the 1st increase date: Solution (without pro-rata)
I've whipped something up here that should give you a primer. The first code block just recreates your data. The second block iterates over each member, gets the appropriate renewal dates, and then just multiplies the fetched rows across the DataFrames.
It's not as efficient as it could be due to using the explicit for-loop over member ids instead of broadcasting, but it should get you started.
import pandas as pd
df_startDate = pd.DataFrame({
'StartDate': pd.to_datetime(['01/02/2016', '10/04/2017', '29/09/2018', '05/11/2018']),
}, index=['9000001', '9000023', '9004561', '9007910'])
df_renewals = pd.DataFrame({
'Element 1': [1.05, 1.04, 1.06, 1.10, 1.08, 1.06],
'Element 2': [1.03, 1.02, 1.07, 1.05, 1.03, 1.02],
'Element 3': [1.04, 1.04, 1.05, 1.03, 1.02, 1.05],
}, index=pd.to_datetime(['01/10/'+str(i) for i in range(2015, 2021)]))
df_assets = pd.DataFrame({
'Element 1': [1000, 1500, 2000, 1750],
'Element 2': [1500, 2000, 2500, 2000],
'Element 3': [2000, 2500, 3000, 2250],
}, index=['9000001', '9000023', '9004561', '9007910'])
print(df_startDate)
print(df_renewals)
print(df_assets)
My approach that you can study:
results = pd.DataFrame(columns=['Element 1', 'Element 2', 'Element 3'])
for member_id in df_startDate.index:
print('******')
print(member_id)
# Get rows between the StartDate and the current date
df_factors = df_renewals.loc[df_startDate['StartDate'].loc[member_id]: pd.Timestamp.now()]
print(df_factors, end='\n\n')
# Multiply rows together to get total factor
prod_factors = df_factors.product(axis='index')
print(prod_factors.to_frame().T, end='\n\n')
# Multiply factor with base value
results.loc[member_id] = df_assets.loc[member_id].mul(prod_factors)
print(results.loc[member_id].to_frame().T, end='\n\n')
print(results)
Without comments/printing:
results = pd.DataFrame(columns=['Element 1', 'Element 2', 'Element 3'])
for member_id in df_startDate.index:
df_factors = df_renewals.loc[df_startDate['StartDate'].loc[member_id]: pd.Timestamp.now()]
results.loc[member_id] = df_assets.loc[member_id].mul( df_factors.product(axis='index') )
Result:
Element 1 Element 2 Element 3
9000001 1388.230272 1805.934123 2409.23592
9000023 1888.920000 2206.260000 2757.82500
9004561 2289.600000 2626.500000 3213.00000
9007910 2003.400000 2101.200000 2409.75000