I want to generate a simulation of a running tally over a period of time given an initial opening balance. By example, an imaginary business starts with $1000 and experiences a random profit and loss during any month. How much might this business have after 12 months? My issue is that I can generate the needed columns, but cannot figure out how to carry the simulated tally on to the next month without clobbering data in the current month.
I can create a Polars dataframe and populate this with the needed profit and loss columns and generate the the closing balance. Using stacked 'with_columns()', I created the needed columns for the opening and closing balances, and then generate some random profit and losses, then update the closing balance. But I cannot get this to carry forward.
import polars as pl
import datetime as dt
from dateutil.relativedelta import relativedelta
start_date = dt.date.today() + relativedelta(months=1, day=1)
df = pl.DataFrame(
pl.date_range(start_date, start_date + relativedelta(months=5), '1mo').alias('date'),
)
beginning_balance = 1000.0
df = df.with_columns(
pl.lit(beginning_balance).alias('beginning_balance'),
pl.lit(beginning_balance).alias('closing_balance'),
).with_columns(
[
pl.when(pl.col('date') == start_date)
.then(pl.col('beginning_balance'))
.otherwise(pl.col('closing_balance').shift(1))
.alias('beginning_balance'),
pl.Series([normalvariate(100, 80) for _ in range(len(df))]).round(2).alias('profit'),
pl.Series([normalvariate(100, 75) for _ in range(len(df))]).round(2).alias('loss'),
]
).with_columns(
(pl.col('beginning_balance') + pl.col('profit') - pl.col('loss')).alias('closing_balance'),
)
df
resulting in :
shape: (6, 5)
date beginning_balance closing_balance profit loss
date f64 f64 f64 f64
2023-04-01 1000.0 909.61 72.97 163.36
2023-05-01 1000.0 1072.29 212.84 140.55
2023-06-01 1000.0 974.54 82.75 108.21
2023-07-01 1000.0 824.08 -60.69 115.23
2023-08-01 1000.0 940.71 155.6 214.89
2023-09-01 1000.0 994.5 94.23 99.73
Note that the closing balance from the first month (909.61) was not carried to the next month, which remained at 1000.00, thus the running tally is not happening.
I can definitely use a loop (for or while) to walk through each row, but this thwarts the optimization and parallelization of Polars. The following would work, but is much slower (especially when I am running many simulations with much more detailed calculations).
current_tally = beginning_balance
for t in range(len(df)):
beginning_balance = current_tally
current_tally = beginning_balance + df[t, 'profit'] - df[t, 'loss']
df[t, 'beginning_balance'] = beginning_balance
df[t, 'closing_balance'] = current_tally
df
shape: (6, 5)
date beginning_balance closing_balance profit loss
date f64 f64 f64 f64
2023-04-01 1000.0 1063.13 115.49 52.36
2023-05-01 1063.13 1224.37 184.62 23.38
2023-06-01 1224.37 1258.02 114.68 81.03
2023-07-01 1258.02 1323.73 239.12 173.41
2023-08-01 1323.73 1341.33 102.18 84.58
2023-09-01 1341.33 1329.88 19.9 31.35
How would I do this within Polars?
Alternative solution, which I think is more efficient (since .cumulative_eval
is doing a lot of additions multiple times):
Starting dataframe with date
and the random profit
and loss
:
import polars as pl
import datetime as dt
from dateutil.relativedelta import relativedelta
num_months = 6
start_date = dt.date.today() + relativedelta(months=1, day=1)
df = pl.DataFrame().with_columns([
pl.date_range(start_date, start_date + relativedelta(months=num_months - 1), '1mo').alias('date'),
pl.Series([normalvariate(100, 80) for _ in range(num_months)]).round(2).alias('profit'),
pl.Series([normalvariate(100, 75) for _ in range(num_months)]).round(2).alias('loss'),
])
Building the new columns based on columns profit
and loss
:
beginning_balance = 1000.0
df = df.with_columns(
(beginning_balance + (pl.col('profit') - pl.col('loss')).cumsum())
.alias('closing_balance')
).with_columns(
pl.col('closing_balance').shift_and_fill(1, beginning_balance)
.alias('beginning_balance')
)
The tally
is calculated by using .cumsum()
on the difference between profit
and loss
.