I want to generate a simulation of a running tally over a period of time given an initial opening balance. By example, an imaginary business starts with $1000 and experiences a random profit and loss during any month. How much might this business have after 12 months? My issue is that I can generate the needed columns, but cannot figure out how to carry the simulated tally on to the next month without clobbering data in the current month.
I can create a Polars dataframe and populate this with the needed profit and loss columns and generate the the closing balance. Using stacked 'with_columns()', I created the needed columns for the opening and closing balances, and then generate some random profit and losses, then update the closing balance. But I cannot get this to carry forward.
import polars as pl
import datetime as dt
from dateutil.relativedelta import relativedelta
from random import normalvariate
start_date = dt.date.today() + relativedelta(months=1, day=1)
df = pl.DataFrame(
pl.date_range(start_date, start_date + relativedelta(months=5), '1mo', eager=True).alias('date'),
)
beginning_balance = 1000.0
df = df.with_columns(
pl.lit(beginning_balance).alias('beginning_balance'),
pl.lit(beginning_balance).alias('closing_balance'),
).with_columns(
pl.when(pl.col('date') == start_date)
.then(pl.col('beginning_balance'))
.otherwise(pl.col('closing_balance').shift(1))
.alias('beginning_balance'),
pl.Series([normalvariate(100, 80) for _ in range(len(df))]).round(2).alias('profit'),
pl.Series([normalvariate(100, 75) for _ in range(len(df))]).round(2).alias('loss'),
).with_columns(
(pl.col('beginning_balance') + pl.col('profit') - pl.col('loss')).alias('closing_balance'),
)
df
resulting in :
shape: (6, 5)
┌────────────┬───────────────────┬─────────────────┬────────┬────────┐
│ date ┆ beginning_balance ┆ closing_balance ┆ profit ┆ loss │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ date ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞════════════╪═══════════════════╪═════════════════╪════════╪════════╡
│ 2024-11-01 ┆ 1000.0 ┆ 934.95 ┆ -58.53 ┆ 6.52 │
│ 2024-12-01 ┆ 1000.0 ┆ 903.15 ┆ 69.02 ┆ 165.87 │
│ 2025-01-01 ┆ 1000.0 ┆ 1007.71 ┆ 111.21 ┆ 103.5 │
│ 2025-02-01 ┆ 1000.0 ┆ 1011.97 ┆ 209.43 ┆ 197.46 │
│ 2025-03-01 ┆ 1000.0 ┆ 998.85 ┆ 32.22 ┆ 33.37 │
│ 2025-04-01 ┆ 1000.0 ┆ 1151.32 ┆ 277.49 ┆ 126.17 │
└────────────┴───────────────────┴─────────────────┴────────┴────────┘
Note that the closing balance from the first month (909.61) was not carried to the next month, which remained at 1000.00, thus the running tally is not happening.
I can definitely use a loop (for or while) to walk through each row, but this thwarts the optimization and parallelization of Polars. The following would work, but is much slower (especially when I am running many simulations with much more detailed calculations).
current_tally = beginning_balance
for t in range(len(df)):
beginning_balance = current_tally
current_tally = beginning_balance + df[t, 'profit'] - df[t, 'loss']
df[t, 'beginning_balance'] = beginning_balance
df[t, 'closing_balance'] = current_tally
df
┌────────────┬───────────────────┬─────────────────┬────────┬────────┐
│ date ┆ beginning_balance ┆ closing_balance ┆ profit ┆ loss │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ date ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞════════════╪═══════════════════╪═════════════════╪════════╪════════╡
│ 2024-11-01 ┆ 1000.0 ┆ 934.95 ┆ -58.53 ┆ 6.52 │
│ 2024-12-01 ┆ 934.95 ┆ 838.1 ┆ 69.02 ┆ 165.87 │
│ 2025-01-01 ┆ 838.1 ┆ 845.81 ┆ 111.21 ┆ 103.5 │
│ 2025-02-01 ┆ 845.81 ┆ 857.78 ┆ 209.43 ┆ 197.46 │
│ 2025-03-01 ┆ 857.78 ┆ 856.63 ┆ 32.22 ┆ 33.37 │
│ 2025-04-01 ┆ 856.63 ┆ 1007.95 ┆ 277.49 ┆ 126.17 │
└────────────┴───────────────────┴─────────────────┴────────┴────────┘
How would I do this within Polars?
Alternative solution, which I think is more efficient (since .cumulative_eval
is doing a lot of additions multiple times):
Starting dataframe with date
and the random profit
and loss
:
import polars as pl
import datetime as dt
from dateutil.relativedelta import relativedelta
num_months = 6
start_date = dt.date.today() + relativedelta(months=1, day=1)
df = pl.DataFrame().with_columns(
pl.date_range(start_date, start_date + relativedelta(months=num_months - 1), '1mo').alias('date'),
pl.Series([normalvariate(100, 80) for _ in range(num_months)]).round(2).alias('profit'),
pl.Series([normalvariate(100, 75) for _ in range(num_months)]).round(2).alias('loss'),
)
Building the new columns based on columns profit
and loss
:
beginning_balance = 1000.0
df = df.with_columns(
(beginning_balance + (pl.col('profit') - pl.col('loss')).cum_sum())
.alias('closing_balance')
).with_columns(
pl.col('closing_balance').shift(1, fill_value=beginning_balance)
.alias('beginning_balance')
)
The tally
is calculated by using .cum_sum()
on the difference between profit
and loss
.