import polars as pl
url = "https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv"
df = pl.read_csv(url)
change = (
df.with_columns(pl.col("sepal_width").shift(1, fill_value=0).pct_change(1).alias("pct_1"))
.with_columns(
pl.when(pl.col("pct_1").is_infinite())
.then(float(0))
.otherwise(pl.col("pct_1"))
.fill_null(float(0))
.name.keep(),
pl.col("pct_1").cum_sum().alias("cumsum_pct_1")
)
)
Given a dataset, I want to calculate cumsum after using pct_change. But the results are inf even after fill. Have searched for a while. Someone helps me, please.
Now:
┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┬───────────┬──────────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species ┆ pct_1 ┆ cumsum_pct_1 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 ┆ str ┆ f64 ┆ f64 │
╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╪═══════════╪══════════════╡
│ 5.1 ┆ 3.5 ┆ 1.4 ┆ 0.2 ┆ setosa ┆ 0.0 ┆ null │
│ 4.9 ┆ 3.0 ┆ 1.4 ┆ 0.2 ┆ setosa ┆ 0.0 ┆ inf │
│ 4.7 ┆ 3.2 ┆ 1.3 ┆ 0.2 ┆ setosa ┆ -0.142857 ┆ inf │
│ 4.6 ┆ 3.1 ┆ 1.5 ┆ 0.2 ┆ setosa ┆ 0.066667 ┆ inf │
Expected:
┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┬───────────┬──────────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species ┆ pct_1 ┆ cumsum_pct_1 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 ┆ str ┆ f64 ┆ f64 │
╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╪═══════════╪══════════════╡
│ 5.1 ┆ 3.5 ┆ 1.4 ┆ 0.2 ┆ setosa ┆ 0.0 ┆ 0.0 │
│ 4.9 ┆ 3.0 ┆ 1.4 ┆ 0.2 ┆ setosa ┆ 0.0 ┆ 0.0 │
│ 4.7 ┆ 3.2 ┆ 1.3 ┆ 0.2 ┆ setosa ┆ -0.142857 ┆ -0.142857 │
│ 4.6 ┆ 3.1 ┆ 1.5 ┆ 0.2 ┆ setosa ┆ 0.066667 ┆ -0.076190 │
I think the problem you have is a small misunderstanding how expression work inside a context ("with_columns"). All expression inside a context run in parallel so changes of one column aren't visible to the other expressions. So to solve your problem you have to do it step wise
change = (
df.with_columns(
(pl.col("sepal_width").shift(1, fill_value=0).pct_change(1).alias("pct_1"))
)
.with_columns(
pl.when(pl.col("pct_1").is_infinite())
.then(float(0))
.otherwise(pl.col("pct_1"))
.fill_null(float(0))
.name.keep()
)
.with_columns(pl.col("pct_1").cum_sum().alias("cumsum_pct_1"))
)
shape: (150, 7)
┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┬───────────┬──────────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species ┆ pct_1 ┆ cumsum_pct_1 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 ┆ str ┆ f64 ┆ f64 │
╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╪═══════════╪══════════════╡
│ 5.1 ┆ 3.5 ┆ 1.4 ┆ 0.2 ┆ setosa ┆ 0.0 ┆ 0.0 │
│ 4.9 ┆ 3.0 ┆ 1.4 ┆ 0.2 ┆ setosa ┆ 0.0 ┆ 0.0 │
│ 4.7 ┆ 3.2 ┆ 1.3 ┆ 0.2 ┆ setosa ┆ -0.142857 ┆ -0.142857 │
│ 4.6 ┆ 3.1 ┆ 1.5 ┆ 0.2 ┆ setosa ┆ 0.066667 ┆ -0.07619 │
│ 5.0 ┆ 3.6 ┆ 1.4 ┆ 0.2 ┆ setosa ┆ -0.03125 ┆ -0.10744 │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ 6.7 ┆ 3.0 ┆ 5.2 ┆ 2.3 ┆ virginica ┆ 0.03125 ┆ 1.690664 │
│ 6.3 ┆ 2.5 ┆ 5.0 ┆ 1.9 ┆ virginica ┆ -0.090909 ┆ 1.599755 │
│ 6.5 ┆ 3.0 ┆ 5.2 ┆ 2.0 ┆ virginica ┆ -0.166667 ┆ 1.433088 │
│ 6.2 ┆ 3.4 ┆ 5.4 ┆ 2.3 ┆ virginica ┆ 0.2 ┆ 1.633088 │
│ 5.9 ┆ 3.0 ┆ 5.1 ┆ 1.8 ┆ virginica ┆ 0.133333 ┆ 1.766421 │
└──────────────┴─────────────┴──────────────┴─────────────┴───────────┴───────────┴──────────────┘
see also this question