I'd like to perform the following:
Input:
df = pl.DataFrame({
"a": [1,15,None,20,None]
})
Output:
df = pl.DataFrame({
"a": [1,15,None,20,None],
"b": [0,14,None,5,None]
})
That is, from:
A |
---|
1 |
15 |
None |
20 |
None |
to:
A | B |
---|---|
1 | 0 |
15 | 14 |
None | None |
20 | 5 |
None | None |
So, what it does:
I'd like to perform this in python's polars dataframe library, but I can't seem to find a solution.
I've tried the following question:
But unfortunately, this does not answer the original problem, since the question performs an aggregation of an entire column, and then takes the last value of that column.
What I'd like to do is not to aggregate an entire column, but simply to subtract a current value with a previous non-null value.
I have also tried to use rolling:
df = df.with_row_index().rolling(
index_column = 'index',
period = '???i').agg(pl.col("A").last())
But, of course, that does not work because the occurence of Null Values cannot be determined (i.e. it is not periodic, so I don't know how many indexes before the current entry contains a non-null value in "A").
Does anyone knows how to do so?
Thanks!
You can use a combination of shift
and forward_fill
to get the last non-null value.
So with your input, this looks like
df = pl.DataFrame({
"a": [1, 15, None, 20, None]
})
df.with_columns(
# current row value for "a" minus the last non-null value
# as the first row has no previous non-null value,
# fill it with the first value (1)
(
pl.col("a") -
pl.col("a").shift(fill_value=pl.col("a").first()).forward_fill()
).alias("b")
)
# shape: (5, 2)
# ┌──────┬──────┐
# │ a ┆ b │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞══════╪══════╡
# │ 1 ┆ 0 │
# │ 15 ┆ 14 │
# │ null ┆ null │
# │ 20 ┆ 5 │
# │ null ┆ null │
# └──────┴──────┘