I have this code in pandas:
df[col] = (
    df[col]
    .fillna(method="ffill", limit=1)
    .apply(lambda x: my_function(x))
)
I want to re-write this in Polars.
I have tried this:
df = df.with_columns(
    pl.col(col)
    .fill_null(strategy="forward", limit=1)
    .map_elements(lambda x: my_function(x))
)
It does not work as expected. The forward fill is applied, but the remaining missing values are not filled by my function. What should I change in my code to get what I want?
Try this code:
import numpy as np
import pandas as pd
import polars as pl
df_polars = pl.DataFrame(
{"A": [1, 2, None, None, None, None, 4, None], "B": [5, None, None, None, None, 7, None, 9]}
)
df_pandas = pd.DataFrame(
{"A": [1, 2, None, None, None, None, 4, None], "B": [5, None, None, None, None, 7, None, 9]}
)
last_valid_data: int
def my_function(x):
    global last_valid_data
    # Missing value: reuse the last valid value, scaled by 10
    if x is None or np.isnan(x):
        result = last_valid_data * 10
    else:
        last_valid_data = x
        result = x
    return result
col = "A"
last_valid_data = df_pandas[col][0]
df_pandas[col] = df_pandas[col].fillna(method="ffill", limit=1).apply(lambda x: my_function(x))
last_valid_data = df_polars[col][0]
df_polars = df_polars.with_columns(
    pl.col(col).fill_null(strategy="forward", limit=1).map_elements(lambda x: my_function(x))
)
Desired output in pandas is:
      A    B
0   1.0  5.0
1   2.0  NaN
2   2.0  NaN
3  20.0  NaN
4  20.0  NaN
5  20.0  7.0
6   4.0  NaN
7   4.0  9.0
What I get in Polars is:
┌──────┬──────┐
│ A    ┆ B    │
│ ---  ┆ ---  │
│ i64  ┆ i64  │
╞══════╪══════╡
│ 1    ┆ 5    │
│ 2    ┆ null │
│ 2    ┆ null │
│ null ┆ null │
│ null ┆ null │
│ null ┆ 7    │
│ 4    ┆ null │
│ 4    ┆ 9    │
└──────┴──────┘
The issue here is that in Polars, .map_elements defaults to skip_nulls=True, so null values are never passed to your function:
df_polars.with_columns(
    pl.col('A').map_elements(lambda me: print(f'{me=}'))
)
me=1
me=2
me=4
As your example specifically needs to target the nulls, you need to change this to False:
df_polars.with_columns(
    pl.col('A').map_elements(lambda me: print(f'{me=}'), skip_nulls=False)
)
me=1
me=2
me=None
me=None
me=None
me=None
me=4
me=None