In pandas, I can interpolate based on datetimes like this:
from datetime import datetime

import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {
        "ts": [
            datetime(2020, 1, 1),
            datetime(2020, 1, 3, 0, 0, 12),
            datetime(2020, 1, 3, 0, 1, 35),
            datetime(2020, 1, 4),
        ],
        "value": [1, np.nan, np.nan, 3],
    }
)
df1.set_index("ts").interpolate(method="index")
Outputs:
value
ts
2020-01-01 00:00:00 1.000000
2020-01-03 00:00:12 2.333426
2020-01-03 00:01:35 2.334066
2020-01-04 00:00:00 3.000000
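For reference, method='index' weights the interpolation by the actual index values (here, the timestamps), so each missing value is placed on the straight line between its nearest known neighbours according to how much time has elapsed. A minimal plain-Python sketch of that computation, assuming the timestamps are sorted ascending with no duplicates (the function name interpolate_by_time is hypothetical, not a pandas or Polars API), reproduces the numbers above:

```python
from datetime import datetime


def interpolate_by_time(ts, values):
    """Fill None entries in `values` by linear interpolation weighted by `ts`.

    Assumes `ts` is sorted ascending with no duplicates. Entries with no known
    neighbour on one side (leading/trailing None) are left as None.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Nearest known positions to the left and right of position i
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None or right is None:
            continue
        # Fraction of the time span between the two known points (timedelta / timedelta -> float)
        frac = (ts[i] - ts[left]) / (ts[right] - ts[left])
        out[i] = values[left] + frac * (values[right] - values[left])
    return out


ts = [
    datetime(2020, 1, 1),
    datetime(2020, 1, 3, 0, 0, 12),
    datetime(2020, 1, 3, 0, 1, 35),
    datetime(2020, 1, 4),
]
print([round(x, 6) for x in interpolate_by_time(ts, [1, None, None, 3])])
# [1, 2.333426, 2.334066, 3]
```

For example, the second row is 2 days and 12 seconds into a 3-day span, so it gets 1 + 2 × (172812 / 259200) ≈ 2.333426.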
Is there a similar method in Polars? Say, starting with:
from datetime import datetime

import polars as pl

df1 = pl.DataFrame(
    {
        "ts": [
            datetime(2020, 1, 1),
            datetime(2020, 1, 3, 0, 0, 12),
            datetime(2020, 1, 3, 0, 1, 35),
            datetime(2020, 1, 4),
        ],
        "value": [1, None, None, 3],
    }
)
shape: (4, 2)
┌─────────────────────┬───────┐
│ ts ┆ value │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪═══════╡
│ 2020-01-01 00:00:00 ┆ 1 │
│ 2020-01-03 00:00:12 ┆ null │
│ 2020-01-03 00:01:35 ┆ null │
│ 2020-01-04 00:00:00 ┆ 3 │
└─────────────────────┴───────┘
EDIT: I've updated the example to make it a bit more "irregular", so that upsample can't be used as a solution and to make it clear that a more generic approach is needed.
Update: Expr.interpolate_by was added in Polars 0.20.28:
df1.with_columns(pl.col("value").interpolate_by("ts"))
shape: (4, 2)
┌─────────────────────┬──────────┐
│ ts ┆ value │
│ --- ┆ --- │
│ datetime[μs] ┆ f64 │
╞═════════════════════╪══════════╡
│ 2020-01-01 00:00:00 ┆ 1.0 │
│ 2020-01-03 00:00:12 ┆ 2.333426 │
│ 2020-01-03 00:01:35 ┆ 2.334066 │
│ 2020-01-04 00:00:00 ┆ 3.0 │
└─────────────────────┴──────────┘
Not sure how useful this is, but it looks like pandas calls np.interp() to do this:
import numpy as np
import polars as pl

# Timestamps of the rows that need filling, timestamps of the known rows, and the known values
invalid = pl.when(pl.col("value").is_null()).then(pl.col("ts")).alias("invalid")
valid = pl.when(pl.col("value").is_not_null()).then(pl.col("ts")).alias("valid")
values = pl.when(pl.col("value").is_not_null()).then(pl.col("value")).alias("values")

df1.select(
    pl.struct(invalid, valid, values).map_batches(
        lambda s: pl.Series(
            np.interp(
                # x-coordinates to evaluate: timestamps of the null rows
                s.struct.field("invalid").drop_nulls().dt.timestamp().to_numpy(),
                # known x-coordinates: timestamps of the non-null rows
                s.struct.field("valid").drop_nulls().dt.timestamp().to_numpy(),
                # known y-values at those timestamps
                s.struct.field("values").drop_nulls().to_numpy(),
            )
        )
    )
)
shape: (2, 1)
┌──────────┐
│ invalid │
│ --- │
│ f64 │
╞══════════╡
│ 2.333426 │
│ 2.334066 │
└──────────┘
That said, pandas does seem to do a lot more around that call, so this only reproduces the basic case.