Take this example:
import datetime

import numpy
import polars

df = (
    polars.DataFrame(dict(
        j=polars.datetime_range(datetime.date(2023, 1, 1), datetime.date(2023, 1, 3), '8h', closed='left', eager=True),
    ))
    .with_columns(
        k=polars.lit(numpy.random.randint(10, 99, 6)),
    )
)
j k
2023-01-01 00:00:00 47
2023-01-01 08:00:00 22
2023-01-01 16:00:00 82
2023-01-02 00:00:00 19
2023-01-02 08:00:00 85
2023-01-02 16:00:00 15
shape: (6, 2)
Here, numpy.random.randint(10, 99, 6) uses a hard-coded 6 as the height of the DataFrame, so it won't work if I change, e.g., the interval from 8h to 4h (which would require changing 6 to 12).
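The hard-coded 6 follows from the range itself: a left-closed range from 2023-01-01 to 2023-01-03 spans 48 hours, so an 8h step gives 48 / 8 = 6 rows and a 4h step gives 12. A minimal sketch of that arithmetic, assuming a hypothetical helper named expected_height that takes a datetime.timedelta rather than a Polars interval string such as '8h', and assuming end is after start:

```python
import datetime


def expected_height(start, end, interval):
    """Count timestamps in the left-closed range [start, end) stepping by interval.

    Hypothetical helper for illustration only; accepts datetime.date or
    datetime.datetime endpoints and a datetime.timedelta interval.
    """
    span = end - start
    # Ceiling division: a final partial step still starts inside the range.
    return -(-span // interval)


start, end = datetime.date(2023, 1, 1), datetime.date(2023, 1, 3)
print(expected_height(start, end, datetime.timedelta(hours=8)))  # 6
print(expected_height(start, end, datetime.timedelta(hours=4)))  # 12
```

Computing the height up front like this works, but it duplicates the range parameters, which is exactly what reading the height from the frame avoids.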
I know I can do it by breaking the chain:
df = polars.DataFrame(dict(
    j=polars.datetime_range(datetime.date(2023, 1, 1), datetime.date(2023, 1, 3), '4h', closed='left', eager=True),
))
# df.height is only available once df has been assigned, hence the second statement
df = df.with_columns(
    k=polars.lit(numpy.random.randint(10, 99, df.height)),
)
j k
2023-01-01 00:00:00 47
2023-01-01 04:00:00 22
2023-01-01 08:00:00 82
2023-01-01 12:00:00 19
2023-01-01 16:00:00 85
2023-01-01 20:00:00 15
2023-01-02 00:00:00 89
2023-01-02 04:00:00 74
2023-01-02 08:00:00 26
2023-01-02 12:00:00 11
2023-01-02 16:00:00 86
2023-01-02 20:00:00 81
shape: (12, 2)
Is there a way to do it (i.e. reference df.height or an equivalent) in one chained expression, though?
You can use .pipe(), which calls the given function with the DataFrame itself, so df.height is available inside it:
import datetime

import numpy as np
import polars as pl

df = (
    pl.datetime_range(
        datetime.date(2023, 1, 1),
        datetime.date(2023, 1, 3),
        "4h",
        closed="left",
        eager=True,
    )
    .alias("date")
    .to_frame()
)

# .pipe(f) evaluates f(df), so the lambda can read df.height
df.pipe(lambda df:
    df.with_columns(pl.lit(np.random.randint(10, 99, df.height)).alias("rand"))
)
shape: (12, 2)
┌─────────────────────┬──────┐
│ date ┆ rand │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪══════╡
│ 2023-01-01 00:00:00 ┆ 39 │
│ 2023-01-01 04:00:00 ┆ 45 │
│ 2023-01-01 08:00:00 ┆ 95 │
│ 2023-01-01 12:00:00 ┆ 72 │
│ … ┆ … │
│ 2023-01-02 08:00:00 ┆ 34 │
│ 2023-01-02 12:00:00 ┆ 42 │
│ 2023-01-02 16:00:00 ┆ 30 │
│ 2023-01-02 20:00:00 ┆ 83 │
└─────────────────────┴──────┘
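For the example task specifically, .sample() could perhaps be used instead. Here pl.len() evaluates to the frame's row count inside the expression, so neither a NumPy array nor df.height is needed: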
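df.with_columns(
    # int_range's end is exclusive: this draws from 10..98, the same range as numpy.random.randint(10, 99)
    pl.int_range(10, 99).sample(pl.len(), with_replacement=True).alias("rand")
)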
shape: (12, 2)
┌─────────────────────┬──────┐
│ date ┆ rand │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪══════╡
│ 2023-01-01 00:00:00 ┆ 25 │
│ 2023-01-01 04:00:00 ┆ 27 │
│ 2023-01-01 08:00:00 ┆ 68 │
│ 2023-01-01 12:00:00 ┆ 95 │
│ 2023-01-01 16:00:00 ┆ 96 │
│ … ┆ … │
│ 2023-01-02 04:00:00 ┆ 36 │
│ 2023-01-02 08:00:00 ┆ 25 │
│ 2023-01-02 12:00:00 ┆ 90 │
│ 2023-01-02 16:00:00 ┆ 92 │
│ 2023-01-02 20:00:00 ┆ 92 │
└─────────────────────┴──────┘