Consider the following pl.DataFrame:
import polars as pl

df = pl.DataFrame(
    data={
        "np_linspace_start": [0, 0, 0],
        "np_linspace_stop": [8, 6, 7],
        "np_linspace_num": [5, 4, 4],
    }
)
shape: (3, 3)
┌───────────────────┬──────────────────┬─────────────────┐
│ np_linspace_start ┆ np_linspace_stop ┆ np_linspace_num │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═══════════════════╪══════════════════╪═════════════════╡
│ 0 ┆ 8 ┆ 5 │
│ 0 ┆ 6 ┆ 4 │
│ 0 ┆ 7 ┆ 4 │
└───────────────────┴──────────────────┴─────────────────┘
How can I create a new column ls that holds the result of the np.linspace function, so that each value in the column is an np.array?
I was looking for something along these lines:
df.with_columns(
ls=np.linspace(
start=pl.col("np_linspace_start"),
stop=pl.col("np_linspace_stop"),
num=pl.col("np_linspace_num")
)
)
Is there a polars equivalent to np.linspace?
As mentioned in the comments, adding an np.linspace-style function to polars is an open feature request. Until it is implemented, a simple implementation using polars' native expression API could look as follows.
Update. Modern polars supports broadcasting arithmetic between scalar (non-list) columns and list columns. This can be used to shift and scale an integer list column created with pl.int_ranges, which improves on the initial implementation outlined below.
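def pl_linspace(start: str | pl.Expr, stop: str | pl.Expr, num: str | pl.Expr) -> pl.Expr:
    # Accept column names as well as expressions.
    start = pl.col(start) if isinstance(start, str) else start
    stop = pl.col(stop) if isinstance(stop, str) else stop
    num = pl.col(num) if isinstance(num, str) else num
    # One integer list [0, 1, ..., num - 1] per row.
    grid = pl.int_ranges(num)
    # Step between consecutive points (the endpoint is included).
    _scale = (stop - start) / (num - 1)
    _offset = start
    # The scalar columns are broadcast over each row's list.
    return grid * _scale + _offset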
df.with_columns(
pl_linspace(
"np_linspace_start",
"np_linspace_stop",
"np_linspace_num",
).alias("pl_linspace")
)
shape: (3, 4)
┌───────────────────┬──────────────────┬─────────────────┬────────────────────────────────┐
│ np_linspace_start ┆ np_linspace_stop ┆ np_linspace_num ┆ pl_linspace │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ list[f64] │
╞═══════════════════╪══════════════════╪═════════════════╪════════════════════════════════╡
│ 0 ┆ 8 ┆ 5 ┆ [0.0, 2.0, 4.0, 6.0, 8.0] │
│ 0 ┆ 6 ┆ 4 ┆ [0.0, 2.0, 4.0, 6.0] │
│ 0 ┆ 7 ┆ 4 ┆ [0.0, 2.333333, 4.666667, 7.0] │
└───────────────────┴──────────────────┴─────────────────┴────────────────────────────────┘
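The shift-and-scale arithmetic can be reproduced in plain Python to check the values in the table above. This is only an illustrative sketch. The helper name linspace_rows is hypothetical and is not part of polars or NumPy. It assumes num >= 2 for every row.

```python
def linspace_rows(starts, stops, nums):
    """Hypothetical helper: per-row linspace built from an integer grid, scaled and shifted."""
    result = []
    for start, stop, num in zip(starts, stops, nums):
        grid = range(num)                   # like pl.int_ranges(num): 0, 1, ..., num - 1
        scale = (stop - start) / (num - 1)  # assumes num >= 2
        offset = start
        result.append([i * scale + offset for i in grid])
    return result


# Same inputs as the three rows of the example DataFrame.
rows = linspace_rows([0, 0, 0], [8, 6, 7], [5, 4, 4])
print(rows)
```

Unlike np.linspace, which explicitly sets the last element to stop, this formula does not pin the endpoint. For some inputs the last value may therefore differ from stop by floating-point rounding.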
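Note. If num is 1, computing _scale divides by zero. This yields an infinite value, or NaN when start equals stop. You can avoid this by adding the following line to pl_linspace right after _scale is computed. With the guard, num = 1 yields [start], which is also what np.linspace returns.
_scale = pl.when(_scale.is_finite()).then(_scale).otherwise(pl.lit(0.0))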
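Plain Python raises ZeroDivisionError instead of producing inf or NaN. A plain-Python sketch of the same guard therefore has to check num directly. The helper linspace_row below is hypothetical. It uses a step of 0.0 when num is 1, so it returns [start], and it returns an empty list when num is 0.

```python
def linspace_row(start, stop, num):
    """Hypothetical helper: linspace for a single row, including the num == 1 guard."""
    if num < 0:
        raise ValueError("num must be non-negative")
    # For num == 1 the step (stop - start) / (num - 1) is undefined; fall back to 0.0,
    # which plays the same role as the is_finite guard in the polars expression.
    scale = (stop - start) / (num - 1) if num > 1 else 0.0
    return [i * scale + start for i in range(num)]


print(linspace_row(0, 8, 5))  # regular case
print(linspace_row(3, 9, 1))  # single point, start != stop (polars: inf step)
print(linspace_row(3, 3, 1))  # single point, start == stop (polars: NaN step)
print(linspace_row(0, 8, 0))  # no points
```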
Outdated (but relevant for older versions of polars).
First, we use pl.int_range (thanks to @Dean MacGregor) to create a range of integers from 0 to num (exclusive). Next, we rescale and shift the range according to start, stop, and num. Finally, we implode the result with pl.Expr.implode to turn the range into a list. The whole expression is evaluated over a window keyed by the row index (.over(pl.int_range(pl.len()))). Each row therefore forms its own group and gets its own list.
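def pl_linspace(start: pl.Expr, stop: pl.Expr, num: pl.Expr) -> pl.Expr:
    # Integers 0, 1, ..., num - 1, evaluated per row because of the window below.
    grid = pl.int_range(num)
    _scale = (stop - start) / (num - 1)
    _offset = start
    # Implode into a list; the row index as window key makes each row its own group.
    return (grid * _scale + _offset).implode().over(pl.int_range(pl.len()))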
df.with_columns(
pl_linspace(
start=pl.col("np_linspace_start"),
stop=pl.col("np_linspace_stop"),
num=pl.col("np_linspace_num"),
).alias("pl_linspace")
)