Consider the following pl.DataFrame:
import numpy as np
import polars as pl

df = pl.DataFrame(
    data={
        "np_linspace_start": [0, 0, 0],
        "np_linspace_stop": [8, 6, 7],
        "np_linspace_num": [5, 4, 4]
    }
)
shape: (3, 3)
┌───────────────────┬──────────────────┬─────────────────┐
│ np_linspace_start ┆ np_linspace_stop ┆ np_linspace_num │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═══════════════════╪══════════════════╪═════════════════╡
│ 0 ┆ 8 ┆ 5 │
│ 0 ┆ 6 ┆ 4 │
│ 0 ┆ 7 ┆ 4 │
└───────────────────┴──────────────────┴─────────────────┘
How can I create a new column ls that is the result of the np.linspace function? This column will hold an np.array.
I was looking for something along these lines:
df.with_columns(
    ls=np.linspace(
        start=pl.col("np_linspace_start"),
        stop=pl.col("np_linspace_stop"),
        num=pl.col("np_linspace_num")
    )
)
Is there a polars equivalent to np.linspace?
Update (February 2025). The release of Polars 1.22 added pl.linear_spaces to the API, providing native support for generating evenly spaced values.
df.with_columns(
    pl.linear_spaces(
        "np_linspace_start",
        "np_linspace_stop",
        "np_linspace_num",
    ).alias("linear_spaces")
)
shape: (3, 4)
┌───────────────────┬──────────────────┬─────────────────┬────────────────────────────────┐
│ np_linspace_start ┆ np_linspace_stop ┆ np_linspace_num ┆ linear_spaces │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ list[f64] │
╞═══════════════════╪══════════════════╪═════════════════╪════════════════════════════════╡
│ 0 ┆ 8 ┆ 5 ┆ [0.0, 2.0, 4.0, 6.0, 8.0] │
│ 0 ┆ 6 ┆ 4 ┆ [0.0, 2.0, 4.0, 6.0] │
│ 0 ┆ 7 ┆ 4 ┆ [0.0, 2.333333, 4.666667, 7.0] │
└───────────────────┴──────────────────┴─────────────────┴────────────────────────────────┘
Outdated. Before Polars 1.22, adding an np.linspace-style function was an open feature request. Still, it was possible to write an implementation using Polars' native expression API.
Relevant for Polars Version 1.10.0 to 1.21.0.
Modern Polars supports broadcasting of operations between scalar and list columns. This can be used to shift and scale an integer list column created using pl.int_ranges and improve on the initial implementation outlined below.
def pl_linspace(start: str | pl.Expr, stop: str | pl.Expr, num: str | pl.Expr) -> pl.Expr:
    # Accept either column names or expressions.
    start = pl.col(start) if isinstance(start, str) else start
    stop = pl.col(stop) if isinstance(stop, str) else stop
    num = pl.col(num) if isinstance(num, str) else num
    # One integer list [0, 1, ..., num - 1] per row.
    grid = pl.int_ranges(num)
    # Step size between consecutive points, and the starting offset.
    _scale = (stop - start) / (num - 1)
    _offset = start
    return grid * _scale + _offset

df.with_columns(
    pl_linspace(
        "np_linspace_start",
        "np_linspace_stop",
        "np_linspace_num",
    ).alias("pl_linspace")
)
Note. If num is 1, the division when computing _scale divides by zero and will result in infinite values. This can be avoided by adding the following line to pl_linspace, right after _scale is computed.
_scale = pl.when(_scale.is_infinite()).then(pl.lit(0)).otherwise(_scale)
Relevant for Polars Version 1.9.0 and below.
First, we use pl.int_range (thanks to @Dean MacGregor) to create a range of integers from 0 to num (exclusive). Next, we rescale and shift the range according to start, stop, and num. Finally, we implode the column with pl.Expr.implode to obtain a column with the range as a list for each row.
def pl_linspace(start: pl.Expr, stop: pl.Expr, num: pl.Expr) -> pl.Expr:
    grid = pl.int_range(num)
    _scale = (stop - start) / (num - 1)
    _offset = start
    return (grid * _scale + _offset).implode().over(pl.int_range(pl.len()))

df.with_columns(
    pl_linspace(
        start=pl.col("np_linspace_start"),
        stop=pl.col("np_linspace_stop"),
        num=pl.col("np_linspace_num"),
    ).alias("pl_linspace")
)