How to compute a column in a Polars DataFrame using np.linspace

Consider the following pl.DataFrame:

import numpy as np
import polars as pl

df = pl.DataFrame(
    data={
        "np_linspace_start": [0, 0, 0],
        "np_linspace_stop": [8, 6, 7],
        "np_linspace_num": [5, 4, 4],
    }
)

shape: (3, 3)
┌───────────────────┬──────────────────┬─────────────────┐
│ np_linspace_start ┆ np_linspace_stop ┆ np_linspace_num │
│ ---               ┆ ---              ┆ ---             │
│ i64               ┆ i64              ┆ i64             │
╞═══════════════════╪══════════════════╪═════════════════╡
│ 0                 ┆ 8                ┆ 5               │
│ 0                 ┆ 6                ┆ 4               │
│ 0                 ┆ 7                ┆ 4               │
└───────────────────┴──────────────────┴─────────────────┘

How can I create a new column ls, that is the result of the np.linspace function? This column will hold an np.array.

I was looking for something along these lines:

df.with_columns(
    ls=np.linspace(
        start=pl.col("np_linspace_start"),
        stop=pl.col("np_linspace_stop"),
        num=pl.col("np_linspace_num")
    )
)

Is there a polars equivalent to np.linspace?

Solution

Update (February 2025). The release of Polars 1.22 added pl.linear_spaces to the API, providing native support for the generation of evenly-spaced values.

df.with_columns(
    pl.linear_spaces(
        "np_linspace_start",
        "np_linspace_stop",
        "np_linspace_num",
    ).alias("linear_spaces")
)

shape: (3, 4)
┌───────────────────┬──────────────────┬─────────────────┬────────────────────────────────┐
│ np_linspace_start ┆ np_linspace_stop ┆ np_linspace_num ┆ linear_spaces                  │
│ ---               ┆ ---              ┆ ---             ┆ ---                            │
│ i64               ┆ i64              ┆ i64             ┆ list[f64]                      │
╞═══════════════════╪══════════════════╪═════════════════╪════════════════════════════════╡
│ 0                 ┆ 8                ┆ 5               ┆ [0.0, 2.0, 4.0, 6.0, 8.0]      │
│ 0                 ┆ 6                ┆ 4               ┆ [0.0, 2.0, 4.0, 6.0]           │
│ 0                 ┆ 7                ┆ 4               ┆ [0.0, 2.333333, 4.666667, 7.0] │
└───────────────────┴──────────────────┴─────────────────┴────────────────────────────────┘

Outdated. Before polars 1.22, adding an np.linspace-style function was an open feature request. Still, it was possible to write an implementation using polars' native expression API.

Relevant for Polars Version 1.10.0 to 1.21.0.

Modern polars supports broadcasting of operations between scalar and list columns. This can be used to shift and scale an integer list column created using pl.int_ranges and improve on the initial implementation outlined below.

def pl_linspace(start: str | pl.Expr, stop: str | pl.Expr, num: str | pl.Expr) -> pl.Expr:
    start = pl.col(start) if isinstance(start, str) else start
    stop = pl.col(stop) if isinstance(stop, str) else stop
    num = pl.col(num) if isinstance(num, str) else num

    grid = pl.int_ranges(num)
    _scale = (stop - start) / (num - 1)
    _offset = start
    return grid * _scale + _offset

df.with_columns(
    pl_linspace(
        "np_linspace_start",
        "np_linspace_stop",
        "np_linspace_num",
    ).alias("pl_linspace")
)

Note. If num is 1, computing _scale divides by zero and yields an infinite value (or NaN when start equals stop), which then propagates into the result. This can be avoided by adding the following to pl_linspace.

_scale = pl.when(_scale.is_finite()).then(_scale).otherwise(pl.lit(0.0))

Relevant for Polars Version 1.9.0 and below.

First, we use pl.int_range (thanks to @Dean MacGregor) to create a range of integers from 0 to num (exclusive). Next, we rescale and shift the range according to start, stop, and num. Finally, we implode the column with pl.Expr.implode, evaluated over a unique index per row, to obtain a column that holds the range as a list for each row.

def pl_linspace(start: pl.Expr, stop: pl.Expr, num: pl.Expr) -> pl.Expr:
    grid = pl.int_range(num)
    _scale = (stop - start) / (num - 1)
    _offset = start
    return (grid * _scale + _offset).implode().over(pl.int_range(pl.len()))

df.with_columns(
    pl_linspace(
        start=pl.col("np_linspace_start"),
        stop=pl.col("np_linspace_stop"),
        num=pl.col("np_linspace_num"),
    ).alias("pl_linspace")
)