I have a NumPy array
true_direction = np.array([1,2,3]).reshape(1,3)
which I want to insert into a Polars DataFrame; that is, repeat this array in every row of the DataFrame.
Here is what I have tried so far.
With .with_columns() and np.repeat:
.with_columns(
    pl.Series(
        np.repeat(true_direction, repeats=912, axis=0)
    ).alias('true_direction')
)
The problem is that I have to get the height of the DataFrame beforehand, which is kind of annoying.
Alternatively, I can start from a plain list,
true_direction = [1,2,3]
in which case I can use pl.lit() (suggested by ChatGPT):
.with_columns(
    pl.lit(true_direction)
    # .cast(pl.Array(pl.Float64, 3))
    .alias('true_direction')
)
The problem here is that I would then have to manually convert the list[f64] column into an array[f64,3] column, since I need to take a dot product later on.
Is there a more Polaric way to do this?
With polars.lit, Polars will broadcast the literal to the height of the DataFrame for you. In this case you also need to add .first() so that Polars treats your NumPy array as a single value to be broadcast.
You mentioned floats, but have an array of ints. The type of the array in Polars will match the type of the input in NumPy, as shown below.
import numpy as np
import polars as pl

true_direction = np.array([1, 2, 3]).reshape(1, 3)
true_direction_float = np.array([1., 2., 3.]).reshape(1, 3)
df = pl.DataFrame({"a": range(10)})
df.with_columns(
    true_direction=pl.lit(true_direction).first(),
    true_direction_float=pl.lit(true_direction_float).first(),
)
outputs
shape: (10, 3)
┌─────┬────────────────┬──────────────────────┐
│ a   ┆ true_direction ┆ true_direction_float │
│ --- ┆ ---            ┆ ---                  │
│ i64 ┆ array[i32, 3]  ┆ array[f64, 3]        │
╞═════╪════════════════╪══════════════════════╡
│ 0   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 1   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 2   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 3   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 4   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 5   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 6   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 7   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 8   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 9   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
└─────┴────────────────┴──────────────────────┘
If you want to change from int to float, you would need to cast (either in NumPy or Polars). If the other input to your dot product is a float, Polars may cast the result to a float (float being the supertype), but I am not sure about that, so test it out.