
How to broadcast a NumPy array using `pl.lit()` in Polars


Goal

I have a NumPy array

import numpy as np

true_direction = np.array([1, 2, 3]).reshape(1, 3)

which I want to insert into a Polars DataFrame as a new column, repeating this array in every row of the DataFrame.

What I have tried

Here is what I have tried so far:

  1. Repeat the NumPy array and use .with_columns():
    .with_columns(
       pl.Series(
         np.repeat(true_direction, repeats=912, axis=0)
       ).alias('true_direction')
    )
    
    The problem is that I need to know the height of the DataFrame beforehand (912 here), which is annoying.
  2. Another way is to not start out with a NumPy array:
    true_direction = [1, 2, 3]
    
    in which case I can use pl.lit() (suggested by ChatGPT):
     .with_columns(
       pl.lit(true_direction)
       # .cast(pl.Array(pl.Float64, 3))
       .alias('true_direction')
     )
    
    The problem here is that I then have to manually cast the list[f64] column to an array[f64, 3] column, because I need to take a dot product later on.

My question

Is there a more idiomatic ("Polaric") way to do this?


Solution

  • With polars.lit, Polars broadcasts the literal to the height of the DataFrame for you. Here you also need to add .first() so that Polars treats your NumPy array as a single scalar value to be broadcast.

    You mentioned floats, but your example array contains ints. The dtype of the array in Polars matches the dtype of the NumPy input, as shown below.

    import numpy as np
    import polars as pl
    
    true_direction = np.array([1, 2, 3]).reshape(1, 3)
    true_direction_float = np.array([1., 2., 3.]).reshape(1, 3)
    
    df = pl.DataFrame({"a": range(10)})
    
    # .first() turns each length-1 literal into a scalar that is broadcast to every row
    df.with_columns(
        true_direction=pl.lit(true_direction).first(),
        true_direction_float=pl.lit(true_direction_float).first(),
    )
    

    outputs

    shape: (10, 3)
    ┌─────┬────────────────┬──────────────────────┐
    │ a   ┆ true_direction ┆ true_direction_float │
    │ --- ┆ ---            ┆ ---                  │
    │ i64 ┆ array[i32, 3]  ┆ array[f64, 3]        │
    ╞═════╪════════════════╪══════════════════════╡
    │ 0   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
    │ 1   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
    │ 2   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
    │ 3   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
    │ 4   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
    │ 5   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
    │ 6   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
    │ 7   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
    │ 8   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
    │ 9   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
    └─────┴────────────────┴──────────────────────┘
    

    If you want to change from int to float, you need to cast, either in NumPy (e.g. with .astype(float)) or in Polars (e.g. with .cast(pl.Array(pl.Float64, 3))). If the other operand of your dot product is a float, Polars may upcast the result to float (float being the supertype), but I'm not sure about that one, so test it out.