
How can I concatenate polars DataFrames that have different columns?


In pandas this happens automatically: calling pd.concat([df1, df2, df3]) gives any frame that lacked a column a new column filled with NaNs.

In polars I get a 'shape error' with the message that the columns differ (11 cols in df1 vs 12 cols in df2).


Solution

  • By default, Polars enforces schema correctness in its operations and prefers raising an error over silently succeeding, since a schema mismatch might indicate a bug in your program.

    If you want polars to add the missing columns (filled with nulls), pass the kwarg how="diagonal" to pl.concat:

    import polars as pl

    df_a = pl.DataFrame({
        "a": [1, 2, 3],
        "b": [True, None, False],
    })
    
    
    df_b = pl.DataFrame({
        "a": [4, 5],
        "c": ["bar", "ham"]
    })
    
    
    pl.concat([df_a, df_b], how="diagonal")
    
    shape: (5, 3)
    ┌─────┬───────┬──────┐
    │ a   ┆ b     ┆ c    │
    │ --- ┆ ---   ┆ ---  │
    │ i64 ┆ bool  ┆ str  │
    ╞═════╪═══════╪══════╡
    │ 1   ┆ true  ┆ null │
    │ 2   ┆ null  ┆ null │
    │ 3   ┆ false ┆ null │
    │ 4   ┆ null  ┆ bar  │
    │ 5   ┆ null  ┆ ham  │
    └─────┴───────┴──────┘