
How can I append or concatenate two dataframes in python polars?


I see it's possible to append using the Series namespace (https://stackoverflow.com/a/70599059/5363883). What I'm wondering is whether there is a similar method for appending or concatenating DataFrames.

In pandas this was historically done with df1.append(df2). However, that method has been deprecated in favor of pd.concat([df1, df2]) (and was removed in pandas 2.0).

Sample frames:

import polars as pl

df1 = pl.from_repr("""
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 2   ┆ 3   │
└─────┴─────┴─────┘
""")


df2 = pl.from_repr("""
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 4   ┆ 5   ┆ 6   │
└─────┴─────┴─────┘
""")

Desired result:

shape: (2, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 2   ┆ 3   │
│ 4   ┆ 5   ┆ 6   │
└─────┴─────┴─────┘

Solution

  • There are different append strategies depending on your needs.

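    import polars as pl

    df1 = pl.DataFrame({"a": [1], "b": [2], "c": [3]})
    df2 = pl.DataFrame({"a": [4], "b": [5], "c": [6]})

    # new memory slab: both frames are copied into one contiguous buffer per column
    new_df = pl.concat([df1, df2], rechunk=True)

    # cheap append (no memory copy): df2's chunks are added after df1's chunks
    new_df = df1.vstack(df2)

    # try to append in place: grows df1's own buffers, reallocating if needed
    df1.extend(df2)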
    

    To understand the differences, it is important to know that Polars memory is immutable if and only if it is shared, i.e. it has at least one copy.

    Copies in Polars are free because they only increment the reference count of the backing memory buffer instead of copying the data itself.

    However, if a memory buffer has no copies yet, i.e. its refcount == 1, Polars can mutate that memory in place.

    With this background, there are the following ways to append data: