How do I keep both columns as the result of a Polars join with left_on
and right_on
, like in SQL?
df1 = pl.DataFrame({"a": ["x","y","z"], "lk": [1,2,3]})
df2 = pl.DataFrame({"b": ["x","y","z"], "rk": [1,2,3]})
SQL analogy:
SELECT *
FROM tbl1
INNER JOIN tbl2
ON tbl1.lk = tbl2.rk
The usual result (currently undocumented) only keeps the left column. If I do a .select("rk")
afterwards, that column is already gone.
>>> print(df1.join(df2, how="inner", left_on="lk", right_on="rk"))
shape: (3, 3)
┌─────┬─────┬─────┐
│ a ┆ lk ┆ b │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ x ┆ 1 ┆ x │
│ y ┆ 2 ┆ y │
│ z ┆ 3 ┆ z │
└─────┴─────┴─────┘
Use coalesce=False
.
>>> print(df1.join(df2, how="inner", left_on="lk", right_on="rk", coalesce=False))
shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ a ┆ lk ┆ b ┆ rk │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ x ┆ 1 ┆ x ┆ 1 │
│ y ┆ 2 ┆ y ┆ 2 │
│ z ┆ 3 ┆ z ┆ 3 │
└─────┴─────┴─────┴─────┘
Incidentally, SQL has (poorly named) COALESCE
which does something completely different.