I have a df with ten columns and another column with its values are partial name of the ten columns. Here is a similar sample:
import polars as pl
df = pl.DataFrame({
"ID" :["A" ,"B" ,"C" ] ,
"A Left" :["W1" ,"W2" ,"W3" ] ,
"A Right":["P1" ,"P2" ,"P3" ] ,
"B Left" :["G1" ,"G2" ,"G3" ] ,
"B Right":["Y1" ,"Y2" ,"Y3" ] ,
"C Left" :["M1" ,"M2" ,"M3" ] ,
"C Right":["K1" ,"K2" ,"K3" ] ,
})
df
shape: (3, 7)
┌─────┬────────┬─────────┬────────┬─────────┬────────┬─────────┐
│ ID ┆ A Left ┆ A Right ┆ B Left ┆ B Right ┆ C Left ┆ C Right │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │
╞═════╪════════╪═════════╪════════╪═════════╪════════╪═════════╡
│ A ┆ W1 ┆ P1 ┆ G1 ┆ Y1 ┆ M1 ┆ K1 │
│ B ┆ W2 ┆ P2 ┆ G2 ┆ Y2 ┆ M2 ┆ K2 │
│ C ┆ W3 ┆ P3 ┆ G3 ┆ Y3 ┆ M3 ┆ K3 │
└─────┴────────┴─────────┴────────┴─────────┴────────┴─────────┘
I want to add a column with its value selected from the other columns based on ID
column like below:
shape: (3, 8)
┌─────┬────────┬─────────┬────────┬─────────┬────────┬─────────┬───────┐
│ ID ┆ A Left ┆ A Right ┆ B Left ┆ B Right ┆ C Left ┆ C Right ┆ value │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │
╞═════╪════════╪═════════╪════════╪═════════╪════════╪═════════╪═══════╡
│ A ┆ W1 ┆ P1 ┆ G1 ┆ Y1 ┆ M1 ┆ K1 ┆ W1-P1 │
│ B ┆ W2 ┆ P2 ┆ G2 ┆ Y2 ┆ M2 ┆ K2 ┆ G2-Y2 │
│ C ┆ W3 ┆ P3 ┆ G3 ┆ Y3 ┆ M3 ┆ K3 ┆ M3-K3 │
└─────┴────────┴─────────┴────────┴─────────┴────────┴─────────┴───────┘
I got this result using unpivot:
df.join( df.unpivot(index='ID').with_columns(
pl.when(pl.col("ID") == pl.col("variable").str.slice(0,1)).then(pl.col("value"))
).select("ID" , "value").drop_nulls().group_by("ID").agg(pl.col('value').str.join("-"))
,on='ID').sort("ID")
However, I need to avoid unpivot because I have two groups of ten columns beside other 50 columns.
I have tried using pl.col()
and polars.selectors
, but I couldn't get the result.
import polars.selectors as cs
df.with_columns(
cs.by_name(
( pl.concat_str([pl.col('ID') , " Left"] ) )
).alias("value")
)
TypeError: invalid name: <Expr ['col("ID").str.concat_horizonta…']
Any suggested solution?
It looks like you want to extract the "base" / "prefix" of the Left/Right columns?
There are various ways you could do that:
columns = pl.Series(df.select("^.+ (Left|Right)$").columns)
columns = columns.str.extract(r"(.+) (Left|Right)$")
shape: (3,)
Series: '' [str]
[
"A"
"B"
"C"
]
You could then use pl.coalesce()
to create a single column of the chosen when/then
values:
df.with_columns(
pl.coalesce(
pl.when(pl.col("ID") == col).then(
pl.format("{}-{}", pl.col(f"{col} Left"), pl.col(f"{col} Right"))
) for col in columns
)
.alias("value")
)
shape: (3, 8)
┌─────┬────────┬─────────┬────────┬─────────┬────────┬─────────┬───────┐
│ ID ┆ A Left ┆ A Right ┆ B Left ┆ B Right ┆ C Left ┆ C Right ┆ value │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │
╞═════╪════════╪═════════╪════════╪═════════╪════════╪═════════╪═══════╡
│ A ┆ W1 ┆ P1 ┆ G1 ┆ Y1 ┆ M1 ┆ K1 ┆ W1-P1 │
│ B ┆ W2 ┆ P2 ┆ G2 ┆ Y2 ┆ M2 ┆ K2 ┆ G2-Y2 │
│ C ┆ W3 ┆ P3 ┆ G3 ┆ Y3 ┆ M3 ┆ K3 ┆ M3-K3 │
└─────┴────────┴─────────┴────────┴─────────┴────────┴─────────┴───────┘