I have a large dataframe in pandas and I am having a hard time migrating it to Polars.
I used to use the code below to calculate the correlation between columns:
print(df.corr(numeric_only=True).stack().sort_values(ascending=False).loc[lambda x: x < 1])
and the result looks like:
How am I supposed to achieve the same result with Polars?
Many thanks.
You can do it using corr() and unpivot().
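(
    df.corr()
    # corr() returns no row labels, so add them back as an "index" column
    .with_columns(index=pl.lit(pl.Series(df.columns)))
    # reshape the wide matrix into long (index, variable, value) rows
    .unpivot(index="index")
    # drop the diagonal (each column correlated with itself)
    .filter(pl.col("index") != pl.col("variable"))
)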
# Output
┌───────┬──────────┬───────────┐
│ index ┆ variable ┆ value     │
│ ---   ┆ ---      ┆ ---       │
│ str   ┆ str      ┆ f64       │
╞═══════╪══════════╪═══════════╡
│ B     ┆ A        ┆ 0.493197  │
│ C     ┆ A        ┆ -0.866325 │
│ D     ┆ A        ┆ -0.493197 │
│ A     ┆ B        ┆ 0.493197  │
│ …     ┆ …        ┆ …         │
│ D     ┆ C        ┆ 0.416025  │
│ A     ┆ D        ┆ -0.493197 │
│ B     ┆ D        ┆ -1.0      │
│ C     ┆ D        ┆ 0.416025  │
└───────┴──────────┴───────────┘