I need to create a heatmap from a tidy/long pl.DataFrame. Consider the following example, where I used pandas and plotly to create a heatmap.
import plotly.express as px
import polars as pl
tidy_df_pl = pl.DataFrame(
{
"x": [10, 10, 10, 20, 20, 20, 30, 30, 30],
"y": [3, 4, 5, 3, 4, 5, 3, 4, 5],
"value": [5, 8, 2, 4, 10, 14, 10, 8, 9],
}
)
print(tidy_df_pl)
shape: (9, 3)
┌─────┬─────┬───────┐
│ x ┆ y ┆ value │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═══════╡
│ 10 ┆ 3 ┆ 5 │
│ 10 ┆ 4 ┆ 8 │
│ 10 ┆ 5 ┆ 2 │
│ 20 ┆ 3 ┆ 4 │
│ 20 ┆ 4 ┆ 10 │
│ 20 ┆ 5 ┆ 14 │
│ 30 ┆ 3 ┆ 10 │
│ 30 ┆ 4 ┆ 8 │
│ 30 ┆ 5 ┆ 9 │
└─────┴─────┴───────┘
Transforming to a wide pd.DataFrame:
pivot_df_pd = (
tidy_df_pl.pivot(index="x", on="y", values="value").to_pandas().set_index("x")
)
print(pivot_df_pd)
3 4 5
x
10 5 8 2
20 4 10 14
30 10 8 9
Creating the heatmap using plotly:
fig = px.imshow(pivot_df_pd)
fig.show()
This all seems a bit cumbersome. I am looking for a polars-only solution.
How can I create this heatmap directly from polars without going through a third library (pandas)?
Here is the heatmap without the additional "x" column (which the answer above still includes). It matches your pandas output.
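import plotly.express as px
# wide polars frame; "x" stays a regular column because polars has no index
pivot_df_pl = tidy_df_pl.pivot(index="x", on="y", values="value")
# drop "x" from the value matrix and pass it explicitly as the y-axis labels
fig = px.imshow(pivot_df_pl.drop("x"), y=pivot_df_pl["x"])
fig.show()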
Plotly appears to treat a pandas index specially and use it as the y-axis. Polars has no index, so the y labels have to be passed explicitly. That is a tiny bit more work, but it is pure Polars.
If plotly really handled polars data natively, I would expect it to accept tidy dataframes directly, i.e. without the need for a pivot.
This does look possible with go.Heatmap, which accepts the x, y and z values as flat columns. It also seems to work the same way for pandas when creating a heatmap from a tidy DataFrame.
import plotly.graph_objects as go
fig2 = go.Figure(
go.Heatmap(
x=tidy_df_pl["y"],
y=tidy_df_pl["x"],
z=tidy_df_pl["value"],
)
)
# reverse the y-axis so that x=10 is at the top, matching the imshow output
fig2.update_layout(yaxis_autorange="reversed")
fig2.show()
Another library that can also handle the tidy format is Altair, which also happens to be the backend of the .plot namespace on Polars DataFrames. Here is a very similar output using Altair:
import altair as alt
(
tidy_df_pl.plot.rect(
x="y:O",
y="x:O",
# use the "plasma" scheme to mimic plotly's default colors
# if not wanted, just write `color="value:Q"` instead
color=alt.Color("value:Q", scale=alt.Scale(scheme="plasma")),
)
.properties(width=500, height=400)
)