Suppose I have aggregated the mean and the median of some value over 3 months, like:
df = (data.group_by('month_code')
      .agg(pl.col('value').mean().alias('avg'),
           pl.col('value').median().alias('med'))
      .sort('month_code')
      .collect()
      )
Resulting in something like:
df = pl.DataFrame({'month': ['M202412','M202501','M202502'],
                   'avg': [0.037824, 0.03616, 0.038919],
                   'med': [0.01381, 0.013028, 0.014843]
                   })
I'd like to visualize it, so I need to convert it to this long format:
df_ = pl.DataFrame({'month': ['M202412','M202501','M202502']*2,
                    'type': ['avg','avg','avg','med','med','med'],
                    'value': [0.037824, 0.03616, 0.038919, 0.01381, 0.013028, 0.014843],
                    })
Which is then easy to visualize:
df_.plot.line(x='month',y='value',color='type').properties(width=400, height=350, title='avg and med')
What is the simplest way to convert df to df_ above?
You can use df.melt to convert from wide to long format. It keeps the month column as an identifier and unpivots the avg and med columns into rows, putting the original column name in a new type column and the number in a new value column.
df_ = df.melt(
id_vars=["month"],
value_vars=["avg", "med"],
variable_name="type",
value_name="value"
)
Full reproducible example:
import polars as pl
import plotly.express as px
df = pl.DataFrame({
'month': ['M202412', 'M202501', 'M202502'],
'avg': [0.037824, 0.03616, 0.038919],
'med': [0.01381, 0.013028, 0.014843]
})
df_ = df.melt(
id_vars=["month"],
value_vars=["avg", "med"],
variable_name="type",
value_name="value"
)
fig = px.line(
df_.to_pandas(),
x='month',
y='value',
color='type',
title='Average and Median Values'
)
fig.update_layout(
width=400,
height=350
)
fig.show()
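If you are working with a pandas DataFrame instead, pandas also has DataFrame.melt. However, the keyword for the new column name is var_name rather than Polars' variable_name, so the Polars call above would fail on a pandas frame. A minimal sketch of the pandas version:

```python
import pandas as pd

df = pd.DataFrame({
    "month": ["M202412", "M202501", "M202502"],
    "avg": [0.037824, 0.03616, 0.038919],
    "med": [0.01381, 0.013028, 0.014843],
})

# pandas spells it `var_name`; Polars uses `variable_name`
df_ = df.melt(
    id_vars=["month"],
    value_vars=["avg", "med"],
    var_name="type",
    value_name="value",
)
print(df_)
```

With pandas, the result can be passed to px.line directly, without the to_pandas() conversion.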