python python-polars altair

Simplest way to reshape aggregated data for visualization in Polars


Suppose I have aggregated the mean and the median of some value over 3 months, like:

df = (data.group_by('month_code').agg(pl.col('value').mean().alias('avg'),
                                      pl.col('value').median().alias('med')
                                      )
          .sort('month_code')
          .collect()
      )

Resulting in something like:

df = pl.DataFrame({'month': ['M202412','M202501','M202502'],
                   'avg': [0.037824, 0.03616, 0.038919],
                   'med': [0.01381, 0.013028, 0.014843]
                   })

I'd like to visualize it, so I need to convert it to long format:

df_ = pl.DataFrame({'month': ['M202412','M202501','M202502']*2,
                    'type': ['avg','avg','avg','med','med','med'],
                    'value': [0.037824, 0.03616, 0.038919, 0.01381, 0.013028, 0.014843],
                    })

Which is then easy to visualize:

df_.plot.line(x='month',y='value',color='type').properties(width=400, height=350, title='avg and med')

What is the simplest way to convert df to df_ above?


Solution

  • You can use df.melt to convert from wide to long format. It keeps the month column as an identifier and unpivots the avg and med columns into rows, with the original column name stored in type and the number stored in value.

    df_ = df.melt(
        id_vars=["month"],
        value_vars=["avg", "med"],
        variable_name="type",
        value_name="value"
    )
    

    Full reproducible example:

    import polars as pl
    import plotly.express as px
    
    df = pl.DataFrame({
        'month': ['M202412', 'M202501', 'M202502'],
        'avg': [0.037824, 0.03616, 0.038919],
        'med': [0.01381, 0.013028, 0.014843]
    })
    
    df_ = df.melt(
        id_vars=["month"],
        value_vars=["avg", "med"],
        variable_name="type",
        value_name="value"
    )
    
    fig = px.line(
        df_.to_pandas(),  
        x='month',
        y='value',
        color='type',
        title='Average and Median Values'
    )
    
    fig.update_layout(
        width=400,
        height=350
    )
    
    fig.show()
    
