python pandas seaborn grouped-bar-chart

Barplot grouped by class and time-interval


I have request response times in a pandas DataFrame:

    execution_time       request_type  response_time_ms  URL                      Error
2   2023-10-12 08:52:16  Google        91.0              https://www.google.com   NaN
3   2023-10-12 08:52:16  CNN           115.0             https://edition.cnn.com  NaN
6   2023-10-12 08:52:27  Google        90.0              https://www.google.com   NaN
7   2023-10-12 08:52:27  CNN           105.0             https://edition.cnn.com  NaN
10  2023-10-12 08:52:37  Google        5111.0            https://www.google.com   NaN

It contains the time of the request (execution_time), the website name (request_type), and the response time (response_time_ms).

What I want to achieve is a bar plot of the median response time, grouped by website (request_type) and by time interval, for example 4-hour intervals. This should show how the response time varies with the time of day.

I managed to create the plot but the coloring is "off". The issue I have is that I want the different websites to be colored differently.

What I have so far:

df_by_time = df.groupby(["request_type", pd.Grouper(key="execution_time", freq="4h")]).agg({"response_time_ms": ["median"]})
df_by_time.plot(kind='bar', figsize=(8, 6), title='Response Times', xlabel='Type', ylabel='Response time [ms]', rot=90) 

This produces the following plot:

[Image: Response Times by hour]

I would like to give each website (request_type) its own color.

How can I achieve that?


Solution

  • If I understand correctly, you need to aggregate with 'median' rather than ['median'] to avoid MultiIndex columns. After resetting the index, you can use seaborn.barplot with hue='request_type':

    import pandas as pd
    import seaborn as sns
    
    df_by_time = (df.groupby(["request_type", pd.Grouper(key="execution_time",
                                                         freq="4h")])
                    .agg({"response_time_ms": "median"})
                    .reset_index()
                 )
    
    sns.barplot(data=df_by_time, x='execution_time', y='response_time_ms',
                hue='request_type')
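
To see why the list form gets in the way, here is a minimal sketch on made-up data (the values and the `grouped` helper are assumptions for illustration, not taken from the question). It only compares the column structure produced by the two `agg` calls:

```python
import pandas as pd

# Made-up sample in the same shape as the question's data (illustrative values).
df = pd.DataFrame({
    "execution_time": pd.to_datetime([
        "2023-10-12 08:52:16", "2023-10-12 08:52:16",
        "2023-10-12 12:10:00", "2023-10-12 12:10:00",
    ]),
    "request_type": ["Google", "CNN", "Google", "CNN"],
    "response_time_ms": [91.0, 115.0, 90.0, 105.0],
})

def grouped(frame):
    # Hypothetical helper: group by website and by 4-hour window.
    return frame.groupby(
        ["request_type", pd.Grouper(key="execution_time", freq="4h")]
    )

# A list of functions yields two-level columns: ("response_time_ms", "median").
nested = grouped(df).agg({"response_time_ms": ["median"]})

# A single function name keeps the columns flat.
flat = grouped(df).agg({"response_time_ms": "median"}).reset_index()

print(nested.columns.tolist())
print(flat.columns.tolist())
```

With the flat version, `request_type` is an ordinary column after `reset_index()`, which is what lets it be passed as `hue`.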
    

    Alternatively, use groupby.median to produce a Series and unstack to use pandas' plot.bar:

    df_by_time = (df.groupby(["request_type", pd.Grouper(key="execution_time", freq="4h")])
                    ['response_time_ms'].median()
                    .unstack('request_type')
                 )
    
    df_by_time.plot.bar()
    

    Output: [image: grouped bar plot]

    [image: the same plot with aggregation every 20s, showing the behavior with multiple time groups]
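
For anyone who wants to try the unstack variant without the original data, the following is a rough sketch on synthetic values (all numbers are assumed for illustration). It stops at the wide table that `plot.bar` would draw and also shortens the index labels, which is an optional extra not in the answer above:

```python
import pandas as pd

# Synthetic data, assumed for illustration: two sites, two 4-hour windows.
df = pd.DataFrame({
    "execution_time": pd.to_datetime([
        "2023-10-12 08:52:16", "2023-10-12 08:52:16",
        "2023-10-12 09:30:00", "2023-10-12 09:30:00",
        "2023-10-12 13:05:00", "2023-10-12 13:05:00",
    ]),
    "request_type": ["Google", "CNN"] * 3,
    "response_time_ms": [91.0, 115.0, 95.0, 105.0, 120.0, 140.0],
})

# One row per time window, one column per website.
wide = (df.groupby(["request_type", pd.Grouper(key="execution_time", freq="4h")])
          ["response_time_ms"].median()
          .unstack("request_type"))

# Optional: keep only the start time of each window as the tick label.
wide.index = wide.index.strftime("%H:%M")

print(wide)
```

Calling `wide.plot.bar(rot=0, ylabel="Response time [ms]")` on this table should draw one group of bars per window, with each column (website) in its own color.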