I have request response times in a pandas DataFrame:
         execution_time request_type  response_time_ms                      URL  Error
2   2023-10-12 08:52:16       Google              91.0   https://www.google.com    NaN
3   2023-10-12 08:52:16          CNN             115.0  https://edition.cnn.com    NaN
6   2023-10-12 08:52:27       Google              90.0   https://www.google.com    NaN
7   2023-10-12 08:52:27          CNN             105.0  https://edition.cnn.com    NaN
10  2023-10-12 08:52:37       Google            5111.0   https://www.google.com    NaN
It contains the time of the request (execution_time), the website name (request_type), and the response time in milliseconds (response_time_ms).
What I want to achieve is a bar plot of the median response time grouped by website (request_type) and by time window, say 4-hour bins. This should show how the response time varies with the time of day.
I managed to create the plot, but the coloring is "off": I want each website to have its own color.
What I have so far:
df_by_time = df.groupby(["request_type", pd.Grouper(key="execution_time", freq="4h")]).agg({"response_time_ms": ["median"]})
df_by_time.plot(kind='bar', figsize=(8, 6), title='Response Times', xlabel='Type', ylabel='Response time [ms]', rot=90)
This produces the image below:
I would like to:
How can I achieve that?
If I understand correctly, you need to aggregate with 'median', not ['median'], to avoid the MultiIndex; then you can use seaborn.barplot:
import seaborn as sns

df_by_time = (df.groupby(["request_type", pd.Grouper(key="execution_time", freq="4h")])
                .agg({"response_time_ms": "median"})
                .reset_index()
             )
sns.barplot(data=df_by_time, x='execution_time', y='response_time_ms',
            hue='request_type')
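To see why the list form gets in the way, here is a small sketch built only from the five sample rows in the question (the URL and Error columns are left out, and the names `grouped`, `nested`, `flat` and `long_df` are just illustrative). It compares the column index produced by the two aggregation spellings:

```python
import pandas as pd

# The five sample rows from the question (URL and Error omitted).
df = pd.DataFrame({
    "execution_time": pd.to_datetime([
        "2023-10-12 08:52:16", "2023-10-12 08:52:16",
        "2023-10-12 08:52:27", "2023-10-12 08:52:27",
        "2023-10-12 08:52:37",
    ]),
    "request_type": ["Google", "CNN", "Google", "CNN", "Google"],
    "response_time_ms": [91.0, 115.0, 90.0, 105.0, 5111.0],
})

grouped = df.groupby(["request_type", pd.Grouper(key="execution_time", freq="4h")])

# A list of functions yields two-level columns: ('response_time_ms', 'median').
nested = grouped.agg({"response_time_ms": ["median"]})

# A single function name keeps one flat column: 'response_time_ms'.
flat = grouped.agg({"response_time_ms": "median"})

print(nested.columns.nlevels, flat.columns.nlevels)

# reset_index turns the group keys into ordinary columns,
# which is the long format that seaborn's hue= parameter works with.
long_df = flat.reset_index()
print(list(long_df.columns))
```

With flat columns, `long_df` has one row per (website, time bin) pair, so `hue='request_type'` can assign one color per website.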
Alternatively, use groupby.median to produce a Series and unstack to use pandas' plot.bar:
df_by_time = (df.groupby(["request_type", pd.Grouper(key="execution_time", freq="4h")])
                ['response_time_ms'].median()
                .unstack('request_type')
             )
df_by_time.plot.bar()
Output:
Aggregation every 20s to show you the behavior with multiple time groups:
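As a rough sketch of that 20-second variant, again using only the five sample rows from the question (the variable name `wide` is illustrative), the unstacked frame ends up with one row per time bin and one column per website, which is the shape pandas' plot.bar colors by column:

```python
import pandas as pd

# The five sample rows from the question (URL and Error omitted).
df = pd.DataFrame({
    "execution_time": pd.to_datetime([
        "2023-10-12 08:52:16", "2023-10-12 08:52:16",
        "2023-10-12 08:52:27", "2023-10-12 08:52:27",
        "2023-10-12 08:52:37",
    ]),
    "request_type": ["Google", "CNN", "Google", "CNN", "Google"],
    "response_time_ms": [91.0, 115.0, 90.0, 105.0, 5111.0],
})

# Same pipeline as above, but with 20-second bins instead of 4-hour bins.
wide = (
    df.groupby(["request_type", pd.Grouper(key="execution_time", freq="20s")])
      ["response_time_ms"].median()
      .unstack("request_type")   # websites become columns, time bins stay as rows
)

print(wide)
# wide.plot.bar() would then draw one group of bars per time bin,
# with a separate color for each website column.
```

On real data you would keep the 4-hour frequency; the short bins here only serve to produce more than one group from a handful of rows.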