I'm trying to plot arrays stored in a dataframe as boxplots, like the second picture here.
An extract of my data (I have data over 6 years, 150 per year):
columns: idx | id | mods | Mean(Moyennes) | Median | Values_array | date2021
idx1 | 2021012 | Day | 273.7765808105 | 273.5100097656 | 272.3800048828,272.3800048828,272.3999938965,272.3999938965,276.5199890137,274.3800048828,274.3800048828 | 2021-12-01T00:00:00.000Z
idx2 | 2021055 | Night| 287.5215759277 | 287.6099853516 | 286.0400085449,286.0400085449,286.0400085449,286.0400085449,284.8599853516,285.0400085449,285.0400085449,286.7200012207,286.799987793,286.799987793,287,288.2399902344,288.2399902344 |2021-02-24T00:00:00.000Z
Here is my data plotted with sns.relplot
To plot it, I tried :
sns.boxplot(data=df2018, x="Moyennes", y="date2018", hue = "mods")
The result looks like this.
I don't understand why the dates come out like this and not as they do with sns.relplot. Also, I want to boxplot each array as a whole because, as I understand it, the boxplot needs the full array in order to compute the mean, median, etc.
I also tried :
for i, j in sorted(df2017.iterrows()):
    values = j[4]
    date = j[6]
    id=j[0]
    fig, ax1 = plt.subplots(figsize=(10, 6))
    fig.canvas.manager.set_window_title('Température 2020')
    fig.subplots_adjust(left=0.075, right=0.95, top=0.9, bottom=0.25)
    bp = ax1.boxplot(values, notch=False, sym='+', vert=True, whis=1.5)
    plt.setp(bp['boxes'], color='black')
    plt.setp(bp['whiskers'], color='black')
    plt.setp(bp['fliers'], color='red', marker='+')
The output looks like this, which is nice, but I want all the boxplots for one year to be in the same plot.
I'm working in VS Code on a Linux VM.
My question is, how can I boxplot several arrays with seaborn?
'Values_array' is a string of comma-separated numbers, which must be converted to separate rows and then set to float type.
Plot with sns.catplot with kind='box', or the axes-level method sns.boxplot.
Use the col, col_wrap, and row parameters for subplots (facets) with sns.catplot.
Tested in python 3.11.2, pandas 2.0.0, matplotlib 3.7.1, seaborn 0.12.2
import pandas as pd
import seaborn as sns
# sample data
data = {'idx': ['idx1 ', 'idx2 '],
'id': [2021012, 2021055],
'mods': ['Day', 'Night'],
'Mean(Moyennes)': [273.7765808105, 287.5215759277],
'Median': [273.5100097656, 287.6099853516],
'Values_array': ['272.3800048828,272.3800048828,272.3999938965,272.3999938965,276.5199890137,274.3800048828,274.3800048828', '286.0400085449,286.0400085449,286.0400085449,286.0400085449,284.8599853516,285.0400085449,285.0400085449,286.7200012207,286.799987793,286.799987793,287,288.2399902344,288.2399902344'],
'date2021': ['2021-12-01T00:00:00.000Z', '2021-02-24T00:00:00.000Z']}
df = pd.DataFrame(data)
# convert the column to a datetime.date type since there's no time component
df.date2021 = pd.to_datetime(df.date2021).dt.date
# split the strings in the Values_array column
df.Values_array = df.Values_array.str.split(',')
# explode the list of strings to individual rows
df = df.explode(column='Values_array', ignore_index=True)
# set the type of the Values_array column to float
df.Values_array = df.Values_array.astype(float)
# plot the data in a single facet
g = sns.catplot(data=df, x='date2021', y='Values_array', kind='box')
# same plot with sns.boxplot instead of sns.catplot (returns a matplotlib Axes, not a FacetGrid)
ax = sns.boxplot(data=df, x='date2021', y='Values_array')
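The hue, col, and col_wrap options mentioned above are not used in the code so far, so here is a minimal sketch of them. It assumes the same column layout as the question; the sample values are shortened and partly made up for illustration, and the names raw, long_df, g_hue and g_col are placeholders, not anything from the original data.

```python
import pandas as pd
import seaborn as sns

# made-up sample in the same shape as the question's data: 2 dates x 2 mods
raw = pd.DataFrame({
    'mods': ['Day', 'Night', 'Day', 'Night'],
    'Values_array': ['272.38,272.40,276.52,274.38',
                     '286.04,284.86,286.72,288.24',
                     '275.10,275.90,277.30,276.00',
                     '289.00,288.10,287.50,290.20'],
    'date2021': ['2021-02-24T00:00:00.000Z', '2021-02-24T00:00:00.000Z',
                 '2021-12-01T00:00:00.000Z', '2021-12-01T00:00:00.000Z'],
})

# same cleaning steps: parse dates, split the strings, explode, cast to float
long_df = raw.assign(
    date2021=pd.to_datetime(raw['date2021']).dt.date,
    Values_array=raw['Values_array'].str.split(','),
).explode('Values_array', ignore_index=True)
long_df['Values_array'] = long_df['Values_array'].astype(float)

# one facet, Day and Night boxes side by side for each date
g_hue = sns.catplot(data=long_df, x='date2021', y='Values_array',
                    hue='mods', kind='box')

# one facet per value of 'mods' instead of using hue
g_col = sns.catplot(data=long_df, x='date2021', y='Values_array',
                    col='mods', col_wrap=2, kind='box')
```

Both calls return a FacetGrid; call plt.show() (from matplotlib.pyplot) if the figures do not appear on their own in your environment.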
df before cleaning:
idx id mods Mean(Moyennes) Median Values_array date2021
0 idx1 2021012 Day 273.776581 273.510010 272.3800048828,272.3800048828,272.3999938965,272.3999938965,276.5199890137,274.3800048828,274.3800048828 2021-12-01T00:00:00.000Z
1 idx2 2021055 Night 287.521576 287.609985 286.0400085449,286.0400085449,286.0400085449,286.0400085449,284.8599853516,285.0400085449,285.0400085449,286.7200012207,286.799987793,286.799987793,287,288.2399902344,288.2399902344 2021-02-24T00:00:00.000Z
df after cleaning:
idx id mods Mean(Moyennes) Median Values_array date2021
0 idx1 2021012 Day 273.776581 273.510010 272.380005 2021-12-01
1 idx1 2021012 Day 273.776581 273.510010 272.380005 2021-12-01
2 idx1 2021012 Day 273.776581 273.510010 272.399994 2021-12-01
3 idx1 2021012 Day 273.776581 273.510010 272.399994 2021-12-01
4 idx1 2021012 Day 273.776581 273.510010 276.519989 2021-12-01
5 idx1 2021012 Day 273.776581 273.510010 274.380005 2021-12-01
6 idx1 2021012 Day 273.776581 273.510010 274.380005 2021-12-01
7 idx2 2021055 Night 287.521576 287.609985 286.040009 2021-02-24
8 idx2 2021055 Night 287.521576 287.609985 286.040009 2021-02-24
9 idx2 2021055 Night 287.521576 287.609985 286.040009 2021-02-24
10 idx2 2021055 Night 287.521576 287.609985 286.040009 2021-02-24
11 idx2 2021055 Night 287.521576 287.609985 284.859985 2021-02-24
12 idx2 2021055 Night 287.521576 287.609985 285.040009 2021-02-24
13 idx2 2021055 Night 287.521576 287.609985 285.040009 2021-02-24
14 idx2 2021055 Night 287.521576 287.609985 286.720001 2021-02-24
15 idx2 2021055 Night 287.521576 287.609985 286.799988 2021-02-24
16 idx2 2021055 Night 287.521576 287.609985 286.799988 2021-02-24
17 idx2 2021055 Night 287.521576 287.609985 287.000000 2021-02-24
18 idx2 2021055 Night 287.521576 287.609985 288.239990 2021-02-24
19 idx2 2021055 Night 287.521576 287.609985 288.239990 2021-02-24
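If you would rather stay with the matplotlib approach from the question, Axes.boxplot also accepts a list of arrays and draws one box per array on the same Axes. This is only a sketch under assumptions: the rows dict below is a hypothetical stand-in for iterating over your dataframe, and the values are shortened from the question's sample.

```python
import matplotlib.pyplot as plt
import numpy as np

# hypothetical stand-in for the dataframe rows: date label -> comma-separated values
rows = {
    '2021-02-24': '286.04,284.86,285.04,286.72,286.80,287,288.24',
    '2021-12-01': '272.38,272.40,276.52,274.38',
}

# parse each string into its own float array (the arrays may differ in length)
arrays = [np.array(s.split(','), dtype=float) for s in rows.values()]

# a single figure for all rows, instead of one figure per loop iteration
fig, ax = plt.subplots(figsize=(10, 6))
bp = ax.boxplot(arrays, notch=False, sym='+', vert=True, whis=1.5)

# label each box with its date
ax.set_xticklabels(list(rows.keys()), rotation=45)
```

The key difference from the loop in the question is that plt.subplots is called once, outside any loop, and all arrays are passed to boxplot together.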