I am going through the pandas groupby docs. When I group my DataFrame and call describe as below:
df:
     A      B         C         D
0  foo    one -0.987674  0.039616
1  bar    one -0.653247 -1.022529
2  foo    two  0.404201  1.308777
3  bar  three  1.620780  0.574377
4  foo    two  1.661942  0.579888
5  bar    two  0.747878  0.463052
6  foo    one  0.070278  0.202564
7  foo  three  0.779684 -0.547192
grouped = df.groupby(['A', 'B'])
grouped.describe()
gives
                  C                      ...         D
              count      mean       std  ...       50%       75%       max
A   B                                    ...
bar one         1.0  0.224944       NaN  ...  1.107509  1.107509  1.107509
    three       1.0  0.704943       NaN  ...  1.833098  1.833098  1.833098
    two         1.0 -0.091613       NaN  ... -0.549254 -0.549254 -0.549254
foo one         2.0  0.282298  1.554401  ... -0.334058  0.046640  0.427338
    three       1.0  1.688601       NaN  ... -1.457338 -1.457338 -1.457338
    two         2.0  1.206690  0.917140  ... -0.096405  0.039241  0.174888
What do 25%, 50%, and 75% signify in the describe output? A bit of explanation, please.
You can check the documentation for DataFrameGroupBy.describe:
Notes:
For numeric data, the result’s index will include count, mean, std, min, max as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.
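As a minimal sketch of what those labels mean (the five-value Series here is made up for illustration, not taken from the question), the 25%, 50% and 75% rows of describe are the same numbers that quantile returns:

```python
import pandas as pd

# Hypothetical toy data, chosen so the quartiles are easy to read off.
s = pd.Series([1, 2, 3, 4, 5])

# describe() reports count, mean, std, min, 25%, 50%, 75%, max.
desc = s.describe()

# The same three percentiles requested directly.
quartiles = s.quantile([0.25, 0.5, 0.75])

print(desc[['25%', '50%', '75%']])
print(quartiles)
print(s.median())  # equal to the 50% entry
```

So 25% is the value below which roughly a quarter of the observations fall, 50% is the median, and 75% is the value below which roughly three quarters fall.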
Can you explain the foo-one value in the example above?
It is called a MultiIndex:
Hierarchical / Multi-level indexing is very exciting as it opens the door to some quite sophisticated data analysis and manipulation, especially for working with higher dimensional data. In essence, it enables you to store and manipulate data with an arbitrary number of dimensions in lower dimensional data structures like Series (1d) and DataFrame (2d).
grouped = df.groupby(['A', 'B'])
df = grouped.describe()
print(df.index)
MultiIndex([('bar', 'one'),
('bar', 'three'),
('bar', 'two'),
('foo', 'one'),
('foo', 'three'),
('foo', 'two')],
names=['A', 'B'])
print(df.columns)
MultiIndex([('C', 'count'),
('C', 'mean'),
('C', 'std'),
('C', 'min'),
('C', '25%'),
('C', '50%'),
('C', '75%'),
('C', 'max'),
('D', 'count'),
('D', 'mean'),
('D', 'std'),
('D', 'min'),
('D', '25%'),
('D', '50%'),
('D', '75%'),
('D', 'max')],
)
print(df.loc[('foo', 'one'), ('C', '75%')])
-0.19421
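To see where that number comes from, here is a small sketch that rebuilds the DataFrame from the question and recomputes the value by hand. It assumes pandas' default linear interpolation between sorted values; the names data, stats, lo, hi and by_hand are only for illustration.

```python
import pandas as pd

# The DataFrame from the question, typed in by hand.
data = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [-0.987674, -0.653247, 0.404201, 1.620780,
          1.661942, 0.747878, 0.070278, 0.779684],
    'D': [0.039616, -1.022529, 1.308777, 0.574377,
          0.579888, 0.463052, 0.202564, -0.547192],
})

# Rows are indexed by (A, B), columns by (column name, statistic).
stats = data.groupby(['A', 'B']).describe()
q75 = stats.loc[('foo', 'one'), ('C', '75%')]

# The (foo, one) group holds exactly two C values (rows 0 and 6).
mask = (data['A'] == 'foo') & (data['B'] == 'one')
lo, hi = sorted(data.loc[mask, 'C'])

# Assumed default: linear interpolation, 75% of the way from lo to hi.
by_hand = lo + 0.75 * (hi - lo)

print(round(q75, 5), round(by_hand, 5))
```

With only two values in the group, the 75% mark sits three quarters of the way from the smaller value (-0.987674) to the larger one (0.070278), which gives about -0.19421.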