I am trying to use the .describe() method on df1
to obtain aggregates. The current index
is year
. I want to obtain these stats based on each statistics
over the 3 year period in the index. I tried using stats_df = df1.groupby('statistics').descirbe().unstack(1))
but I don't get the result that I am looking for.
in df1 =
statistics s_values
year
1999 cigarette use 100
1999 cellphone use 310
1999 internet use 101
1999 alcohol use 100
1999 soda use 215
2000 cigarette use 315
2000 cellphone use 317
2000 internet use 325
2000 alcohol use 108
2000 soda use 200
2001 cigarette use 122
2001 cellphone use 311
2001 internet use 112
2001 alcohol use 144
2001 soda use 689
2002 cigarette use 813
2002 cellphone use 954
2002 internet use 548
2002 alcohol use 882
2002 soda use 121
I am trying to achieve an output like this. Please keep in mind these aggregate values are not accurate I just populated them with random numbers to give you the idea of the format.
result stats_df =
statistics count unique top freq mean std min 20% 40% 50% 60% 80% max
cigarette use 32 335 655 54 45 45 1 23 21 12 55 55 999
cellphone use 92 131 895 49 12 33 6 13 32 55 34 12 933
internet use 32 111 123 44 65 31 2 42 544 15 11 54 111
alcohol use 32 315 611 33 41 53 3 34 22 34 11 33 555
soda use 32 355 655 54 45 45 1 23 21 12 55 55 999
thank you
I created a sample dataframe and I could get the result with just using groupby().describe(). I am unsure what's wrong with your code, could you also edit your post to show the result you obtained?
here's mine
df = pd.DataFrame(index=[1999,1999,1999,1999,1999,2000,2000,2000,2000,2000], columns=['statistics', 's_values'], data=[['cigarette use', 100],['cellphone use', 310],['internet use',
101],['alcohol use', 100], ['soda use', 215],['cigarette use', 315],['cellphone use', 317],['internet use', 325],['alcohol use', 108],['soda use', 200]])
df.groupby("statistics").describe()
output:
s_values
count mean std min 25% 50% 75% max
statistics
alcohol use 2.0 104.0 5.656854 100.0 102.00 104.0 106.00 108.0
cellphone use 2.0 313.5 4.949747 310.0 311.75 313.5 315.25 317.0
cigarette use 2.0 207.5 152.027958 100.0 153.75 207.5 261.25 315.0
internet use 2.0 213.0 158.391919 101.0 157.00 213.0 269.00 325.0
soda use 2.0 207.5 10.606602 200.0 203.75 207.5 211.25 215.0