I'm sure this has been asked before, sorry if duplicate. Suppose I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data': range(6)}, columns=['key', 'data'])
>>
  key  data
0   A     0
1   B     1
2   C     2
3   A     3
4   B     4
5   C     5
Doing a groupby on 'key', I know we can do things like:
df.groupby('key').sum()
>>
     data
key
A       3
B       5
C       7
What is the easiest way to get all the 'split' data for each group in an array, like this:
>>
       data
key
A    [0, 3]
B    [1, 4]
C    [2, 5]
I'm not necessarily grouping by just one key but by several others as well ('year' and 'month', for example), which is why I'd like to use the groupby function while preserving all the grouped values in an array.
You can use apply(list):
print(df.groupby('key').data.apply(list).reset_index())
  key    data
0   A  [0, 3]
1   B  [1, 4]
2   C  [2, 5]
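Since the question mentions grouping by more than one key, here is a minimal sketch of the same apply(list) approach with a list of columns passed to groupby. The 'year' and 'month' columns and their values are made up for illustration; they are not from the original dataframe.

```python
import pandas as pd

# Hypothetical data: the 'year' and 'month' columns are assumed for illustration only.
df = pd.DataFrame({
    'key':   ['A', 'B', 'A', 'B', 'A', 'B'],
    'year':  [2020, 2020, 2020, 2020, 2021, 2021],
    'month': [1, 1, 1, 1, 2, 2],
    'data':  range(6),
})

# Group by several columns at once, then collect each group's 'data' values into a list.
grouped = df.groupby(['key', 'year', 'month'])['data'].apply(list).reset_index()
print(grouped)
```

Each distinct (key, year, month) combination should end up as one row, with its 'data' values gathered into a Python list. Selecting the column with ['data'] rather than .data is just a stylistic choice here; it avoids any clash between a column name and an existing attribute.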