[SOLVED] Collapse values from multiple rows of a column into an array when all other columns values are same

I have a table with 7 columns where, for every few rows, 6 columns stay the same and only the 7th changes. I would like to merge each such group of rows into one row and combine the values of the 7th column into a list.

So if I have this dataframe:

   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6
3  c  7  6

I would like to convert it to this:

   A       B  C
0  a       1  2
1  b       3  4
2  c  [5, 7]  6

Since the values of columns A and C are the same in rows 2 and 3, those rows are collapsed into a single row and the values of B are combined into a list.

Melt, explode, and pivot don't seem to offer this. How can I achieve this using pandas?

Solution

Use GroupBy.agg with a custom lambda function, then add DataFrame.reindex to restore the original column order:

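# Return a list for groups with several rows; leave single values as they are
f = lambda x: x.tolist() if len(x) > 1 else x
# Group by the unchanged columns, aggregate B, then restore the original column order
df = df.groupby(['A','C'])['B'].agg(f).reset_index().reindex(df.columns, axis=1)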
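
As a self-contained sketch of the snippet above: the input frame here is reconstructed from the example in the question and is an assumption, and the name collapse is made up for illustration. For single-row groups it returns x.iloc[0] (a plain scalar) rather than the length-1 Series, which is a small variation on the lambda, not the answer's exact code. It also shows why the reindex step is needed.

```python
import pandas as pd

# Assumed input, reconstructed from the question's example
df = pd.DataFrame({"A": ["a", "b", "c", "c"],
                   "B": [1, 3, 5, 7],
                   "C": [2, 4, 6, 6]})

def collapse(x):
    # Several rows in the group -> list; a single row -> its scalar value
    return x.tolist() if len(x) > 1 else x.iloc[0]

grouped = df.groupby(["A", "C"])["B"].agg(collapse).reset_index()
print(grouped.columns.tolist())   # ['A', 'C', 'B']: group keys come first

out = grouped.reindex(df.columns, axis=1)
print(out.columns.tolist())       # ['A', 'B', 'C']: original order restored
print(out)
```

The result is assigned to out instead of df here, so the original frame stays available for trying the other variants.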

You can also build the column names dynamically:

changes = ['B']  # columns whose values vary within a group
cols = df.columns.difference(changes).tolist()  # all remaining columns become the group keys

f = lambda x: x.tolist() if len(x) > 1 else x
df = df.groupby(cols)[changes].agg(f).reset_index().reindex(df.columns, axis=1)
print(df)
   A       B  C
0  a       1  2
1  b       3  4
2  c  [5, 7]  6
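
The dynamic variant can also be wrapped in a small function. This is only a sketch: collapse_rows is a hypothetical name, the sample frame is assumed from the question, and x.iloc[0] is used for single-row groups so that a plain scalar comes back.

```python
import pandas as pd

def collapse_rows(frame, changes):
    """Hypothetical helper: group by every column that is not in `changes`."""
    keys = frame.columns.difference(changes).tolist()
    # List for multi-row groups, plain scalar for single-row groups
    collapse = lambda x: x.tolist() if len(x) > 1 else x.iloc[0]
    return (frame.groupby(keys)[changes]
                 .agg(collapse)
                 .reset_index()
                 .reindex(frame.columns, axis=1))

# Assumed input, reconstructed from the question's example
df = pd.DataFrame({"A": ["a", "b", "c", "c"],
                   "B": [1, 3, 5, 7],
                   "C": [2, 4, 6, 6]})

result = collapse_rows(df, ["B"])
print(result)
```

Because the function returns a new frame and does not overwrite df, it can be called again on the original data with a different changes list.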

If every value in the column should be a list, the solution is simpler:

changes = ['B']
cols = df.columns.difference(changes).tolist()

df = df.groupby(cols)[changes].agg(list).reset_index().reindex(df.columns, axis=1)
print(df)
   A       B  C
0  a     [1]  2
1  b     [3]  4
2  c  [5, 7]  6
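
One possible reason to prefer uniform lists: a column in which every cell is a list can be expanded back to one row per value with DataFrame.explode. A minimal sketch, again assuming the input frame implied by the question (the names collapsed and restored are only illustrative):

```python
import pandas as pd

# Assumed input, reconstructed from the question's example
df = pd.DataFrame({"A": ["a", "b", "c", "c"],
                   "B": [1, 3, 5, 7],
                   "C": [2, 4, 6, 6]})

changes = ["B"]
cols = df.columns.difference(changes).tolist()

# Every cell of B becomes a list, including single-row groups
collapsed = (df.groupby(cols)[changes]
               .agg(list)
               .reset_index()
               .reindex(df.columns, axis=1))

# explode puts each list element back on its own row; the exploded
# column has object dtype, so cast it back to the original dtype
restored = collapsed.explode("B").reset_index(drop=True)
restored["B"] = restored["B"].astype(df["B"].dtype)
print(restored)
```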