I'm using Python (pandas), and I need to "aggregate" on the columns "R" and then "J", so that each row of the result is a unique (R, J) pair.
I don't want to lose the data in C, so I need to create new columns named C1 for T=1, C2 for T=2, and C3 for T=3, and use T to write each value from C into the matching column C1, C2, or C3.
So I need to go from:
#   R  J  T  C
#   a  1  1  x
#   a  1  2  y
#   a  1  3  z
#   b  1  1  w
#   b  2  1  v
#   b  3  1  s
#   c  1  1  t
#   c  1  2  r
#   c  2  1  u
#
# ----->
#
#   R  J  C(T=1)  C(T=2)  C(T=3)
#   a  1  x       y       z
#   b  1  w
#   b  2  v
#   b  3  s
#   c  1  t       r
#   c  2  u
import pandas as pd

data = {'R': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
        'J': [1, 1, 1, 1, 2, 3, 1, 1, 2],
        'T': [1, 2, 3, 1, 1, 1, 1, 2, 1],
        'C': ['x', 'y', 'z', 'w', 'v', 's', 't', 'r', 'u']}
df = pd.DataFrame(data=data)
PS: If it helps, J and T each have an accompanying column with unique IDs:
J_ID = [1,1,1,2,3,4,5,5,6]
T_ID = [1,2,3,4,5,6,7,8,9]
Any help would be greatly appreciated.
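Before relying on J_ID, it may be worth checking whether it can stand in for the (R, J) pair. The sketch below attaches both ID lists as columns (assuming they line up row-for-row with df; the column names J_ID and T_ID are just the list names from the PS) and checks that each (R, J) pair has exactly one J_ID, that each J_ID maps back to exactly one (R, J) pair, and that T_ID is unique per row.

```python
import pandas as pd

data = {'R': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
        'J': [1, 1, 1, 1, 2, 3, 1, 1, 2],
        'T': [1, 2, 3, 1, 1, 1, 1, 2, 1],
        'C': ['x', 'y', 'z', 'w', 'v', 's', 't', 'r', 'u']}
df = pd.DataFrame(data=data)

# Assumption: the ID lists are aligned with the rows of df
df['J_ID'] = [1, 1, 1, 2, 3, 4, 5, 5, 6]
df['T_ID'] = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Each (R, J) pair should carry a single J_ID ...
one_id_per_pair = df.groupby(['R', 'J'])['J_ID'].nunique().eq(1).all()
# ... and each J_ID should point back to a single (R, J) pair
one_pair_per_id = (df.groupby('J_ID')[['R', 'J']].nunique() == 1).all().all()
# T_ID identifies individual rows, not groups
t_id_unique = df['T_ID'].is_unique

print(one_id_per_pair, one_pair_per_id, t_id_unique)
```

If the first two checks hold, grouping on 'J_ID' yields the same groups as grouping on ['R', 'J'], so either key works for the reshape.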
You can use groupby to collect the C values of each (R, J) group into a list, then expand each list into columns with pd.Series:
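out = (
    df.groupby(['R', 'J'])['C']
      .apply(list)                           # one list of C values per (R, J)
      .apply(pd.Series)                      # spread each list over columns 0, 1, 2
      .rename(columns=lambda x: f'C{x+1}')   # 0, 1, 2 -> C1, C2, C3
      .reset_index()                         # turn R and J back into columns
)
print(out)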
R J C1 C2 C3
0 a 1 x y z
1 b 1 w NaN NaN
2 b 2 v NaN NaN
3 b 3 s NaN NaN
4 c 1 t r NaN
5 c 2 u NaN NaN
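Note that the list-based approach fills C1, C2, C3 in the order the rows appear within each group, not by the value of T; it matches the desired output here because T happens to count up from 1 within every group. If rows could arrive out of order, or a group could be missing T=1, pivoting on T places each value in the column the question asks for. This is a minimal sketch, assuming each (R, J, T) combination occurs at most once (DataFrame.pivot raises a ValueError on duplicate entries):

```python
import pandas as pd

data = {'R': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
        'J': [1, 1, 1, 1, 2, 3, 1, 1, 2],
        'T': [1, 2, 3, 1, 1, 1, 1, 2, 1],
        'C': ['x', 'y', 'z', 'w', 'v', 's', 't', 'r', 'u']}
df = pd.DataFrame(data=data)

out = (
    df.pivot(index=['R', 'J'], columns='T', values='C')  # one column per T value
      .add_prefix('C')                                   # 1, 2, 3 -> C1, C2, C3
      .rename_axis(columns=None)                         # drop the leftover 'T' axis name
      .reset_index()                                     # turn R and J back into columns
)
print(out)
```

Because the column is chosen from T itself, shuffling df (for example with df.sample(frac=1)) before the pivot leaves the result unchanged.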