I have the following table. I need to group by Col1 and check whether Col2 contains Y.
If it does, create a new column Col4 and fill every row of that group with the Col3 value from the Y row; if not, just copy Col3 into Col4.
index | Col1 | Col2 | Col3 |
---|---|---|---|
0 | 1 | X | ABC |
1 | 1 | Y | XX |
2 | 1 | X | QW |
3 | 2 | X | VB |
4 | 2 | X | AY |
5 | 3 | X | MM |
6 | 3 | X | YY |
7 | 3 | Y | XX |
Desired table
index | Col1 | Col2 | Col3 | New_Col |
---|---|---|---|---|
0 | 1 | X | ABC | XX |
1 | 1 | Y | XX | XX |
2 | 1 | X | QW | XX |
3 | 2 | X | VB | VB |
4 | 2 | X | AY | AY |
5 | 3 | X | MM | XX |
6 | 3 | X | YY | XX |
7 | 3 | Y | XX | XX |
Filter only the Y rows by boolean indexing, use Series.map by Col1, and finally replace the non-matched values with Series.fillna:
s = df[df['Col2'].eq('Y')].set_index('Col1')['Col3']
df['Col4'] = df['Col1'].map(s).fillna(df['Col3'])
print (df)
index Col1 Col2 Col3 Col4
0 0 1 X ABC XX
1 1 1 Y XX XX
2 2 1 X QW XX
3 3 2 X VB VB
4 4 2 X AY AY
5 5 3 X MM XX
6 6 3 X YY XX
7 7 3 Y XX XX
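For reference, here is a self-contained sketch of the steps above. It assumes the sample data from the question, rebuilt by hand without the extra `index` column, so treat it as an illustration rather than the exact frame used in the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Col1": [1, 1, 1, 2, 2, 3, 3, 3],
    "Col2": ["X", "Y", "X", "X", "X", "X", "X", "Y"],
    "Col3": ["ABC", "XX", "QW", "VB", "AY", "MM", "YY", "XX"],
})

# Lookup Series: Col1 of each 'Y' row -> that row's Col3 value.
s = df[df["Col2"].eq("Y")].set_index("Col1")["Col3"]

# Groups without a 'Y' row map to NaN, so fall back to the row's own Col3.
df["Col4"] = df["Col1"].map(s).fillna(df["Col3"])
print(df)
```

Note that mapping by a Series needs a unique index in `s`, so this sketch assumes at most one Y row per Col1 group.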
If the matched value may be NaN, the solution is modified:
print (df)
index Col1 Col2 Col3
0 0 1 X ABC
1 1 1 Y NaN
2 2 1 X QW
3 3 2 X VB
4 4 2 X AY
5 5 3 X MM
6 6 3 X YY
7 7 3 Y XX
import numpy as np

s = df[df['Col2'].eq('Y')].set_index('Col1')['Col3']
df['Col4'] = np.where(df['Col1'].isin(s.index), df['Col1'].map(s), df['Col3'])
print (df)
index Col1 Col2 Col3 Col4
0 0 1 X ABC NaN
1 1 1 Y NaN NaN
2 2 1 X QW NaN
3 3 2 X VB VB
4 4 2 X AY AY
5 5 3 X MM XX
6 6 3 X YY XX
7 7 3 Y XX XX
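To see why the change matters, the sketch below compares both variants on the NaN data. The names `mapped`, `has_y`, `with_fillna` and `with_where` are only illustrative and do not appear in the answer.

```python
import numpy as np
import pandas as pd

# Same data as above, but the 'Y' row of group 1 holds NaN in Col3.
df = pd.DataFrame({
    "Col1": [1, 1, 1, 2, 2, 3, 3, 3],
    "Col2": ["X", "Y", "X", "X", "X", "X", "X", "Y"],
    "Col3": ["ABC", np.nan, "QW", "VB", "AY", "MM", "YY", "XX"],
})

s = df[df["Col2"].eq("Y")].set_index("Col1")["Col3"]
mapped = df["Col1"].map(s)

# fillna cannot tell "no Y row in the group" from "the Y row holds NaN",
# so group 1 falls back to its own Col3 values here.
with_fillna = mapped.fillna(df["Col3"])

# isin(s.index) only checks whether the group has a Y row at all,
# so the NaN from the Y row is kept for the whole of group 1.
has_y = df["Col1"].isin(s.index)
with_where = pd.Series(np.where(has_y, mapped, df["Col3"]), index=df.index)

print(pd.DataFrame({"fillna": with_fillna, "np.where": with_where}))
```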
EDIT: Solution for multiple Y values per group - pick a random Y value per group with DataFrameGroupBy.sample:
m = df['Col2'].eq('Y')
s = df[m].groupby(df['Col1']).sample(1).set_index('Col1')['Col3']
df['Col4'] = df['Col3'].where(m).fillna(df['Col1'].map(s)).fillna(df['Col3'])
print (df)
index Col1 Col2 Col3 Col4
0 0 1 Y ABC ABC
1 1 1 Y XX XX
2 2 1 X QW ABC
3 3 2 X VB VB
4 4 2 X AY AY
5 5 3 X MM XX
6 6 3 X YY XX
7 7 3 Y XX XX
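Because `sample` picks randomly, the result for non-Y rows can differ between runs. If a repeatable result is preferred, one possible variant (an assumption, not part of the answer above) is to keep the first Y row of each group with `DataFrame.drop_duplicates`:

```python
import pandas as pd

# Data with two 'Y' rows in group 1, as in the EDIT above.
df = pd.DataFrame({
    "Col1": [1, 1, 1, 2, 2, 3, 3, 3],
    "Col2": ["Y", "Y", "X", "X", "X", "X", "X", "Y"],
    "Col3": ["ABC", "XX", "QW", "VB", "AY", "MM", "YY", "XX"],
})

m = df["Col2"].eq("Y")

# Keep the first 'Y' row of each Col1 group instead of a random one.
s = df[m].drop_duplicates("Col1").set_index("Col1")["Col3"]

# 'Y' rows keep their own Col3; other rows take the group's chosen value,
# and groups without any 'Y' row fall back to their own Col3.
df["Col4"] = df["Col3"].where(m).fillna(df["Col1"].map(s)).fillna(df["Col3"])
print(df)
```

Passing `random_state` to `sample` should also make the random choice repeatable while keeping the original approach.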