I have the df:
import pandas as pd
data = {"id_1":["a","b","c"],
"id_2":["q","w","e"],
"val_1":[1,2,3],
"val_2":[2,0,0]}
df = pd.DataFrame(data)
# id_1 id_2 val_1 val_2
# 0 a q 1 2
# 1 b w 2 0
# 2 c e 3 0
I want to compare the values of val_1
and val_2
columns.
If val_1
is more than val_2
(they are never the same) I want to populate a new column with the value of id_1
, else the value of id_2
. To create:
id_1 id_2 val_1 val_2 max_id
0 a q 1 2 q
1 b w 2 0 b
2 c e 3 0 c
max_id
of the first row is q because val_2>val_1
...
I can get the max value with:
df[["val_1", "val_2"]].max(axis=1)
And the column name from which the fetch the ids with:
index_map = {0:"id_1", 1:"id_2"}
df.apply(lambda x: [x.val_1, x.val_2].index(max([x.val_1, x.val_2])), axis=1).map(index_map)
# 0 id_2
# 1 id_1
# 2 id_1
But then my ideas run out.
Since you have two columns, a simple approach would be to use where
:
df['max_id'] = df['id_1'].where(df['val_1'].gt(df['val_2']), df['id_2'])
If you want to generalize to an arbitrary number of id/values pairs, you could combine idxmax
and indexing lookup:
idx, cols = pd.factorize(df.filter(like='val_').idxmax(axis=1)
.str.replace('val_', 'id_'))
df['max_id'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
Or with wide_to_long
:
tmp = pd.wide_to_long(df.reset_index(), stubnames=['id', 'val'],
i='index', j='col', sep='_')
df['max_id'] = (tmp.loc[tmp.groupby(level='index')['val'].idxmax(), 'id']
.droplevel('col')
)
Or, if the columns are already sorted in the correct order (1,2,3... ; 1,2,3...), using numpy:
ids = df.filter(like='id_').to_numpy()
vals = df.filter(like='val_').to_numpy()
df['max_id'] = ids[np.arange(ids.shape[0]), np.argmax(vals, axis=1)]
Output:
id_1 id_2 val_1 val_2 max_id
0 a q 1 2 q
1 b w 2 0 b
2 c e 3 0 c
Intermediates:
# idxmax + factorize approach:
id_1 id_2 val_1 val_2 idxmax str.replace
0 a q 1 2 val_2 id_2
1 b w 2 0 val_1 id_1
2 c e 3 0 val_1 id_1
# wide_to_long output:
id val
index col
0 1 a 1
1 1 b 2 # max from index = 1
2 1 c 3 # max from index = 2
0 2 q 2 # max from index = 0
1 2 w 0
2 2 e 0