pythonpandas

Find the index of the current df value in another series and add to a column


I have a dataframe and a series, as follows:

import pandas as pd
from itertools import permutations

df = pd.DataFrame({'a': [['a', 'b', 'c'], ['a', 'c', 'b'], ['c', 'a', 'b']]})

prob = list(permutations(['a', 'b', 'c']))
prob = [list(ele) for ele in prob]
ps = pd.Series(prob)

>>> df
           a
0  [a, b, c]
1  [a, c, b]
2  [c, a, b]
>>> ps
0    [a, b, c]
1    [a, c, b]
2    [b, a, c]
3    [b, c, a]
4    [c, a, b]
5    [c, b, a]
dtype: object

My question is how to add a column 'idx' in df, which contains the index of the value in column 'a' in series 'ps'? The desire result is:

a     idx
[a,b,c] 0
[a,c,b] 1
[c,a,b] 4

The chatgpt gave me a answer, but it works very very slowly when my real data is huge.

df['idx'] = df['a'].apply(lambda x: ps[ps.apply(lambda y: y == x)].index[0])

Is there a more efficient way?


Solution

  • Use DataFrame.merge with DataFrame constructor:

    #if possible duplicates in ps remove them
    ps = ps.drop_duplicates()
    
    df = df.merge(pd.DataFrame({'idx': ps.index, 'a':ps.values}), on='a')
    print (df)
               a  idx
    0  [a, b, c]    0
    1  [a, c, b]    1
    2  [c, a, b]    4
    

    Solution for oldier pandas versions - converting lists to tuples before merge:

    df1 = ps.apply(tuple).reset_index().drop_duplicates(0)
    print (df1)
       index          0
    0      0  (a, b, c)
    1      1  (a, c, b)
    2      2  (b, a, c)
    3      3  (b, c, a)
    4      4  (c, a, b)
    5      5  (c, b, a)
    
    df = (df.merge(df1, left_on=df['a'].apply(tuple),right_on=df1[0])
           .drop(['key_0',0], axis=1))
    print (df)
               a  index
    0  [a, b, c]      0
    1  [a, c, b]      1
    2  [c, a, b]      4