I have a dataframe and a series, as follows:
import pandas as pd
from itertools import permutations
df = pd.DataFrame({'a': [['a', 'b', 'c'], ['a', 'c', 'b'], ['c', 'a', 'b']]})
prob = list(permutations(['a', 'b', 'c']))
prob = [list(ele) for ele in prob]
ps = pd.Series(prob)
>>> df
a
0 [a, b, c]
1 [a, c, b]
2 [c, a, b]
>>> ps
0 [a, b, c]
1 [a, c, b]
2 [b, a, c]
3 [b, c, a]
4 [c, a, b]
5 [c, b, a]
dtype: object
My question is how to add a column 'idx' in df, which contains the index of the value in column 'a' in series 'ps'? The desire result is:
a idx
[a,b,c] 0
[a,c,b] 1
[c,a,b] 4
The chatgpt gave me a answer, but it works very very slowly when my real data is huge.
df['idx'] = df['a'].apply(lambda x: ps[ps.apply(lambda y: y == x)].index[0])
Is there a more efficient way?
Use DataFrame.merge
with DataFrame
constructor:
#if possible duplicates in ps remove them
ps = ps.drop_duplicates()
df = df.merge(pd.DataFrame({'idx': ps.index, 'a':ps.values}), on='a')
print (df)
a idx
0 [a, b, c] 0
1 [a, c, b] 1
2 [c, a, b] 4
Solution for oldier pandas versions - converting lists to tuples before merge
:
df1 = ps.apply(tuple).reset_index().drop_duplicates(0)
print (df1)
index 0
0 0 (a, b, c)
1 1 (a, c, b)
2 2 (b, a, c)
3 3 (b, c, a)
4 4 (c, a, b)
5 5 (c, b, a)
df = (df.merge(df1, left_on=df['a'].apply(tuple),right_on=df1[0])
.drop(['key_0',0], axis=1))
print (df)
a index
0 [a, b, c] 0
1 [a, c, b] 1
2 [c, a, b] 4