I have a list column in a pandas dataframe that needs to be exploded and filtered based on another list.
df_have = pd.DataFrame({'user': ['emp_1', 'emp_2', 'emp_3', 'admin'],
'group_ids': [[5, 3], [4, 2, 3], [1, 4], [1, 2, 3, 4, 5]]})
> df_have
user group_ids
0 emp_1 [5, 3]
1 emp_2 [4, 2, 3]
2 emp_3 [1, 4]
3 admin [1, 2, 3, 4, 5]
I have tried the below:
df_have = df_have.explode('group_ids')
selected_ids = [2, 3]
df_want = df_have[df_have['group_ids'].isin(selected_ids)]
> df_want
user group_ids
0 emp_1 3
1 emp_2 2
1 emp_2 3
3 admin 2
3 admin 3
This gives the correct output, but is very slow when group_ids
column is more than 100, and the selected_ids
are usually less than 5 groups at a time.
My question is: is there a way to pre-filter and explode only the values within the groups that have the selected_ids
?
You can use a list comprehension to pre-filter, with a set
of the values as reference:
selected_ids = [2, 3]
S = set(selected_ids)
out = (df.assign(group_ids=[[x for x in l if x in S] for l in df['group_ids']])
.explode('group_ids').dropna(subset=['group_ids'])
)
Output:
user group_ids
0 emp_1 3
1 emp_2 2
1 emp_2 3
3 admin 2
3 admin 3