pandaslistpandas-explode

pandas pre-filter an exploded list


I have a list column in a pandas dataframe that needs to be exploded and filtered based on another list.

df_have = pd.DataFrame({'user': ['emp_1', 'emp_2', 'emp_3', 'admin'],
                        'group_ids': [[5, 3], [4, 2, 3], [1, 4], [1, 2, 3, 4, 5]]})

> df_have
    user        group_ids
0  emp_1           [5, 3]
1  emp_2        [4, 2, 3]
2  emp_3           [1, 4]
3  admin  [1, 2, 3, 4, 5]

I have tried the below:

df_have = df_have.explode('group_ids')
selected_ids = [2, 3]
df_want = df_have[df_have['group_ids'].isin(selected_ids)]

> df_want
    user group_ids
0  emp_1         3
1  emp_2         2
1  emp_2         3
3  admin         2
3  admin         3

This gives the correct output, but is very slow when group_ids column is more than 100, and the selected_ids are usually less than 5 groups at a time.

My question is: is there a way to pre-filter and explode only the values within the groups that have the selected_ids?


Solution

  • You can use a list comprehension to pre-filter, with a set of the values as reference:

    selected_ids = [2, 3]
    S = set(selected_ids)
    
    out = (df.assign(group_ids=[[x for x in l if x in S] for l in df['group_ids']])
              .explode('group_ids').dropna(subset=['group_ids'])
          )
    

    Output:

        user group_ids
    0  emp_1         3
    1  emp_2         2
    1  emp_2         3
    3  admin         2
    3  admin         3