
Filter pandas DataFrame by multiple thresholds defined in a dictionary


I want to filter a DataFrame against multiple thresholds, chosen by each ID's prefix (the part before the first '-').

Ideally I'd configure these thresholds with a dictionary, e.g.:

minimum_thresholds = {
    'alpha': 3,
    'beta' : 5,
    'gamma': 7,
    'default': 4
}

For example:

import pandas as pd

data = {
    'id': [
        'alpha-164232e7-75c9-4e2e-9bb2-b6ba2449beba', 'alpha-205acbf0-64ba-40ad-a026-cc1c6fc06a6f',
        'beta-76ece555-e336-42d8-9f8d-ee92dd90ef19', 'beta-6c91c1cc-1025-4714-a2b2-c30b2717e3c4',
        'gamma-f650fd43-03d3-440c-8e14-da18cdeb78d4', 'gamma-a8cb84b5-e94c-46f7-b2c5-135b59dcd1e3',
        'pi-8189aff9-ea1c-4e22-bcf4-584821c9dfd6'
    ],
    'freq': [4, 2, 1, 4, 7, 9, 8]
}
df = pd.DataFrame(data)
print(df)

                                           id  freq
0  alpha-164232e7-75c9-4e2e-9bb2-b6ba2449beba     4
1  alpha-205acbf0-64ba-40ad-a026-cc1c6fc06a6f     2
2   beta-76ece555-e336-42d8-9f8d-ee92dd90ef19     1
3   beta-6c91c1cc-1025-4714-a2b2-c30b2717e3c4     4
4  gamma-f650fd43-03d3-440c-8e14-da18cdeb78d4     7
5  gamma-a8cb84b5-e94c-46f7-b2c5-135b59dcd1e3     9
6     pi-8189aff9-ea1c-4e22-bcf4-584821c9dfd6     8

I would then like to get an output like:

                                           id  freq
0  alpha-164232e7-75c9-4e2e-9bb2-b6ba2449beba     4
1  gamma-f650fd43-03d3-440c-8e14-da18cdeb78d4     7
2  gamma-a8cb84b5-e94c-46f7-b2c5-135b59dcd1e3     9
3     pi-8189aff9-ea1c-4e22-bcf4-584821c9dfd6     8

I could do this bluntly by looping through each threshold, but is there a more Pythonic way?
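For comparison, the "blunt" loop might look something like the sketch below. This is an assumed reading of "looping through each threshold", not code from the question, and the UUID part of each ID is shortened for brevity. Each pass builds a mask for one prefix group and ORs in the rows that meet that group's minimum.

```python
import pandas as pd

minimum_thresholds = {'alpha': 3, 'beta': 5, 'gamma': 7, 'default': 4}

# Same shape as the question's data, with the UUID part of each ID shortened
df = pd.DataFrame({
    'id': ['alpha-1', 'alpha-2', 'beta-1', 'beta-2', 'gamma-1', 'gamma-2', 'pi-1'],
    'freq': [4, 2, 1, 4, 7, 9, 8],
})

prefix = df['id'].str.split('-').str[0]
named = [name for name in minimum_thresholds if name != 'default']

# Start with nothing selected, then OR in the rows that pass each group's threshold
keep = pd.Series(False, index=df.index)
for name, limit in minimum_thresholds.items():
    if name == 'default':
        group = ~prefix.isin(named)  # prefixes with no entry of their own, e.g. 'pi'
    else:
        group = prefix == name
    keep = keep | (group & (df['freq'] >= limit))

result = df[keep]
print(result)
```

It works, but it scans the whole frame once per dictionary entry and needs special handling for 'default', which is what a single lookup-and-compare avoids.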


Solution

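  • Another possible solution: take each ID's prefix (the text before the first '-'), look up its threshold in the dictionary, falling back to the 'default' entry for prefixes that have none (such as 'pi'), and keep the rows whose 'freq' is greater than or equal to that threshold: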

    # Per-row threshold from the ID prefix; unknown prefixes use 'default'
    df[df['freq']
       .ge(df['id'].str.split('-').str[0]
           .map(lambda x: minimum_thresholds.get(x, minimum_thresholds['default'])))]
    

    Output:

                                               id  freq
    0  alpha-164232e7-75c9-4e2e-9bb2-b6ba2449beba     4
    4  gamma-f650fd43-03d3-440c-8e14-da18cdeb78d4     7
    5  gamma-a8cb84b5-e94c-46f7-b2c5-135b59dcd1e3     9
    6     pi-8189aff9-ea1c-4e22-bcf4-584821c9dfd6     8
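A variant of the same idea that avoids the per-row lambda, sketched under the assumption that the same minimum_thresholds dictionary is used (IDs shortened for brevity): Series.map with a dict returns NaN for prefixes that have no entry, so fillna can supply the default. The trailing reset_index(drop=True) renumbers the rows 0..n-1 as in the question's desired output; leave it off to keep the original row labels.

```python
import pandas as pd

minimum_thresholds = {'alpha': 3, 'beta': 5, 'gamma': 7, 'default': 4}

df = pd.DataFrame({
    'id': ['alpha-1', 'alpha-2', 'beta-1', 'beta-2', 'gamma-1', 'gamma-2', 'pi-1'],
    'freq': [4, 2, 1, 4, 7, 9, 8],
})

# Prefix = text before the first '-'; n=1 stops after the first split
prefix = df['id'].str.split('-', n=1).str[0]

# Dict lookup per row; prefixes missing from the dict become NaN, then get the default
limits = prefix.map(minimum_thresholds).fillna(minimum_thresholds['default'])

# Keep rows meeting their minimum and renumber them 0..n-1
result = df[df['freq'] >= limits].reset_index(drop=True)
print(result)
```

As with the lambda version, an ID whose prefix is literally 'default' would pick up the default entry, so that key should not double as a real prefix.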