I want to filter a DataFrame against multiple thresholds, based on the ID's prefix.
Ideally I'd configure these thresholds with a dictionary, e.g.:
minimum_thresholds = {
    'alpha': 3,
    'beta': 5,
    'gamma': 7,
    'default': 4
}
For example:
import pandas as pd

data = {
    'id': [
        'alpha-164232e7-75c9-4e2e-9bb2-b6ba2449beba', 'alpha-205acbf0-64ba-40ad-a026-cc1c6fc06a6f',
        'beta-76ece555-e336-42d8-9f8d-ee92dd90ef19', 'beta-6c91c1cc-1025-4714-a2b2-c30b2717e3c4',
        'gamma-f650fd43-03d3-440c-8e14-da18cdeb78d4', 'gamma-a8cb84b5-e94c-46f7-b2c5-135b59dcd1e3',
        'pi-8189aff9-ea1c-4e22-bcf4-584821c9dfd6'
    ],
    'freq': [4, 2, 1, 4, 7, 9, 8]
}
df = pd.DataFrame(data)
id freq
0 alpha-164232e7-75c9-4e2e-9bb2-b6ba2449beba 4
1 alpha-205acbf0-64ba-40ad-a026-cc1c6fc06a6f 2
2 beta-76ece555-e336-42d8-9f8d-ee92dd90ef19 1
3 beta-6c91c1cc-1025-4714-a2b2-c30b2717e3c4 4
4 gamma-f650fd43-03d3-440c-8e14-da18cdeb78d4 7
5 gamma-a8cb84b5-e94c-46f7-b2c5-135b59dcd1e3 9
6 pi-8189aff9-ea1c-4e22-bcf4-584821c9dfd6 8
I would then get an output like:
id freq
0 alpha-164232e7-75c9-4e2e-9bb2-b6ba2449beba 4
1 gamma-f650fd43-03d3-440c-8e14-da18cdeb78d4 7
2 gamma-a8cb84b5-e94c-46f7-b2c5-135b59dcd1e3 9
3 pi-8189aff9-ea1c-4e22-bcf4-584821c9dfd6 8
I could do this bluntly by looping through each threshold, but it feels like there must be a more Pythonic way?
Another possible solution works in the following steps:
First, the id column is split at each hyphen using the str.split method, and the first part of each split is extracted with str[0].
Then, the resulting prefixes are mapped to their corresponding threshold values using the map function, which looks each one up in the thresholds dictionary. If a prefix is not found in the dictionary, the default threshold is used.
The freq column is then compared to these threshold values using the ge method, which checks whether freq is greater than or equal to the threshold.
Finally, the DataFrame is filtered to include only the rows where this condition is met.
# minimum_thresholds is the dictionary defined in the question
df[df['freq']
   .ge(df['id'].str.split('-').str[0]
       .map(lambda x: minimum_thresholds.get(x, minimum_thresholds['default'])))]
Output:
id freq
0 alpha-164232e7-75c9-4e2e-9bb2-b6ba2449beba 4
4 gamma-f650fd43-03d3-440c-8e14-da18cdeb78d4 7
5 gamma-a8cb84b5-e94c-46f7-b2c5-135b59dcd1e3 9
6 pi-8189aff9-ea1c-4e22-bcf4-584821c9dfd6 8
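For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same steps, using the data and the minimum_thresholds dictionary from the question. It is a variation rather than the exact one-liner above: it maps the prefixes through the dictionary directly and fills the unmatched ones with the default via fillna, instead of using a lambda. The names prefix, limits and result are just illustrative, and the final reset_index(drop=True) is an optional extra that renumbers the rows 0 to 3, as in the question's expected output.

```python
import pandas as pd

minimum_thresholds = {'alpha': 3, 'beta': 5, 'gamma': 7, 'default': 4}

df = pd.DataFrame({
    'id': [
        'alpha-164232e7-75c9-4e2e-9bb2-b6ba2449beba', 'alpha-205acbf0-64ba-40ad-a026-cc1c6fc06a6f',
        'beta-76ece555-e336-42d8-9f8d-ee92dd90ef19', 'beta-6c91c1cc-1025-4714-a2b2-c30b2717e3c4',
        'gamma-f650fd43-03d3-440c-8e14-da18cdeb78d4', 'gamma-a8cb84b5-e94c-46f7-b2c5-135b59dcd1e3',
        'pi-8189aff9-ea1c-4e22-bcf4-584821c9dfd6',
    ],
    'freq': [4, 2, 1, 4, 7, 9, 8],
})

# Step 1: take the part of each id before the first hyphen (illustrative name)
prefix = df['id'].str.split('-').str[0]

# Step 2: look up each prefix in the dictionary; prefixes that are not keys
# come back as NaN and are then replaced with the 'default' threshold
limits = prefix.map(minimum_thresholds).fillna(minimum_thresholds['default'])

# Steps 3 and 4: keep only rows where freq >= its threshold,
# then renumber the index so it runs 0..n-1 (optional)
result = df[df['freq'].ge(limits)].reset_index(drop=True)
print(result)
```

One design note: fillna turns the thresholds into floats, which does not affect the >= comparison here. If the integer dtype matters, the lambda with dict.get shown above avoids the intermediate NaN.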