I would like to filter a DataFrame that holds association rules results. I want the rows whose antecedents contain a given element, such as H or L in my case. The antecedents are of type frozenset. I tried the following to build Hrules, but it is not working.
Hrules=fdem_rules['H' in fdem_rules['antecedents']]
Hrules=fdem_rules[frozenset({'H'}) in fdem_rules['antecedents']]
Neither attempt worked.
In the following example, I need only rows 46 and 89 as they have H.
df = pd.DataFrame({'antecedents': [frozenset({'N', 'M', '60'}), frozenset({'H', 'AorE'}), frozenset({'0-35', 'H', 'AorE', '60'}), frozenset({'AorE', 'M', '60', '0'}), frozenset({'0-35', 'F'})]})
             antecedents
75            (N, M, 60)
46             (H, AorE)
89   (0-35, H, AorE, 60)
103     (AorE, M, 60, 0)
38             (0-35, F)
You can use apply with a set/frozenset method. Here, to check whether at least one of H or L is present, you can negate {'H', 'L'}.isdisjoint:
match = {'H', 'L'}
df['H or L'] = ~df['antecedents'].apply(match.isdisjoint)
A much faster variant of the above is to use a list comprehension:
match = {'H', 'L'}
df['H or L'] = [not match.isdisjoint(x) for x in df['antecedents']]
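As a self-contained check, the following sketch rebuilds the sample frame from the question and runs both variants side by side. The index labels are copied from the printed frame, and the names via_apply, via_listcomp and same are only illustrative.

```python
import pandas as pd

# Sample data from the question, indexed like the printed frame
df = pd.DataFrame(
    {
        "antecedents": [
            frozenset({"N", "M", "60"}),
            frozenset({"H", "AorE"}),
            frozenset({"0-35", "H", "AorE", "60"}),
            frozenset({"AorE", "M", "60", "0"}),
            frozenset({"0-35", "F"}),
        ]
    },
    index=[75, 46, 89, 103, 38],
)

match = {"H", "L"}

# Variant 1: apply + isdisjoint, negated with ~ (boolean Series)
via_apply = ~df["antecedents"].apply(match.isdisjoint)

# Variant 2: list comprehension (plain list of booleans)
via_listcomp = [not match.isdisjoint(x) for x in df["antecedents"]]

# Both variants flag the same rows, in the same order
same = via_apply.tolist() == via_listcomp
```

isdisjoint returns True when the two sets share no element, so negating it means "at least one of the wanted items is present".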
Another option is to explode the frozensets, use isin, and aggregate the result with groupby + any:
match = {'H', 'L'}
df['H or L'] = df['antecedents'].explode().isin(match).groupby(level=0).any()
output:
>>> df[['antecedents', 'H or L']]
             antecedents  H or L
75            (N, M, 60)   False
46             (H, AorE)    True
89   (0-35, H, AorE, 60)    True
103     (AorE, M, 60, 0)   False
38             (0-35, F)   False
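To make the explode variant easier to follow, here is a sketch that splits it into its intermediate steps on the same sample data. It assumes the index labels are unique, since groupby(level=0) would otherwise merge rows that share a label. The names exploded, hits and flag are only illustrative.

```python
import pandas as pd

# Sample data from the question, indexed like the printed frame
df = pd.DataFrame(
    {
        "antecedents": [
            frozenset({"N", "M", "60"}),
            frozenset({"H", "AorE"}),
            frozenset({"0-35", "H", "AorE", "60"}),
            frozenset({"AorE", "M", "60", "0"}),
            frozenset({"0-35", "F"}),
        ]
    },
    index=[75, 46, 89, 103, 38],
)

match = {"H", "L"}

# Step 1: one row per item; the original index label is repeated
exploded = df["antecedents"].explode()

# Step 2: element-wise test against the wanted items
hits = exploded.isin(match)

# Step 3: collapse back to one boolean per original row
flag = hits.groupby(level=0).any()

# Assignment aligns on the index, so the row order of df is preserved
df["H or L"] = flag
```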
To keep only the matching rows, use the boolean list as a mask:
match = {'H', 'L'}
idx = [not match.isdisjoint(x) for x in df['antecedents']]
df[idx]
output:
            antecedents  consequents  other_cols
46            (H, AorE)          (N)         ...
89  (0-35, H, AorE, 60)          (0)         ...
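As for why the attempts in the question fail: the in operator applied to a Series tests the index labels, not the values, so 'H' in fdem_rules['antecedents'] evaluates to a single boolean rather than a row-wise mask. The sketch below illustrates this on the sample data and shows an element-wise test for a single item. The names label_check, index_check and h_rows are only illustrative.

```python
import pandas as pd

# Sample data from the question, indexed like the printed frame
df = pd.DataFrame(
    {
        "antecedents": [
            frozenset({"N", "M", "60"}),
            frozenset({"H", "AorE"}),
            frozenset({"0-35", "H", "AorE", "60"}),
            frozenset({"AorE", "M", "60", "0"}),
            frozenset({"0-35", "F"}),
        ]
    },
    index=[75, 46, 89, 103, 38],
)

# `in` on a Series looks at the index labels, not at the values
label_check = "H" in df["antecedents"]  # "H" is not an index label
index_check = 46 in df["antecedents"]   # 46 is an index label

# Element-wise membership test for a single item
h_mask = [("H" in items) for items in df["antecedents"]]
h_rows = df[h_mask]
```

For a single item, 'H' in items is enough; the isdisjoint form is useful once several items should match.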