I currently have a dataframe like this:
I want to slice the dataframe so that it keeps only the rows whose itemset contains exactly two items.
For example, I want only the rows with (whole milk, soda) or (soda, curd) ...
I tried iterating through the dataframe, but that does not seem to be an appropriate way to handle it.
two_itemsets = []
for i, j in zip(sorted_itemsets["support"], sorted_itemsets["itemsets"]):
    row = []  # renamed from `list` to avoid shadowing the built-in
    if len(j) == 2:
        row.append(i)
        row.append(j)
        two_itemsets.append(row)
top_itemsets = two_itemsets[:20]
top_df = pd.DataFrame(top_itemsets)
top_df.columns = ['support', 'itemsets']
top_df
rules_ap = mlx.frequent_patterns.association_rules(top_df, metric="confidence", min_threshold=0.5)
"frozenset({'whole milk'})You are likely getting this error because the DataFrame is missing antecedent and/or consequent information. You can try using the `support_only=True` option"
Also, passing this dataframe to association_rules does not work correctly. Is there anything I am missing when creating the dataframe?
I tried support_only=True, but it prints nothing.
With str.len and boolean indexing:
out = df.loc[df["itemsets"].str.len() == 2]  # .reset_index(drop=True)
Output:
print(out)
    support                   itemsets
5  0.010066     (sausage, frankfurter)
7  0.010066         (curd, rolls/buns)
8  0.010066  (napkins, tropical fruit)
9  0.010066  (hard cheese, whole milk)
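As for the association_rules error: confidence is defined as support(A ∪ C) / support(A), so computing it for a two-item set needs the support of the single-item antecedent as well. A frame that holds only two-item sets no longer has those values, which is probably why the message names frozenset({'whole milk'}). The sketch below is a simplified illustration of that lookup in plain Python, not mlxtend's actual code; the `support` dict, the `confidence` helper and all numbers are made up for the example.

```python
# Hypothetical support table containing ONLY 2-itemsets,
# similar to top_df in the question (values are made up).
support = {
    frozenset({"whole milk", "soda"}): 0.0116,
    frozenset({"soda", "curd"}): 0.0101,
}


def confidence(itemset, antecedent, support):
    """confidence(A -> C) = support(A | C) / support(A)."""
    # Raises KeyError when the antecedent's support is not in the table.
    return support[itemset] / support[antecedent]


itemset = frozenset({"whole milk", "soda"})
antecedent = frozenset({"whole milk"})

try:
    confidence(itemset, antecedent, support)
    missing = None
except KeyError as err:
    # The 1-itemset whose support was filtered out.
    missing = err.args[0]

# Once the 1-itemset support is present again, the lookup succeeds.
support[antecedent] = 0.1579
conf = confidence(itemset, antecedent, support)
```

If this is indeed the cause, one option is to pass the full frequent-itemsets frame (including the single-item rows) to association_rules and filter the resulting rules afterwards, instead of filtering the itemsets first. Note also that, per the mlxtend documentation, support_only=True only computes the rule support and fills the other metric columns with NaN, so confidence values are not available in that mode.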