python pandas dataframe mlxtend

Slice the dataframe with a certain condition


Currently I have a dataframe of frequent itemsets with a support column and an itemsets column.

I want to slice the dataframe so that it keeps only the rows whose itemset contains exactly two items. For example, I want only rows such as (whole milk, soda) or (soda, curd) ...

I tried iterating through the dataframe, but that does not seem to be an appropriate way to handle a dataframe.

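import pandas as pd

two_itemsets = []

# Collect [support, itemset] rows for itemsets that contain exactly two items
for support, itemset in zip(sorted_itemsets["support"], sorted_itemsets["itemsets"]):
    if len(itemset) == 2:
        two_itemsets.append([support, itemset])

# Keep the first 20 rows and rebuild a dataframe from them
top_itemsets = two_itemsets[:20]
top_df = pd.DataFrame(top_itemsets, columns=['support', 'itemsets'])
top_df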


rules_ap = mlx.frequent_patterns.association_rules(top_df, metric="confidence", min_threshold=0.5)
"frozenset({'whole milk'})You are likely getting this error because the DataFrame is missing  antecedent and/or consequent  information. You can try using the  `support_only=True` option"

Also, generating association rules from this dataframe does not work, as the error above shows. Is there something I am missing when I create the dataframe?

I also tried support_only=True, but it prints nothing.


Solution

  • With len and boolean indexing:

    out = df.loc[df["itemsets"].str.len() == 2]#.reset_index(drop=True)
    

    Output:

    print(out)
    
        support                   itemsets
    5  0.010066     (sausage, frankfurter)
    7  0.010066         (curd, rolls/buns)
    8  0.010066  (napkins, tropical fruit)
    9  0.010066  (hard cheese, whole milk)
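
As a minimal sketch of the same idea on made-up data (the `frequent` frame, its items, and its support values are hypothetical), the filter can be combined with a sort and `head(20)` to get something like the top-20 list from the question. `.map(len)` is used here as an alternative to `.str.len()`, and the explicit sort assumes the top 20 should be ranked by support. The last lines suggest why `association_rules` may complain about a frame that holds only pairs: the confidence of a rule such as whole milk -> soda divides by the support of the single item `whole milk`, and that row is no longer present after filtering.

```python
import pandas as pd

# Hypothetical frequent-itemset table with a "support" column and an
# "itemsets" column of frozensets. The support values are made up.
frequent = pd.DataFrame(
    {
        "support": [0.25, 0.17, 0.10, 0.05, 0.04, 0.01],
        "itemsets": [
            frozenset({"whole milk"}),
            frozenset({"soda"}),
            frozenset({"curd"}),
            frozenset({"whole milk", "soda"}),
            frozenset({"soda", "curd"}),
            frozenset({"whole milk", "soda", "curd"}),
        ],
    }
)

# Boolean mask: True where the itemset holds exactly two items.
is_pair = frequent["itemsets"].map(len) == 2

# Keep the pairs, order them by support, and take at most the top 20.
top_pairs = (
    frequent.loc[is_pair]
    .sort_values("support", ascending=False)
    .head(20)
    .reset_index(drop=True)
)

# Confidence of a rule A -> B is support(A and B) / support(A), so the
# support of the single-item antecedent is looked up in the full table.
support_of = dict(zip(frequent["itemsets"], frequent["support"]))
pair = frozenset({"whole milk", "soda"})
antecedent = frozenset({"whole milk"})
confidence = support_of[pair] / support_of[antecedent]

# The filtered frame no longer contains that single-item row.
antecedent_kept = antecedent in set(top_pairs["itemsets"])
```

Under that reading, one option is to pass the full frequent-itemset frame to `association_rules` and filter the resulting rules afterwards, rather than filtering the itemsets first.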