I have been trying to implement the Apriori algorithm in Python. There are several examples online; they all use similar methods and mostly the same example dataset. Reference link: https://www.kaggle.com/code/rockystats/apriori-algorithm-or-market-basket-analysis/notebook (starting from cell [26])
I have a different dataset with the same structure as the example datasets online, but I keep getting this warning:
"DeprecationWarning: DataFrames with non-bool types result in worse computational performance and their support might be discontinued in the future. Please use a DataFrame with bool type"
Here is my code:
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules
df1 = pd.read_csv(r'C:\Users\USER\dataset', sep=';')
df=df1.fillna(0)
basket = pd.pivot_table(data=df, index='cust_id', columns='Product', values='quantity', aggfunc='count',fill_value=0.0)
def convert_into_binary(x):
    if x > 0:
        return 1
    else:
        return 0
basket_sets = basket.applymap(convert_into_binary)
frequent_itemsets = apriori(basket_sets, min_support=0.07, use_colnames=True)
print(frequent_itemsets)
# association rule
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
print(rules)
In addition, in the last step of my code, I get an empty dataframe; I can see the column headings of the dataset but the output is empty.
Empty DataFrame
Columns: [antecedents, consequents, antecedent support, consequent support, support, confidence, lift, leverage, conviction]
Index: []
I am not sure whether the empty result is related to this warning. I am new to Python and would really appreciate help with this issue.
I ran into the same issue even after converting my dataframe fields to 0 and 1.
The fix was to make sure apriori receives a dataframe of boolean dtype, so in your case you should run this:
frequent_itemsets = apriori(basket_sets.astype('bool'), min_support=0.07, use_colnames=True)
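If you prefer, the 0/1 step can be skipped altogether by comparing the pivot table against 0, which gives a boolean dataframe directly. Here is a minimal sketch of that idea; the toy rows are made up, and only the column names are taken from your code:

```python
import pandas as pd

# Made-up purchase records, using the column names from the question.
df = pd.DataFrame({
    "cust_id":  [1, 1, 2, 2, 3],
    "Product":  ["bread", "milk", "bread", "eggs", "milk"],
    "quantity": [2, 1, 1, 6, 3],
})

# Same pivot as in the question: one row per customer, one column per product.
basket = pd.pivot_table(data=df, index="cust_id", columns="Product",
                        values="quantity", aggfunc="count", fill_value=0)

# Comparing against 0 yields True/False cells, so no 0/1 helper is needed.
basket_sets = basket > 0

# Every column should now report the bool dtype.
print(basket_sets.dtypes)
```

The resulting basket_sets should be usable as the first argument to apriori in place of basket_sets.astype('bool').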
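Regarding "In addition, in the last step of my code, I get an empty dataframe; I can see the column headings of the dataset but the output is empty.":
Try using a smaller min_support. association_rules can only build rules from frequent itemsets that contain at least two items, so if only single products (or nothing at all) reach the 0.07 threshold, the rules dataframe comes back empty.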
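To see why the threshold matters, here is a rough pure-Python sketch (not mlxtend's implementation; it brute-forces all combinations instead of doing Apriori pruning). The transactions and the helper names frequent_itemsets and count_pairs are made up for illustration:

```python
from itertools import combinations

# Made-up transactions: one set of products per customer.
transactions = [
    {"bread", "milk"},
    {"bread", "eggs"},
    {"milk"},
    {"bread", "milk", "eggs"},
    {"eggs"},
]

def frequent_itemsets(transactions, min_support, max_len=2):
    """Return {itemset: support} for itemsets of up to max_len items."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            # Support = share of transactions containing every item in combo.
            support = sum(set(combo) <= t for t in transactions) / n
            if support >= min_support:
                result[frozenset(combo)] = support
    return result

def count_pairs(itemsets):
    """Count itemsets with at least two items (the only ones rules can come from)."""
    return sum(1 for s in itemsets if len(s) >= 2)

high = frequent_itemsets(transactions, min_support=0.5)
low = frequent_itemsets(transactions, min_support=0.3)

print(count_pairs(high), count_pairs(low))
```

With the higher threshold only single products survive, so there is nothing to build a rule from; lowering it lets some pairs through. Printing frequent_itemsets in your own code should show whether you are in the same situation.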