pythonmarket-basket-analysis

Market basket analysis in python


I am doing market basket analysis on a xlsx dataset of a shoe store My Data set is like

ReceiptID | ItemName | Category | NetQty

123 | Black Shoe | Shoes | 5

123 | Brown Shoe | Shoes | 4

125 | Sandal | Sandal | 2

125 | Peshawari | Shoes | 1

I want my data to be like

ReceiptID | ItemName | Category | NetQty

123 | Black Shoe, Brown Shoe | Shoes | 9

125 | Sandal, Peshawari | Sandal, Shoes| 3

def new_list(items): 
   Row_list=[ ] 
   for rows in items.ReceiptID: 
     if items.ReceiptD==ReceiptID-1: 
       my_list=[items.ItemName] 
       Row_list.append(my_list) 
     return(Row_list) 

I want a function in python that does this if i load the dataframe in it. Thanks :)


Solution

  • IIUC you need just groupby + agg as followng

    df = pd.DataFrame({"ReceiptID": [123,123,125,125],
                       "ItemName": ["Black Shoe", "Broen Shoe", "Sandal", "Peshawari"],
                       "Category": ["Shoes", "Shoes", "Sandal" ,"Shoes"],
                       "NetQty": [5,4,2,1]})
    
    
    out = df.groupby("ReceiptID")\
            .agg({"ItemName":"unique",
                  "Category":"unique",
                  "NetQty":"sum"})\
            .reset_index()
    
       ReceiptID                  ItemName         Category  NetQty
    0        123  [Black Shoe, Broen Shoe]          [Shoes]       9
    1        125       [Sandal, Peshawari]  [Sandal, Shoes]       3
    

    If you want just strings instead of lists you can add these two lines

    for col in ["ItemName", "Category"]:
        out[col] = out[col].str.join(", ")
    
       ReceiptID                ItemName       Category  NetQty
    0        123  Black Shoe, Broen Shoe          Shoes       9
    1        125       Sandal, Peshawari  Sandal, Shoes       3