[SOLVED] Market basket analysis in python

Market basket analysis in python

I am doing market basket analysis on a xlsx dataset of a shoe store My Data set is like

ReceiptID | ItemName | Category | NetQty

123 | Black Shoe | Shoes | 5

123 | Brown Shoe | Shoes | 4

125 | Sandal | Sandal | 2

125 | Peshawari | Shoes | 1

I want my data to be like

ReceiptID | ItemName | Category | NetQty

123 | Black Shoe, Brown Shoe | Shoes | 9

125 | Sandal, Peshawari | Sandal, Shoes| 3

def new_list(items): 
   Row_list=[ ] 
   for rows in items.ReceiptID: 
     if items.ReceiptD==ReceiptID-1: 
       my_list=[items.ItemName] 
       Row_list.append(my_list) 
     return(Row_list)

I want a function in python that does this if i load the dataframe in it. Thanks :)

Solution

IIUC you need just groupby + agg as followng

df = pd.DataFrame({"ReceiptID": [123,123,125,125],
                   "ItemName": ["Black Shoe", "Broen Shoe", "Sandal", "Peshawari"],
                   "Category": ["Shoes", "Shoes", "Sandal" ,"Shoes"],
                   "NetQty": [5,4,2,1]})


out = df.groupby("ReceiptID")\
        .agg({"ItemName":"unique",
              "Category":"unique",
              "NetQty":"sum"})\
        .reset_index()

   ReceiptID                  ItemName         Category  NetQty
0        123  [Black Shoe, Broen Shoe]          [Shoes]       9
1        125       [Sandal, Peshawari]  [Sandal, Shoes]       3

If you want just strings instead of lists you can add these two lines

for col in ["ItemName", "Category"]:
    out[col] = out[col].str.join(", ")

   ReceiptID                ItemName       Category  NetQty
0        123  Black Shoe, Broen Shoe          Shoes       9
1        125       Sandal, Peshawari  Sandal, Shoes       3