I am doing market basket analysis on a xlsx dataset of a shoe store My Data set is like
ReceiptID | ItemName | Category | NetQty
123 | Black Shoe | Shoes | 5
123 | Brown Shoe | Shoes | 4
125 | Sandal | Sandal | 2
125 | Peshawari | Shoes | 1
I want my data to be like
ReceiptID | ItemName | Category | NetQty
123 | Black Shoe, Brown Shoe | Shoes | 9
125 | Sandal, Peshawari | Sandal, Shoes| 3
def new_list(items):
Row_list=[ ]
for rows in items.ReceiptID:
if items.ReceiptD==ReceiptID-1:
my_list=[items.ItemName]
Row_list.append(my_list)
return(Row_list)
I want a function in python that does this if i load the dataframe in it. Thanks :)
IIUC you need just groupby
+ agg
as followng
df = pd.DataFrame({"ReceiptID": [123,123,125,125],
"ItemName": ["Black Shoe", "Broen Shoe", "Sandal", "Peshawari"],
"Category": ["Shoes", "Shoes", "Sandal" ,"Shoes"],
"NetQty": [5,4,2,1]})
out = df.groupby("ReceiptID")\
.agg({"ItemName":"unique",
"Category":"unique",
"NetQty":"sum"})\
.reset_index()
ReceiptID ItemName Category NetQty
0 123 [Black Shoe, Broen Shoe] [Shoes] 9
1 125 [Sandal, Peshawari] [Sandal, Shoes] 3
If you want just strings instead of lists you can add these two lines
for col in ["ItemName", "Category"]:
out[col] = out[col].str.join(", ")
ReceiptID ItemName Category NetQty
0 123 Black Shoe, Broen Shoe Shoes 9
1 125 Sandal, Peshawari Sandal, Shoes 3