Imagine following dataframe is given.
import pandas as pd
products = ['Apple', 'Apple', 'Carrot', 'Eggplant', 'Eggplant']
customer_demand_date = ['2023-01-01', '2023-01-07', '2023-01-01', '2023-01-01', '2023-01-07', '2023-01-14']
col_02_2023 = [0, 20, 0, 0, 0, 10]
col_03_2023 = [20, 30, 10, 0, 10, 0]
col_04_2023 = [10, 40, 50, 30, 40, 10]
col_05_2023 = [40, 40, 60, 50, 60, 20]
data = {'Products': products,
'customer_demand_date': customer_demand_date,
'02_2023': col_02_2023,
'03_2023': col_03_2023,
'04_2023': col_04_2023,
'05_2023': col_05_2023}
df = pd.DataFrame(data)
print(df)
Products customer_demand_date 02_2023 03_2023 04_2023 05_2023
0 Apple 2023-01-01 0 20 10 40
1 Apple 2023-01-07 20 30 40 40
2 Carrot 2023-01-01 0 10 50 60
3 Egg 2023-01-01 0 0 30 50
4 Egg 2023-01-07 0 10 40 60
5 Egg 2023-01-14 0 0 10 20
I have columns products, custome_demand_date (every week there is new customer demand for products per upcoming months) and months with quantity demand. How can I determine which product has experienced the most frequent changes in customer demand over the months, and sort the products in descending order of frequency of change? I have tried to group by product, accumulate the demand quantity but none of them can analyze the data both horizontally (per customer demand date) and vertically (per months). Desired output:
Sorted products Ranking(or %, or count of changes)
Egg 1 (or 70% or 13)
Apple 2 (or 52% or 8)
Carrot 3 (22% or 3)
Either ranking or % of change frequency or count of changes.
I'd really appreciate if you have any clever approach to solve this problem? Thanks
One way is to define a function that counts horizontal and vertical changes which you can apply to each group individually.
import pandas as pd
from io import StringIO
def change_freq(x, months):
# count horizontal changes
chngs_horizontal = x[months].diff(axis=1).fillna(0).astype(bool).sum().sum()
# count vertical changes
chngs_vertical = x[months].diff(axis=0).fillna(0).astype(bool).sum().sum()
return chngs_horizontal+chngs_vertical
# sample data
data = StringIO("""
Products,customer_demand_date,02_2023,03_2023,04_2023,05_2023
Apple,2023-01-01,0,20,10,40
Apple,2023-01-07,20,30,40,40
Carrot,2023-01-01,0,10,50,60
Egg,2023-01-01,0,0,30,50
Egg,2023-01-07,0,10,40,60
Egg,2023-01-14,0,0,10,20
""")
df = pd.read_csv(data, sep=",")
# count horizontal and vertical changes by product
result = df.groupby('Products').apply(change_freq, ['02_2023','03_2023','04_2023','05_2023'])
result = result.sort_values(ascending=False).to_frame('count_changes')
result['rank'] = result['count_changes'].rank(ascending=False)
This returns
count_changes rank
Products
Egg 13 1.0
Apple 8 2.0
Carrot 3 3.0