I currently have a file where I create a hierarchy from the product and calculate the percentage split based on the previous level.
My code looks like this:
import pandas as pd

data = [['product1', 'product1a', 'product1aa', 10],
        ['product1', 'product1a', 'product1aa', 5],
        ['product1', 'product1a', 'product1aa', 15],
        ['product1', 'product1a', 'product1ab', 10],
        ['product1', 'product1a', 'product1ac', 20],
        ['product1', 'product1b', 'product1ba', 15],
        ['product1', 'product1b', 'product1bb', 15],
        ['product2', 'product2_a', 'product2_aa', 30]]
df = pd.DataFrame(data, columns = ["Product_level1", "Product_Level2", "Product_Level3", "Qty"])
prod_levels = ["Product_level1", "Product_Level2", "Product_Level3"]
df = df.groupby(prod_levels).sum("Qty")
df["Qty ratio"] = df["Qty"] / df["Qty"].sum(level=prod_levels[-2])
print(df)
This gives me this as a result:
                                              Qty  Qty ratio
Product_level1 Product_Level2 Product_Level3
product1       product1a      product1aa       30   0.500000
                              product1ab       10   0.166667
                              product1ac       20   0.333333
               product1b      product1ba       15   0.500000
                              product1bb       15   0.500000
product2       product2_a     product2_aa      30   1.000000
With my version of pandas (1.3.2), I get a FutureWarning saying that level is deprecated and that I should use groupby instead.
FutureWarning: Using the level keyword in DataFrame and Series aggregations is deprecated and will be removed in a future version. Use groupby instead. df.sum(level=1) should use df.groupby(level=1).sum()
Unfortunately, I can't figure out the correct syntax to get the same result with groupby, so that this keeps working with future versions of pandas. I've tried variations of the line below, but none worked.
df["Qty ratio"] = df.groupby(["Product_level1", "Product_Level2", "Product_Level3"]).sum("Qty") / df.groupby(level=prod_levels[-1]).sum("Qty")
Can anyone suggest how I could approach this?
Thank you
The level keyword on many functions was deprecated in 1.3. See "Deprecate: level parameter for aggregations in DataFrame and Series" (#39983).
The DataFrame and Series aggregations that accept a level argument, such as sum, are affected.
The level argument was always rewritten internally to be a groupby operation. For this reason, it was deprecated to increase clarity and reduce redundancy in the library.
The general pattern: whatever level arguments were passed to the aggregation should be moved to groupby instead.
Sample Data:
import pandas as pd
df = pd.DataFrame(
{'A': [1, 1, 2, 2],
'B': [1, 2, 1, 2],
'C': [5, 6, 7, 8]}
).set_index(['A', 'B'])
     C
A B
1 1  5
  2  6
2 1  7
  2  8
With an aggregation over level:
df['C'].sum(level='B')
B
1 12
2 14
Name: C, dtype: int64
FutureWarning: Using the level keyword in DataFrame and Series aggregations is deprecated and will be removed in a future version. Use groupby instead.
This now becomes a groupby over level:
df['C'].groupby(level='B').sum()
B
1 12
2 14
Name: C, dtype: int64
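The same move should carry over to other aggregations and to whole DataFrames. A minimal sketch on the sample data above; the variable names mean_by_b and max_by_a are only for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 1, 2, 2], "B": [1, 2, 1, 2], "C": [5, 6, 7, 8]}
).set_index(["A", "B"])

# Deprecated form: df["C"].mean(level="B")
mean_by_b = df["C"].groupby(level="B").mean()

# Deprecated form: df.max(level="A")
max_by_a = df.groupby(level="A").max()

print(mean_by_b)
print(max_by_a)
```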
In this specific example:
df["Qty ratio"] = df["Qty"] / df["Qty"].sum(level=prod_levels[-2])
Becomes
df["Qty ratio"] = df["Qty"] / df["Qty"].groupby(level=prod_levels[-2]).sum()
(Just move the level argument to groupby.)
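For completeness, here is a runnable sketch of the question's example with the replacement applied. The second ratio is a variant that is not in the question: it groups by all parent levels and uses transform, which may be safer if the same Product_Level2 name can ever appear under different Product_level1 values. The column name "Qty ratio (transform)" is made up for the comparison, and the first groupby selects the Qty column explicitly rather than passing "Qty" to sum.

```python
import pandas as pd

data = [
    ["product1", "product1a", "product1aa", 10],
    ["product1", "product1a", "product1aa", 5],
    ["product1", "product1a", "product1aa", 15],
    ["product1", "product1a", "product1ab", 10],
    ["product1", "product1a", "product1ac", 20],
    ["product1", "product1b", "product1ba", 15],
    ["product1", "product1b", "product1bb", 15],
    ["product2", "product2_a", "product2_aa", 30],
]
prod_levels = ["Product_level1", "Product_Level2", "Product_Level3"]
df = pd.DataFrame(data, columns=prod_levels + ["Qty"])

# Total Qty per full hierarchy path; the three levels become a MultiIndex.
df = df.groupby(prod_levels)[["Qty"]].sum()

# Replacement from the answer: sum over the parent level with groupby,
# then divide; pandas aligns the two Series on the shared level name.
df["Qty ratio"] = df["Qty"] / df["Qty"].groupby(level=prod_levels[-2]).sum()

# Variant (an assumption, not in the question): group by every parent level
# and use transform so the totals already share df's index.
parent_total = df.groupby(level=prod_levels[:-1])["Qty"].transform("sum")
df["Qty ratio (transform)"] = df["Qty"] / parent_total

print(df)
```

On this data both columns should hold the same ratios as the original output, without relying on the deprecated level keyword.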