Tags: python, pandas, pandas-groupby

FutureWarning: Level keyword deprecated in 1.3, use groupby instead


I have a script that builds a product hierarchy and calculates each product's share of the quantity within its parent level.

My code looks like this:

    import pandas as pd

    data = [['product1', 'product1a', 'product1aa', 10],
            ['product1', 'product1a', 'product1aa', 5],
            ['product1', 'product1a', 'product1aa', 15],
            ['product1', 'product1a', 'product1ab', 10],
            ['product1', 'product1a', 'product1ac', 20],
            ['product1', 'product1b', 'product1ba', 15],
            ['product1', 'product1b', 'product1bb', 15],
            ['product2', 'product2_a', 'product2_aa', 30]]

    df = pd.DataFrame(data, columns=["Product_level1", "Product_Level2", "Product_Level3", "Qty"])

    prod_levels = ["Product_level1", "Product_Level2", "Product_Level3"]

    # Total Qty per hierarchy path; Qty is the only non-key column
    df = df.groupby(prod_levels).sum()

    # Share of each row within its Product_Level2 group (this line raises the FutureWarning)
    df["Qty ratio"] = df["Qty"] / df["Qty"].sum(level=prod_levels[-2])

    print(df)

This gives me this as a result:

                                                  Qty  Qty ratio
    Product_level1 Product_Level2 Product_Level3
    product1       product1a      product1aa       30   0.500000
                                  product1ab       10   0.166667
                                  product1ac       20   0.333333
                   product1b      product1ba       15   0.500000
                                  product1bb       15   0.500000
    product2       product2_a     product2_aa      30   1.000000

With my version of pandas (1.3.2), I get a FutureWarning saying that the level keyword is deprecated and that I should use groupby instead:

FutureWarning: Using the level keyword in DataFrame and Series aggregations is deprecated and will be removed in a future version. Use groupby instead. df.sum(level=1) should use df.groupby(level=1).sum()

Unfortunately, I can't figure out the correct groupby syntax to get the same result, so that this keeps working with future versions of pandas. I've tried variations of the line below, but none worked.

    df["Qty ratio"] = df.groupby(["Product_level1", "Product_Level2", "Product_Level3"]).sum("Qty") / df.groupby(level=prod_levels[-1]).sum("Qty")

Can anyone suggest how I could approach this?

Thank you


Solution

  • The level keyword on many aggregation functions was deprecated in pandas 1.3; see the pandas GitHub issue "Deprecate: level parameter for aggregations in DataFrame and Series" (#39983).

    This affects the DataFrame and Series aggregations that accept a level argument, such as sum.

    Internally, the level argument was always rewritten as a groupby operation, so it was deprecated to increase clarity and reduce redundancy in the library.

    The general pattern: whatever level argument was passed to the aggregation should be passed to groupby instead.

    Sample Data:

    import pandas as pd
    
    df = pd.DataFrame(
        {'A': [1, 1, 2, 2],
         'B': [1, 2, 1, 2],
         'C': [5, 6, 7, 8]}
    ).set_index(['A', 'B'])
    
         C
    A B   
    1 1  5
      2  6
    2 1  7
      2  8
    

    With aggregate over level:

    df['C'].sum(level='B')
    
    B
    1    12
    2    14
    Name: C, dtype: int64
    
    FutureWarning: Using the level keyword in DataFrame and Series aggregations is deprecated and will be removed in a future version. Use groupby instead.
    

    This now becomes groupby over level:

    df['C'].groupby(level='B').sum()
    
    B
    1    12
    2    14
    Name: C, dtype: int64
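
The same move applies when the level is given by position, when several levels are grouped at once, or when a different aggregation is used. A minimal sketch on the sample frame above; the variable names are only illustrative, and the deprecated spellings appear in comments rather than being executed:

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 1, 2, 2],
     'B': [1, 2, 1, 2],
     'C': [5, 6, 7, 8]}
).set_index(['A', 'B'])

# Level given by name (was: df['C'].sum(level='B'))
by_name = df['C'].groupby(level='B').sum()

# Level given by position (was: df['C'].sum(level=1)); level 1 is 'B'
by_position = df['C'].groupby(level=1).sum()

# Other aggregations follow the same pattern (was: df['C'].mean(level='A'))
mean_by_a = df['C'].groupby(level='A').mean()

# Several levels at once (was: df['C'].sum(level=['A', 'B']))
by_both = df['C'].groupby(level=['A', 'B']).sum()

print(by_name)
print(mean_by_a)
```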
    

    In this specific example:

    df["Qty ratio"] = df["Qty"] / df["Qty"].sum(level=prod_levels[-2])
    

    Becomes

    df["Qty ratio"] = df["Qty"] / df["Qty"].groupby(level=prod_levels[-2]).sum()
    

    In short, just move the level argument to groupby.
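
Putting this together with the data from the question, the full calculation might look like the sketch below. The `transform("sum")` variant and the names `level_totals` and `parent_levels` are additions for illustration, not part of the answer above. That variant groups by every parent level rather than by `Product_Level2` alone, which only makes a difference if the same level-2 name can appear under more than one level-1 product.

```python
import pandas as pd

data = [['product1', 'product1a', 'product1aa', 10],
        ['product1', 'product1a', 'product1aa', 5],
        ['product1', 'product1a', 'product1aa', 15],
        ['product1', 'product1a', 'product1ab', 10],
        ['product1', 'product1a', 'product1ac', 20],
        ['product1', 'product1b', 'product1ba', 15],
        ['product1', 'product1b', 'product1bb', 15],
        ['product2', 'product2_a', 'product2_aa', 30]]

df = pd.DataFrame(data, columns=["Product_level1", "Product_Level2", "Product_Level3", "Qty"])
prod_levels = ["Product_level1", "Product_Level2", "Product_Level3"]

# Total Qty per hierarchy path; the three product columns become the index
df = df.groupby(prod_levels).sum()

# Replacement for df["Qty"].sum(level=prod_levels[-2]):
# totals per Product_Level2 value, aligned back to the rows by index level name
level_totals = df["Qty"].groupby(level=prod_levels[-2]).sum()
df["Qty ratio"] = df["Qty"] / level_totals

# Illustrative alternative (an assumption, not from the answer):
# transform returns a Series with the original index, so no alignment is needed
parent_levels = prod_levels[:-1]
df["Qty ratio (transform)"] = df["Qty"] / df.groupby(level=parent_levels)["Qty"].transform("sum")

print(df)
```

On this data both columns should hold the same ratios, since every `Product_Level2` value belongs to exactly one `Product_level1` parent.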