pythonpandasdataframegroup-byapply

Adding a new row to each group's last row in a dataframe


My original dataframe is as below:

List = [['2024-05-25', 'Group 1', 'Year 1', 23466882], ['2024-05-25', 'Group 1', 'Year 2', 458397284], ['2024-05-25', 'Group 1', 'Year 3', 2344545], ['2024-05-25', 'Group 2', 'Year 1', 6662345], ['2024-05-25', 'Group 2', 'Year 2', 46342], ['2024-05-25', 'Group 3', 'Year 1', 34234], ['2024-05-25', 'Group 3', 'Year 2', 45222]]
df = pd.DataFrame(List, columns = ['Report_date', 'Product_group', 'Year', 'Sales'])

enter image description here

For each product group, if "Year 3" does not exist, a new row with sales of 11 000 should be added to the end.

The output should look like this:

enter image description here

My initial idea is to split the dataframe into each product group and add a new row if the sub- dataframe does not have any info for Year 3 but that approach does not seem to be optimal.

Any comment is appreciated. Thank you in advance!


Solution

  • If need only add missing Year Year 3 for each group use pd.concat with filtered rows with first non exist groups with added new Year and Sales values:

    Notice: This solution only added new rows for not exist Year 3, also working if not exist same years for any group. E.g. if remove first row, so Year 1 is missing.

    g = df.loc[df['Year'].eq('Year 3'), 'Product_group']
    
    out = (pd.concat([df, 
                      df.loc[~df['Product_group'].isin(g)]
                        .drop_duplicates('Product_group').assign(Year='Year 3', Sales=11000)])
              .sort_values(['Product_group','Year'], ignore_index=True))
    print (out)
      Report_date Product_group    Year      Sales
    0  2024-05-25       Group 1  Year 1   23466882
    1  2024-05-25       Group 1  Year 2  458397284
    2  2024-05-25       Group 1  Year 3    2344545
    3  2024-05-25       Group 2  Year 1    6662345
    4  2024-05-25       Group 2  Year 2      46342
    5  2024-05-25       Group 2  Year 3      11000
    6  2024-05-25       Group 3  Year 1      34234
    7  2024-05-25       Group 3  Year 2      45222
    8  2024-05-25       Group 3  Year 3      11000