pythonpandasdataframe

Fill in rows to dataframe based on another dataframe


I have 2 dataframes that look like this:

import pandas as pd

data = {'QuarterYear': ["Q3 2023", "Q4 2023", "Q1 2024", 'Q2 2024', "Q3 2024", "Q4 2024"], 'data1': [5, 6, 2, 1, 10, 3], 'data2': [12, 4, 2, 7, 2, 9], 'data3': [2, 42, 2, 6, 2, 4]}
df = pd.DataFrame(data)

This looks like:

  QuarterYear  data1  data2  data3
0     Q3 2023      5     12      2
1     Q4 2023      6      4     42
2     Q1 2024      2      2      2
3     Q2 2024      1      7      6
4     Q3 2024     10      2      2
5     Q4 2024      3      9      4

data1 = {'QuarterYear': ["Q4 2023", 'Q2 2024', "Q3 2024"], 'data1': [5, 9, 10], 'data2': [7, 7, 3], 'data3': [2, 11, 3]}
df1 = pd.DataFrame(data1)

This looks like:
  QuarterYear  data1  data2  data3
0     Q4 2023      5      7      2
1     Q2 2024      9      7     11
2     Q3 2024     10      3      3

What I would like to do is get df1 and make it the same size as df1, that is fill in for the other quarters with values for data1,2, and 3 being 0 if they are not already in df1.

So the end result should look like this:

  QuarterYear  data1  data2  data3
0     Q3 2023      0      0      0
1     Q4 2023      5      7      2
2     Q1 2024      0      0      0
3     Q2 2024      9      7     11
4     Q3 2024     10      3      3
5     Q4 2024      0      0      0

Solution

  • Use DataFrame.set_index with DataFrame.reindex:

    out = (df1.set_index('QuarterYear')
              .reindex(df['QuarterYear'], fill_value=0)
              .reset_index())
    print (out)
      QuarterYear  data1  data2  data3
    0     Q3 2023      0      0      0
    1     Q4 2023      5      7      2
    2     Q1 2024      0      0      0
    3     Q2 2024      9      7     11
    4     Q3 2024     10      3      3
    5     Q4 2024      0      0      0
    

    Another idea:

    out = df1.merge(df[['QuarterYear']], how='right').fillna(0)
    print (out)
      QuarterYear  data1  data2  data3
    0     Q3 2023    0.0    0.0    0.0
    1     Q4 2023    5.0    7.0    2.0
    2     Q1 2024    0.0    0.0    0.0
    3     Q2 2024    9.0    7.0   11.0
    4     Q3 2024   10.0    3.0    3.0
    5     Q4 2024    0.0    0.0    0.0
    

    Thank you @ouroboros1 for comment:

    In both cases, you're assuming that df['QuarterYear'] at least contains all values in df1['QuarterYear']. In OP's example, that the case, but if it were not, both methods will actually filter out rows from df1. That does not seem to be what the OP wants. E.g. change "Q3 2024" to "Q3 2025" for df, and "Q3 2024" will be gone from the result. So, with reindex, you would in that case need a union, with merge outer.

    out = df1.merge(df[['QuarterYear']], how='outer').fillna(0)
    print (out)
      QuarterYear  data1  data2  data3
    0     Q1 2024    0.0    0.0    0.0
    1     Q2 2024    9.0    7.0   11.0
    2     Q3 2023    0.0    0.0    0.0
    3     Q3 2024    0.0    0.0    0.0
    4     Q3 2025   10.0    3.0    3.0
    5     Q4 2023    5.0    7.0    2.0
    6     Q4 2024    0.0    0.0    0.0
    
    out = (df1.set_index('QuarterYear')
              .reindex(pd.Index(df['QuarterYear']).union(df1['QuarterYear']), fill_value=0)
              .reset_index())
    print (out)
      QuarterYear  data1  data2  data3
    0     Q1 2024      0      0      0
    1     Q2 2024      9      7     11
    2     Q3 2023      0      0      0
    3     Q3 2024      0      0      0
    4     Q3 2025     10      3      3
    5     Q4 2023      5      7      2
    6     Q4 2024      0      0      0
    

    If necessary, here is solution also for sorting by quarters:

    out = (df1.set_index('QuarterYear')
              .reindex(pd.Index(df['QuarterYear']).union(df1['QuarterYear']), fill_value=0)
              .reset_index()
              .sort_values('QuarterYear', 
                            key=lambda x: pd.to_datetime(x.str[-4:] +
                                                         x.str[:2], format='mixed'), 
                            ignore_index=True))
    print (out)
      QuarterYear  data1  data2  data3
    0     Q3 2023      0      0      0
    1     Q4 2023      5      7      2
    2     Q1 2024      0      0      0
    3     Q2 2024      9      7     11
    4     Q3 2024      0      0      0
    5     Q4 2024      0      0      0
    6     Q3 2025     10      3      3