I have 2 dataframes that look like this:
import pandas as pd
data = {'QuarterYear': ["Q3 2023", "Q4 2023", "Q1 2024", 'Q2 2024', "Q3 2024", "Q4 2024"], 'data1': [5, 6, 2, 1, 10, 3], 'data2': [12, 4, 2, 7, 2, 9], 'data3': [2, 42, 2, 6, 2, 4]}
df = pd.DataFrame(data)
This looks like:
QuarterYear data1 data2 data3
0 Q3 2023 5 12 2
1 Q4 2023 6 4 42
2 Q1 2024 2 2 2
3 Q2 2024 1 7 6
4 Q3 2024 10 2 2
5 Q4 2024 3 9 4
data1 = {'QuarterYear': ["Q4 2023", 'Q2 2024', "Q3 2024"], 'data1': [5, 9, 10], 'data2': [7, 7, 3], 'data3': [2, 11, 3]}
df1 = pd.DataFrame(data1)
This looks like:
QuarterYear data1 data2 data3
0 Q4 2023 5 7 2
1 Q2 2024 9 7 11
2 Q3 2024 10 3 3
What I would like to do is get df1 and make it the same size as df1, that is fill in for the other quarters with values for data1,2, and 3 being 0 if they are not already in df1.
So the end result should look like this:
QuarterYear data1 data2 data3
0 Q3 2023 0 0 0
1 Q4 2023 5 7 2
2 Q1 2024 0 0 0
3 Q2 2024 9 7 11
4 Q3 2024 10 3 3
5 Q4 2024 0 0 0
Use DataFrame.set_index
with DataFrame.reindex
:
out = (df1.set_index('QuarterYear')
.reindex(df['QuarterYear'], fill_value=0)
.reset_index())
print (out)
QuarterYear data1 data2 data3
0 Q3 2023 0 0 0
1 Q4 2023 5 7 2
2 Q1 2024 0 0 0
3 Q2 2024 9 7 11
4 Q3 2024 10 3 3
5 Q4 2024 0 0 0
Another idea:
out = df1.merge(df[['QuarterYear']], how='right').fillna(0)
print (out)
QuarterYear data1 data2 data3
0 Q3 2023 0.0 0.0 0.0
1 Q4 2023 5.0 7.0 2.0
2 Q1 2024 0.0 0.0 0.0
3 Q2 2024 9.0 7.0 11.0
4 Q3 2024 10.0 3.0 3.0
5 Q4 2024 0.0 0.0 0.0
Thank you @ouroboros1 for comment:
In both cases, you're assuming that df['QuarterYear'] at least contains all values in df1['QuarterYear']. In OP's example, that the case, but if it were not, both methods will actually filter out rows from df1. That does not seem to be what the OP wants. E.g. change "Q3 2024" to "Q3 2025" for df, and "Q3 2024" will be gone from the result. So, with reindex, you would in that case need a union, with merge outer.
out = df1.merge(df[['QuarterYear']], how='outer').fillna(0)
print (out)
QuarterYear data1 data2 data3
0 Q1 2024 0.0 0.0 0.0
1 Q2 2024 9.0 7.0 11.0
2 Q3 2023 0.0 0.0 0.0
3 Q3 2024 0.0 0.0 0.0
4 Q3 2025 10.0 3.0 3.0
5 Q4 2023 5.0 7.0 2.0
6 Q4 2024 0.0 0.0 0.0
out = (df1.set_index('QuarterYear')
.reindex(pd.Index(df['QuarterYear']).union(df1['QuarterYear']), fill_value=0)
.reset_index())
print (out)
QuarterYear data1 data2 data3
0 Q1 2024 0 0 0
1 Q2 2024 9 7 11
2 Q3 2023 0 0 0
3 Q3 2024 0 0 0
4 Q3 2025 10 3 3
5 Q4 2023 5 7 2
6 Q4 2024 0 0 0
If necessary, here is solution also for sorting by quarter
s:
out = (df1.set_index('QuarterYear')
.reindex(pd.Index(df['QuarterYear']).union(df1['QuarterYear']), fill_value=0)
.reset_index()
.sort_values('QuarterYear',
key=lambda x: pd.to_datetime(x.str[-4:] +
x.str[:2], format='mixed'),
ignore_index=True))
print (out)
QuarterYear data1 data2 data3
0 Q3 2023 0 0 0
1 Q4 2023 5 7 2
2 Q1 2024 0 0 0
3 Q2 2024 9 7 11
4 Q3 2024 0 0 0
5 Q4 2024 0 0 0
6 Q3 2025 10 3 3