I have two datasets that look like this:
df1
:
Date | City | State | Quantity |
---|---|---|---|
2019-01 | Chicago | IL | 35 |
2019-01 | Orlando | FL | 322 |
... | .... | ... | ... |
2021-07 | Chicago | IL | 334 |
2021-07 | Orlando | FL | 4332 |
df2
:
Date | City | State | Sales |
---|---|---|---|
2020-03 | Chicago | IL | 30 |
2020-03 | Orlando | FL | 319 |
... | ... | ... | ... |
2021-07 | Chicago | IL | 331 |
2021-07 | Orlando | FL | 4000 |
My date
is in format period[M]
for both datasets. I have tried using the df1.join(df2,how='outer')
and (df2.join(df1,how='outer')
commands but they don't add up correctly, essentially, in 2019-01
, I have sales for 2020-03
. How can I join these two datasets such that my output is as follows:
I have not been able to use merge()
because I would have to merge with a combination of City
and State
and Date
Date | City | State | Quantity | Sales |
---|---|---|---|---|
2019-01 | Chicago | IL | 35 | NaN |
2019-01 | Orlando | FL | 322 | NaN |
... | ... | ... | ... | ... |
2021-07 | Chicago | IL | 334 | 331 |
2021-07 | Orlando | FL | 4332 | 4000 |
You can outer-merge
. By not specifying the columns to merge on, you merge on the intersection of the columns in both DataFrames (in this case, Date
, City
and State
).
out = df1.merge(df2, how='outer').sort_values(by='Date')
Output:
Date City State Quantity Sales
0 2019-01 Chicago IL 35.0 NaN
1 2019-01 Orlando FL 322.0 NaN
4 2020-03 Chicago IL NaN 30.0
5 2020-03 Orlando FL NaN 319.0
2 2021-07 Chicago IL 334.0 331.0
3 2021-07 Orlando FL 4332.0 4000.0