I have two datasets that look like this:
df1:
| Date | City | State | Quantity |
|---|---|---|---|
| 2019-01 | Chicago | IL | 35 |
| 2019-01 | Orlando | FL | 322 |
| ... | .... | ... | ... |
| 2021-07 | Chicago | IL | 334 |
| 2021-07 | Orlando | FL | 4332 |
df2:
| Date | City | State | Sales |
|---|---|---|---|
| 2020-03 | Chicago | IL | 30 |
| 2020-03 | Orlando | FL | 319 |
| ... | ... | ... | ... |
| 2021-07 | Chicago | IL | 331 |
| 2021-07 | Orlando | FL | 4000 |
My date is in format period[M] for both datasets. I have tried using the df1.join(df2,how='outer') and (df2.join(df1,how='outer') commands but they don't add up correctly, essentially, in 2019-01, I have sales for 2020-03. How can I join these two datasets such that my output is as follows:
I have not been able to use merge() because I would have to merge with a combination of City and State and Date
| Date | City | State | Quantity | Sales |
|---|---|---|---|---|
| 2019-01 | Chicago | IL | 35 | NaN |
| 2019-01 | Orlando | FL | 322 | NaN |
| ... | ... | ... | ... | ... |
| 2021-07 | Chicago | IL | 334 | 331 |
| 2021-07 | Orlando | FL | 4332 | 4000 |
You can outer-merge. By not specifying the columns to merge on, you merge on the intersection of the columns in both DataFrames (in this case, Date, City and State).
out = df1.merge(df2, how='outer').sort_values(by='Date')
Output:
Date City State Quantity Sales
0 2019-01 Chicago IL 35.0 NaN
1 2019-01 Orlando FL 322.0 NaN
4 2020-03 Chicago IL NaN 30.0
5 2020-03 Orlando FL NaN 319.0
2 2021-07 Chicago IL 334.0 331.0
3 2021-07 Orlando FL 4332.0 4000.0