So basically I have 2 DataFrames like this:
Table_1
Apple | Banana | Orange | Date |
---|---|---|---|
1 | 2 | 4 | 2020 |
3 | 5 | 2 | 2021 |
7 | 8 | 9 | 2022 |
Table_2
fruit | year |
---|---|
Apple | 2020 |
Apple | 2021 |
Apple | 2022 |
Banana | 2020 |
Banana | 2021 |
Banana | 2022 |
Orange | 2020 |
Orange | 2021 |
Orange | 2022 |
So I want to lookup the values for the fruits for Table_2 from the Table_1 based on the fruit name and the respective year.
The final outcome should look like this:
fruit | year | number |
---|---|---|
Apple | 2020 | 1 |
Apple | 2021 | 3 |
Apple | 2022 | 7 |
Banana | 2020 | 2 |
Banana | 2021 | 5 |
Banana | 2022 | 8 |
Orange | 2020 | 4 |
Orange | 2021 | 2 |
Orange | 2022 | 9 |
In the Excel for an example one can do something like this:
=INDEX(Table1[[Apple]:[Orange]],MATCH([@year],Table1[Date],0),MATCH([@fruit],Table1[[#Headers],[Apple]:[Orange]],0))
But what is the way to do it in Python?
Assuming pandas, you can melt
and merge
:
out = (df2
.merge(df1.rename(columns={'Date': 'year'})
.melt('year', var_name='fruit', value_name='number'),
how='left'
)
)
output:
fruit year number
0 Apple 2020 1
1 Apple 2021 3
2 Apple 2022 7
3 Banana 2020 2
4 Banana 2021 5
5 Banana 2022 8
6 Orange 2020 4
7 Orange 2021 2
8 Orange 2022 9