Scenario: I have a pandas dataframe. I am trying to use the values in a given column (year) to find the relevant header name and add it to a new column (year_name). For example, if the dataframe looks like this:
itemName | 2020 | 2021 | 2022 | 2023 | 2024 | year |
---|---|---|---|---|---|---|
item1 | 5 | 20 | 10 | 10 | 50 | 3 |
item2 | 10 | 10 | 50 | 20 | 40 | 2 |
item3 | 12 | 35 | 73 | 10 | 54 | 4 |
The result should be like this:
itemName | 2020 | 2021 | 2022 | 2023 | 2024 | year | year_name |
---|---|---|---|---|---|---|---|
item1 | 5 | 20 | 10 | 10 | 50 | 3 | 2022 |
item2 | 10 | 10 | 50 | 20 | 40 | 2 | 2021 |
item3 | 12 | 35 | 73 | 10 | 54 | 4 | 2023 |
Obs. the itemName column is the index.
Issue: I am trying to use a lambda function to use the value of each row of "year" and use it to find the column name for that row and add it to the year_name column.
Function: I tried:
col_names = result_dict[col].columns.tolist()
result_df[[last_year_header']] = result_df[[_last_year']].apply(lambda x: col_names[x])
but this gave me the following error:
TypeError: list indices must be integers or slices, not Series
I also tried:
col_names = result_dict[col].columns.tolist()
result_df[[last_year_header']] = result_df[[_last_year']].apply(lambda x: col_names[x.iloc[0].astype(int)])
But this gave me:
IndexError: list index out of range
Question: I am clearly missing something with the implementation of the lambda function in this case. How can I fix this?
You don't need a lmabda, you should be able to directly index your columns index:
df['year_name'] = df.columns[df['year']-1] #.astype('Int64')
Output:
2020 2021 2022 2023 2024 year year_name
itemName
item1 5 20 10 10 50 3 2022
item2 10 10 50 20 40 2 2021
item3 12 35 73 10 54 4 2023
If there can be invalid values in df['year']
, you could use a Series with reindex
:
df['year_name'] = pd.Series(df.columns).reindex(df['year']-1).values
Example output:
2020 2021 2022 2023 2024 year year_name
itemName
item1 5 20 10 10 50 3.0 2022
item2 10 10 50 20 40 20.0 NaN
item3 12 35 73 10 54 NaN NaN
Reproducible inputs:
# Example 1 (valid values)
df = pd.DataFrame.from_dict({
'index': ['item1', 'item2', 'item3'],
'columns': [2020, 2021, 2022, 2023, 2024, 'year'],
'data': [[5, 20, 10, 10, 50, 3],
[10, 10, 50, 20, 40, 2],
[12, 35, 73, 10, 54, 4],],
'index_names': ['itemName'],
'column_names': [None],
}, 'tight')
# Example 2 (invalid values)
df = pd.DataFrame.from_dict({
'index': ['item1', 'item2', 'item3'],
'columns': [2020, 2021, 2022, 2023, 2024, 'year'],
'data': [[5, 20, 10, 10, 50, 3],
[10, 10, 50, 20, 40, 20],
[12, 35, 73, 10, 54, None],],
'index_names': ['itemName'],
'column_names': [None],
}, 'tight')