I guess this must be rather simple, but I'm struggling to find the easy way of doing it.
I have a pandas DataFrame with the columns A to D and need to copy some of the columns to new ones. The trick is that it not just involves renaming, I also need to duplicate the values to new columns as well.
Here is an example of the input:
import pandas as pd
df = pd.DataFrame({
'A': [1,2,3],
'B':['2025-10-01', '2025-10-02', '2025-10-01'],
'C': ['2025-02-10', '2025-02-15', '2025-02-20'],
'D': [0, 5, 4],
'values': [52.3, 60, 70.6]
})
mapping_dict = {
'table_1': {
'id': 'A',
'dt_start': 'B',
'dt_end': 'B',
},
'table_2': {
'id': 'D',
'dt_start': 'C',
'dt_end': 'C',
},
}
I'd like to have as output for table_1
a DataFrame as follows:
id | dt_start | dt_end | values |
---|---|---|---|
1 | 2025-10-01 | 2025-10-01 | 52.3 |
2 | 2025-10-02 | 2025-10-02 | 60 |
3 | 2025-10-01 | 2025-10-01 | 80.6 |
And I guess it is possible to infer the expected output for table_2
.
Note that the column values
, which is not included in the mapping logic, should remain in the dataframe.
I was able to achieve this by using a for loop, but I feel that should be a natural way of doing this directly on pandas without manually looping over the mapping dict and then dropping the extra columns.
Here is my solution so far:
table_name = 'table_1'
new_df = df.copy()
for new_col, old_col in mapping_dict[table_name].items():
new_df[new_col] = df[old_col]
new_df = new_df.drop(mapping_dict[table_name].values(), axis='columns')
Any help or suggestion will be appreciated!
IIUC, you can do this with this command, this is one of the reason I like to use the set_axis
method in dataframes.
table_name = 'table_1'
df[list(mapping_dict[table_name].values())+['values']].set_axis(list(mapping_dict[table_name].keys())+['values'], axis=1)
Output:
id dt_start dt_end values
0 1 2025-10-01 2025-10-01 52.3
1 2 2025-10-02 2025-10-02 60.0
2 3 2025-10-01 2025-10-01 70.6
Or,
table_name = 'table_2'
df[list(mapping_dict[table_name].values())+['values']].set_axis(list(mapping_dict[table_name].keys())+['values'], axis=1)
Output:
id dt_start dt_end values
0 0 2025-02-10 2025-02-10 52.3
1 5 2025-02-15 2025-02-15 60.0
2 4 2025-02-20 2025-02-20 70.6
And, much like @khushalvaland you can create function for resuse:
def gen_table(df, mapping, cols) -> pd.DataFrame:
return (df[list(mapping.values())+cols]
.set_axis(list(mapping.keys())+cols,
axis=1))
gen_table(df, mapping = mapping_dict['table_1'], cols =['values'])
gen_table(df, mapping = mapping_dict['table_2'], cols =['values'])
Output:
id dt_start dt_end values
0 1 2025-10-01 2025-10-01 52.3
1 2 2025-10-02 2025-10-02 60.0
2 3 2025-10-01 2025-10-01 70.6
and
id dt_start dt_end values
0 0 2025-02-10 2025-02-10 52.3
1 5 2025-02-15 2025-02-15 60.0
2 4 2025-02-20 2025-02-20 70.6