pythonpandasdataframe

Duplicate and rename columns on pandas DataFrame


I guess this must be rather simple, but I'm struggling to find the easy way of doing it.

I have a pandas DataFrame with the columns A to D and need to copy some of the columns to new ones. The trick is that it not just involves renaming, I also need to duplicate the values to new columns as well.

Here is an example of the input:

import pandas as pd

df = pd.DataFrame({
    'A': [1,2,3], 
    'B':['2025-10-01', '2025-10-02', '2025-10-01'], 
    'C': ['2025-02-10', '2025-02-15', '2025-02-20'], 
    'D': [0, 5, 4],
    'values': [52.3, 60, 70.6]
})

mapping_dict = {
    'table_1': {
        'id': 'A',
        'dt_start': 'B',
        'dt_end': 'B',
    },
    'table_2': {
        'id': 'D',
        'dt_start': 'C',
        'dt_end': 'C',
    },
}

I'd like to have as output for table_1 a DataFrame as follows:

id dt_start dt_end values
1 2025-10-01 2025-10-01 52.3
2 2025-10-02 2025-10-02 60
3 2025-10-01 2025-10-01 80.6

And I guess it is possible to infer the expected output for table_2.

Note that the column values, which is not included in the mapping logic, should remain in the dataframe.

I was able to achieve this by using a for loop, but I feel that should be a natural way of doing this directly on pandas without manually looping over the mapping dict and then dropping the extra columns.

Here is my solution so far:

table_name = 'table_1'

new_df = df.copy()
for new_col, old_col in mapping_dict[table_name].items():
    new_df[new_col] = df[old_col]

new_df = new_df.drop(mapping_dict[table_name].values(), axis='columns')

Any help or suggestion will be appreciated!


Solution

  • IIUC, you can do this with this command, this is one of the reason I like to use the set_axis method in dataframes.

    table_name = 'table_1'
    df[list(mapping_dict[table_name].values())+['values']].set_axis(list(mapping_dict[table_name].keys())+['values'], axis=1)
    

    Output:

       id    dt_start      dt_end  values
    0   1  2025-10-01  2025-10-01    52.3
    1   2  2025-10-02  2025-10-02    60.0
    2   3  2025-10-01  2025-10-01    70.6
    
    

    Or,

    table_name = 'table_2'
    df[list(mapping_dict[table_name].values())+['values']].set_axis(list(mapping_dict[table_name].keys())+['values'], axis=1)
    

    Output:

       id    dt_start      dt_end  values
    0   0  2025-02-10  2025-02-10    52.3
    1   5  2025-02-15  2025-02-15    60.0
    2   4  2025-02-20  2025-02-20    70.6
    

    And, much like @khushalvaland you can create function for resuse:

    def gen_table(df, mapping, cols) -> pd.DataFrame:
        return (df[list(mapping.values())+cols]
                  .set_axis(list(mapping.keys())+cols, 
                            axis=1))
    
    gen_table(df, mapping = mapping_dict['table_1'], cols =['values'])
    gen_table(df, mapping = mapping_dict['table_2'], cols =['values'])
    

    Output:

       id    dt_start      dt_end  values
    0   1  2025-10-01  2025-10-01    52.3
    1   2  2025-10-02  2025-10-02    60.0
    2   3  2025-10-01  2025-10-01    70.6
    

    and

       id    dt_start      dt_end  values
    0   0  2025-02-10  2025-02-10    52.3
    1   5  2025-02-15  2025-02-15    60.0
    2   4  2025-02-20  2025-02-20    70.6