pythondataframe

How to specify column name with the suffix based on another column value


#Column X contains the suffix of one of V* columns. Need to put the value from V(X) in column Y.

import pandas as pd
import numpy as np

# sample dataframes
df = pd.DataFrame({
    'EMPLID': [12, 13, 14, 15, 16, 17, 18],
    'V1': [2,3,4,50,6,7,8],
    'V2': [3,3,3,3,3,3,3],
    'V3': [7,15,8,9,10,11,12],
    'X': [2,3,1,3,3,1,2]
})

# Expected output:

   EMPLID  V1  V2  V3  X  Y  
0      12   2   3   7  2  3
1      13   3   3  15  3  15
2      14   4   3   8  1  4
3      15  50   3   9  3  9
4      16   6   3  10  3  10
5      17   7   3  11  1  7
6      18   8   3  12  2  3

Example code I've tried (all got syntax error): df['Y'] = df[f"V + df['X']"]

Any suggestion is appreciated. Thank you.


Solution

  • The canonical way would be to use indexing lookup, however since the column names and X values are a bit different, you first need to convert to string and prepend V:

    idx, cols = pd.factorize('V'+df['X'].astype(str))
    
    df['Y'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
    

    Output:

       EMPLID  V1  V2  V3  X   Y
    0      12   2   3   7  2   3
    1      13   3   3  15  3  15
    2      14   4   3   8  1   4
    3      15  50   3   9  3   9
    4      16   6   3  10  3  10
    5      17   7   3  11  1   7
    6      18   8   3  12  2   3