pythonpandasdataframe

How to set a column which suffix name is based on a value in another column


#Column X contains the suffix of one of V* columns. Need to put set column V{X} to 9 if X > 1.  
#But my code created a new column 'VX' instead of updating one of the V* columns
    
import pandas as pd

df = pd.DataFrame({'EMPLID': [12, 13, 14, 15, 16, 17, 18],
    'V1': [2,3,4,50,6,7,8],
    'V2': [3,3,3,3,3,3,3],
    'V3': [7,15,8,9,10,11,12],
    'X': [2,3,1,3,3,1,2]
})

# Expected output:
     
#    EMPLID  V1  V2  V3  X  
#    12       2   9   7  2  
#    13       3   3   9  3 
#    14       4   3   8  1
#    15      50   3   9  3 
#    16       6   3   9  3
#    17       7   3  11  1  
#    18       8   9  12  2 

My code created a new column 'VX' instead of updating one of the V* columns:

df.loc[(df['X'] > 1), f"V{'X'}"] = 9

Any suggestion is appreciated. Thank you.


Solution

  • # your dataframe
    df = pd.DataFrame({'EMPLID': [12, 13, 14, 15, 16, 17, 18],
        'V1': [2,3,4,50,6,7,8],
        'V2': [3,3,3,3,3,3,3],
        'V3': [7,15,8,9,10,11,12],
        'X': [2,3,1,3,3,1,2]
    })
    

    First, we get the columns names that we want to change and their indices on the original dataframe.

    # column name
    x = df['X'][(df['X'] > 1)]
    # column names mapped to your scenario
    columns = [f'V{v}' for v in x]
    # desired indexes
    positions = x.index.values
    #Then we convert the column names to indices and use these indices to update the positions matching the conditions.
    column_indices = [df.columns.get_loc(col) for col in columns]
    

    Now, we can use two approaches here.

    A vectorized approach

    Convert the dataframe to numpy array, change the desired positions all at once and change the result back to a dataframe.

    
    import numpy as np
    
    # the original dataframe column names
    column_names = df.columns
    # convert the dataframe to numpy (this will 'remove' the column names)
    df_array = df.values
    # put the columns back (axis=0 will stack the columns at the top of the array)
    df_array = np.concatenate([[column_names], df_array], axis=0)
    # position+1 because when using pandas to get the row index, we ignore the columns (which would have index 0)
    df_array[positions+1, column_indices] = 9
    # convert the result back to a dataframe
    df = pd.DataFrame(df_array[1:], columns=column_names)
    
    print(df)
    

    Output:

     EMPLID  V1  V2  V3  X
    0      12   2   9   7  2
    1      13   3   3   9  3
    2      14   4   3   8  1
    3      15  50   3   9  3
    4      16   6   3   9  3
    5      17   7   3  11  1
    6      18   8   9  12  2
    

    A loop approach

    The easiest way would be just to loop over the rows and columns, changing one value at a time.

    for row, column in zip(positions, column_indices):
        df.iat[row,column] = 9
    

    If your dataframe is small, the vectorized approach won't have as much of an advantage over the for loop.