pythonpandas

How to use the condition for multiple columns in Pandas?


I want to replace values in columns from P1-6 to P1-95 with values from column family if the value in a particular column > 0. Here is the part of my input df te_matrix:

     family   P1-6  P1-12  P1-22  P1-25  P1-26  P1-28  P1-88  P1-89  P1-90  P1-92  P1-93  P1-95
 Tourist|20  0.712  0.587  0.838  0.675  0.771  0.952  0.746  0.678  0.603  0.728  0.680  0.759
Stowaway|36  0.740  0.809  0.796  0.699  1.000  0.865  0.814  0.888  0.705  0.758  0.759  0.866
 Stowaway|4  0.000  0.000  0.000  0.000  0.000  0.000  0.880  0.000  0.000  0.000  0.000  0.000

I write it for one column, P1-6:

te_matrix['P1-6'] = te_matrix.apply(lambda row: row['family'] if row['P1-6'] > 0 else row['P1-6'], axis=1)

and works well:

     family         P1-6  P1-12  P1-22  P1-25  P1-26  P1-28  P1-88  P1-89  P1-90  P1-92  P1-93  P1-95
 Tourist|20   Tourist|20  0.587  0.838  0.675  0.771  0.952  0.746  0.678  0.603  0.728  0.680  0.759
Stowaway|36  Stowaway|36  0.809  0.796  0.699  1.000  0.865  0.814  0.888  0.705  0.758  0.759  0.866
 Stowaway|4        0.000  0.000  0.000  0.000  0.000  0.000  0.880  0.000  0.000  0.000  0.000  0.000

but how to use it for all desired columns without writing almost the same code line by line?

I guess to start with: cols = te_matrix.filter(like='P1-').columns but how to apply it to the previous code?


Solution

  • Don't use apply, it's not needed here really inefficient.

    Go with boolean indexing:

    df.loc[df['P1-6'].gt(0), 'P1-6'] = df['family']
    

    Output:

            family         P1-6  P1-12  P1-22  P1-25  P1-26  P1-28  P1-88  P1-89  P1-90  P1-92  P1-93  P1-95
    0   Tourist|20   Tourist|20  0.587  0.838  0.675  0.771  0.952  0.746  0.678  0.603  0.728  0.680  0.759
    1  Stowaway|36  Stowaway|36  0.809  0.796  0.699  1.000  0.865  0.814  0.888  0.705  0.758  0.759  0.866
    2   Stowaway|4          0.0  0.000  0.000  0.000  0.000  0.000  0.880  0.000  0.000  0.000  0.000  0.000
    

    If you want to update several columns at once, go with mask:

    cols = df.filter(like='P1-').columns
    
    df[cols] = df[cols].mask(df[cols].gt(0), df['family'], axis=0)
    

    Or, if you're unsure if the input columns only contain numbers, with help of pd.to_numeric:

    cols = df.filter(like='P1-').columns
    
    df[cols] = df[cols].mask(df[cols].apply(pd.to_numeric, errors='coerce').gt(0),
                             df['family'], axis=0)
    

    Output:

            family         P1-6        P1-12        P1-22        P1-25        P1-26        P1-28        P1-88        P1-89        P1-90        P1-92        P1-93        P1-95
    0   Tourist|20   Tourist|20   Tourist|20   Tourist|20   Tourist|20   Tourist|20   Tourist|20   Tourist|20   Tourist|20   Tourist|20   Tourist|20   Tourist|20   Tourist|20
    1  Stowaway|36  Stowaway|36  Stowaway|36  Stowaway|36  Stowaway|36  Stowaway|36  Stowaway|36  Stowaway|36  Stowaway|36  Stowaway|36  Stowaway|36  Stowaway|36  Stowaway|36
    2   Stowaway|4          0.0          0.0          0.0          0.0          0.0          0.0   Stowaway|4          0.0          0.0          0.0          0.0          0.0