python pandas indexing conditional-statements iterator

Update row values where certain condition is met in pandas


Say I have the following dataframe:

[image: table of the dataframe, with columns stream, feat and another_feat]

What is the most efficient way to update the values of the columns feat and another_feat where the stream is number 2?

Is this it?

for index, row in df1.iterrows():
    if df1.loc[index, 'stream'] == 2:
        pass  # do something

How do I do it if there are more than 100 columns? I don't want to explicitly name the columns that I want to update. I want to divide the value of each column by 2 (except for the stream column).

So to be clear, my goal is:

Divide every value by 2 in all rows where stream is 2, leaving the stream column itself unchanged.


Solution

  • I think you can use loc if you need to update two columns to the same value:

    df1.loc[df1['stream'] == 2, ['feat','another_feat']] = 'aaaa'
    print(df1)
       stream        feat another_feat
    a       1  some_value   some_value
    b       2        aaaa         aaaa
    c       2        aaaa         aaaa
    d       3  some_value   some_value
    

    If you need to update the columns separately, one option is:

    df1.loc[df1['stream'] == 2, 'feat'] = 10
    print(df1)
       stream        feat another_feat
    a       1  some_value   some_value
    b       2          10   some_value
    c       2          10   some_value
    d       3  some_value   some_value
    

    Another common option is numpy.where, which sets one value where the condition is True and another everywhere else:

    import numpy as np

    df1['feat'] = np.where(df1['stream'] == 2, 10, 20)
    print(df1)
       stream  feat another_feat
    a       1    20   some_value
    b       2    10   some_value
    c       2    10   some_value
    d       3    20   some_value
    

    EDIT: If you need to divide all columns except stream where the condition is True, use:

    print(df1)
       stream  feat  another_feat
    a       1     4             5
    b       2     4             5
    c       2     2             9
    d       3     1             7
    
    # select all columns except stream
    cols = [col for col in df1.columns if col != 'stream']
    print(cols)
    ['feat', 'another_feat']
    
    # the right-hand side is aligned on index and columns, so only the selected cells change
    df1.loc[df1['stream'] == 2, cols] = df1 / 2
    print(df1)
       stream  feat  another_feat
    a       1   4.0           5.0
    b       2   2.0           2.5
    c       2   1.0           4.5
    d       3   1.0           7.0
    

    If you are working with multiple conditions, you can use nested numpy.where calls or numpy.select:

    import numpy as np
    import pandas as pd

    df0 = pd.DataFrame({'Col': [5, 0, -6]})
    
    df0['New Col1'] = np.where((df0['Col'] > 0), 'Increasing', 
                              np.where((df0['Col'] < 0), 'Decreasing', 'No Change'))
    
    df0['New Col2'] = np.select([df0['Col'] > 0, df0['Col'] < 0],
                                ['Increasing',  'Decreasing'], 
                                default='No Change')
    
    print(df0)
       Col    New Col1    New Col2
    0    5  Increasing  Increasing
    1    0   No Change   No Change
    2   -6  Decreasing  Decreasing
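
Returning to the original goal, here is a minimal self-contained sketch of halving every column except stream in the rows where stream is 2. The sample values are copied from the numeric example above. Picking the columns with columns.difference and casting them to float before the assignment are assumptions of this sketch rather than part of the answer; the cast is there so that odd integers such as 5 do not force a dtype change in the middle of the assignment.

```python
import pandas as pd

# Sample data copied from the numeric example above.
df1 = pd.DataFrame(
    {"stream": [1, 2, 2, 3], "feat": [4, 4, 2, 1], "another_feat": [5, 5, 9, 7]},
    index=list("abcd"),
)

# Every column except 'stream', without naming them one by one.
cols = df1.columns.difference(["stream"])

# Cast the target columns to float first, so halved values such as 2.5 fit.
df1 = df1.astype({col: float for col in cols})

# Boolean mask of the rows to update.
mask = df1["stream"] == 2

# Halve only the selected rows and columns; 'stream' is never touched.
df1.loc[mask, cols] = df1.loc[mask, cols] / 2

print(df1)
```

This works the same way with 100 or more columns, since cols is computed from df1.columns rather than typed out. No row-by-row loop with iterrows is needed.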