python pandas

Pandas groupby make all elements 0 if first element is 1


I have the following df:

| day      | first mover    |
| -------- | -------------- |
| 1        |     1        |
| 2        |     1        |
| 3        |     0        |
| 4        |     0        |
| 5        |     0        |
| 6        |     1        |
| 7        |     0        |
| 8        |     1        |

I want to group this DataFrame from bottom to top in chunks of 4 rows. Furthermore, if the first row of a group is 1, all other entries in that group should be set to 0. Desired output:

| day      | first mover    |
| -------- | -------------- |
| 1        |     1        |
| 2        |     0        |
| 3        |     0        |
| 4        |     0        |
| 5        |     0        |
| 6        |     0        |
| 7        |     0        |
| 8        |     0        |

I have accomplished the first half (the grouping). I am confused about how to set the other entries to 0 when the first entry in a group is 1.

N = 4
df.iloc[::-1].groupby(np.arange(len(df)) // N)

Solution

  • I would use a for-loop for this:

    for name, group in df.groupby(...):
    

    This way I can use if/else to run or skip some code for each group.

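    To get the first element in a group:
    (.first() didn't work as I expected and asked for an offset. That is probably because group is a plain DataFrame inside the loop, so .first() is the time-series method DataFrame.first(offset), not GroupBy.first().)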

    first_value = group.iloc[0]['first mover']
    

    To get the index labels of the other rows (all except the first):

    group.index[1:]
    

    and use them to set 0 in the original df:

    df.loc[group.index[1:], 'first mover'] = 0
    

    Minimal working code which I used for tests:

    import pandas as pd
    
    df = pd.DataFrame({
        'day': [1, 2, 3, 4, 5, 6, 7, 8],
        'first mover': [1, 1, 0, 0, 0, 1, 0, 1],
    })
         
    N = 4
    
    for name, group in df.groupby(by=lambda index:index//N):
        #print(f'\n---- group {name} ---\n')
        #print(group)
    
        first_value = group.iloc[0]['first mover']
        #print('first value:', first_value)
        
        if first_value == 1 :
            #print('>>> change:', group.index[1:])
            df.loc[group.index[1:], 'first mover'] = 0
            
    print('\n--- df ---\n')        
    print(df)
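
The loop above can probably also be written without explicit iteration. Below is a minimal sketch, assuming the index labels are unique and that "first row" means the first row of each group in iteration order. The helper name `zero_after_leading_one` and its `bottom_up` flag are hypothetical, made up for illustration. With `bottom_up=False` it groups from the top like the loop above; with `bottom_up=True` it reverses the rows first, as the question does with `df.iloc[::-1]`.

```python
import numpy as np
import pandas as pd


def zero_after_leading_one(df, col, n, bottom_up=False):
    """Hypothetical helper: in every block of n rows, if the block's first
    value in `col` is 1, set all other values of that block to 0."""
    # Optionally walk the rows from bottom to top.
    ordered = df.iloc[::-1] if bottom_up else df

    # Positional group labels: 0,0,0,0,1,1,1,1,... for n = 4.
    keys = np.arange(len(ordered)) // n
    grouped = ordered.groupby(keys)[col]

    starts_with_one = grouped.transform('first').eq(1)  # block begins with 1
    is_first_row = grouped.cumcount().eq(0)             # row opens its block
    mask = starts_with_one & ~is_first_row

    result = df.copy()
    # Select rows by label (assumes a unique index).
    result.loc[mask[mask].index, col] = 0
    return result


df = pd.DataFrame({
    'day': [1, 2, 3, 4, 5, 6, 7, 8],
    'first mover': [1, 1, 0, 0, 0, 1, 0, 1],
})

top_down = zero_after_leading_one(df, 'first mover', 4)
bottom_up = zero_after_leading_one(df, 'first mover', 4, bottom_up=True)
print(top_down)
print(bottom_up)
```

For the sample data, the top-down call should give the same frame as the for-loop version. The bottom-up call treats day 8 as the first row of its group, so the two results differ.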