python pandas row find-occurrences

Get maximum occurrence of one specific value per row with pandas


I have the following dataframe:

   1   2   3   4   5   6   7  8  9
0  0   0   1   0   0   0   0  0  1
1  0   0   0   0   1   1   0  1  0
2  1   1   0   1   1   0   0  1  1
...

For each row, I want to get the length of the longest sequence of consecutive 0 values. The expected result for this dataframe is an array that looks like this:

[5,4,2,...]

since in the first row the longest sequence of 0 values has length 5, etc.

I have seen this post and tried, as a start, to get this for the first row only (though I would like to do it for the whole dataframe at once), but I got an error:

s=df_day.iloc[0]
(~s).cumsum()[s].value_counts().max()

TypeError: ufunc 'invert' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

When I entered the values manually like this:

s=pd.Series([0,0,1,0,0,0,0,0,1])
(~s).cumsum()[s].value_counts().max()

>>>7

I got 7, which is the total number of 0 values in the row, not the longest sequence. I don't understand why it raised the error in the first place and, more importantly, I would like to run this on the whole dataframe, per row.

My end goal: get the maximum uninterrupted occurrence of value 0 in each row.


Solution

  • The following code should do the job.

    The function longest_streak counts the length of each run of consecutive zeros in a row and returns the longest one. You can apply it to each row of your df.

    from itertools import groupby

    def longest_streak(l):
        # Collect the length of every run of consecutive zeros
        lst = []
        for n, c in groupby(l):
            count = sum(1 for _ in c)
            if n == 0:
                lst.append(count)
        # default=0 handles rows that contain no zeros
        return max(lst, default=0)

    df.apply(longest_streak, axis=1)
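    As for the attempt in the question: the cumsum/value_counts idiom expects a boolean Series. On integer data ~ is a bitwise NOT (so ~0 is -1) and [s] selects by label rather than by mask, which would explain the 7. On float or object data ~ is not defined at all, which is the likely cause of the TypeError (an assumption, since the dtype of df_day is not shown). Below is a minimal sketch of that idiom with an explicit boolean mask; the sample frame and the helper name longest_zero_run are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question; the name `df` is an assumption.
df = pd.DataFrame(
    [[0, 0, 1, 0, 0, 0, 0, 0, 1],
     [0, 0, 0, 0, 1, 1, 0, 1, 0],
     [1, 1, 0, 1, 1, 0, 0, 1, 1]],
    columns=range(1, 10),
)

def longest_zero_run(row):
    """Length of the longest run of consecutive zeros (hypothetical helper)."""
    is_zero = row.eq(0)                    # boolean mask, so ~ is a logical NOT
    run_id = (~is_zero).cumsum()           # increases by one at every non-zero value
    runs = run_id[is_zero].value_counts()  # number of zeros sharing each run id
    # A row without zeros yields an empty result; report 0 in that case
    return int(runs.max()) if len(runs) else 0

result = df.apply(longest_zero_run, axis=1).tolist()
print(result)  # [5, 4, 2]
```

    This gives the same per-row result as longest_streak for the sample rows; which of the two is faster on a large frame has not been measured here.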