pythonpandasdataframe

How to get the largest streak of negative numbers by sum?


This is my DataFrame:

import pandas as pd

df = pd.DataFrame(
    {
        'a': [-3, -1, -2, -5, 10, -3, -13, -3, -2, 1, -200, -100],
    }
)

Expected output:

  a
10 -200
11 -100

Logic:

I want to return the largest streak of negative numbers in terms of sum of them. In other words I want AT LEAST two consecutive negative rows and then I want the sum of it to be the largest in terms of absolute value:

My attempt based on this answer:

s = np.sign(df['a'])
g = s.ne(s.shift()).cumsum()
out = df[g.eq(df[s.eq(-1)].groupby(g).sum().idxmax())]

It gives me an empty dataframe


Solution

  • Transform instead of aggregating, and then you can use that to filter the lowest sum.

    s = np.sign(df['a'])
    g = s.ne(s.shift()).cumsum()
    m = df.groupby(g).transform(sum)['a']
    l = g.groupby(g).transform(len)
    out = df[m[l > 1].min() == m]
    
          a
    10 -200
    11 -100