python pandas group-by

.head() and .tail() with negative indexes on pandas GroupBy object


I'm having trouble selecting all but the last element in each group of a pandas.DataFrame groupby object:

import pandas as pd

x = pd.DataFrame([['a', 1], ['b', 1], ['a', 2], ['b', 2], ['a', 3], ['b', 3]],
                 columns=['A', 'B'])
g = x.groupby('A')

As expected (according to the documentation), g.head(1) returns

   A  B
0  a  1
1  b  1

whereas g.head(-1) returns an empty DataFrame.

From the behavior of x.head(-1) I'd expect it to return

   A  B
0  a  1
1  b  1
2  a  2
3  b  2

i.e. dropping the last element of each group and then merging the groups back into a single DataFrame. If this is just a bug in pandas, I'd be grateful to anyone who can suggest an alternative approach.


Solution

  • As noted in the comments, negative arguments to head and tail hadn't (yet) been implemented for GroupBy objects at the time of writing. However, you can use cumcount to implement them efficiently:

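    def negative_head(g, n):
        # Drop the last n rows of each group (n is passed as a positive number).
        return g._selected_obj[g.cumcount(ascending=False) >= n]
    
    def negative_tail(g, n):
        # Drop the first n rows of each group (n is passed as a positive number).
        return g._selected_obj[g.cumcount() >= n]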
    
    In [11]: negative_head(g, 1)  # instead of g.head(-1)
    Out[11]:
       B
    0  1
    1  1
    2  2
    3  2
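
The same idea can be sketched without the private `_selected_obj` attribute by indexing the original DataFrame with the cumcount mask directly. The helper names `drop_last_n` and `drop_first_n` below are hypothetical, chosen for illustration only; this is a minimal sketch that assumes the frame and the GroupBy object share the same index, not a definitive implementation.

```python
import pandas as pd

x = pd.DataFrame([['a', 1], ['b', 1], ['a', 2], ['b', 2], ['a', 3], ['b', 3]],
                 columns=['A', 'B'])
g = x.groupby('A')


def drop_last_n(df, grouped, n):
    """Hypothetical helper: keep all but the last n rows of each group."""
    # cumcount(ascending=False) numbers rows from the end of each group,
    # so the last row of every group gets 0, the one before it 1, and so on.
    return df[grouped.cumcount(ascending=False) >= n]


def drop_first_n(df, grouped, n):
    """Hypothetical helper: keep all but the first n rows of each group."""
    # cumcount() numbers rows from the start of each group, beginning at 0.
    return df[grouped.cumcount() >= n]


head_result = drop_last_n(x, g, 1)   # the result the question expects from g.head(-1)
tail_result = drop_first_n(x, g, 1)  # the analogous result for g.tail(-1)
print(head_result)
print(tail_result)
```

Because the mask is applied to `x` itself, the grouping column `A` is kept in the result and the original row order is preserved. Newer pandas releases document negative `n` for `GroupBy.head` and `GroupBy.tail`, so it is worth trying `g.head(-1)` on your installed version first.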