Pandas - group by and filter


Here is my dataframe:

import pandas as pd

my_df = pd.DataFrame({'col_1': ['A', 'A', 'B', 'B', 'C', 'C'],
                      'col_2': [1, 2, 1, 2, 1, 2]})

I would like to group by col_1 and filter out every row whose col_2 value is strictly greater than 1. The final result should look like:

final_df = pd.DataFrame({'col_1': ['A', 'B', 'C'],
                         'col_2': [1, 1, 1]})

Here is what I tried:

df_ts = my_df.groupby('col_1').filter(lambda x: (x['col_2'] <= 1).any())

It returns the original dataframe unchanged.

I also tried:

df_ts = my_df.groupby('col_1').filter(lambda x: x['col_2'] <= 1)

It raises an error.


Solution

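  • groupby.filter keeps or drops whole groups: the function receives each group and must return a single boolean. Your first attempt returns True for every group (each one contains a value <= 1), so nothing is dropped; your second returns a boolean Series rather than a scalar, hence the error.

    What you want is simply to filter rows, which does not need groupby: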

    out = my_df[my_df['col_2'].le(1)]
    

    Output:

      col_1  col_2
    0     A      1
    2     B      1
    4     C      1
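
To make the difference concrete, here is a small sketch that contrasts the row-wise mask with groupby.filter on the dataframe from the question. The names kept_any and kept_all are only illustrative, and the .all() variant is included purely for comparison; it is not something the question asked for.

```python
import pandas as pd

my_df = pd.DataFrame({'col_1': ['A', 'A', 'B', 'B', 'C', 'C'],
                      'col_2': [1, 2, 1, 2, 1, 2]})

# Row-wise filter: keep each row whose col_2 is <= 1.
out = my_df[my_df['col_2'].le(1)]

# groupby.filter decides per group, not per row.
# Every group contains a value <= 1, so .any() is True for all of them
# and every row is kept.
kept_any = my_df.groupby('col_1').filter(lambda g: (g['col_2'] <= 1).any())

# With .all(), every group contains a 2, so every group is dropped.
kept_all = my_df.groupby('col_1').filter(lambda g: (g['col_2'] <= 1).all())

print(out)
print(len(kept_any), len(kept_all))
```

The filtered rows keep their original index labels (0, 2, 4). If you need a fresh 0..n-1 index, as in final_df, you can chain .reset_index(drop=True) onto the result.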