python-2.7 pandas

Pandas: group by one column and keep the row with the max date in another column (Python)


I have a DataFrame with the following data:

invoice_no  dealer  billing_change_previous_month        date
       110       1                              0  2016-12-31
       100       1                         -41981  2017-01-30
      5505       2                              0  2017-01-30
      5635       2                          58730  2016-12-31

I want to keep only one row per dealer: the row with that dealer's maximum date. The desired output is:

invoice_no  dealer  billing_change_previous_month        date
       100       1                         -41981  2017-01-30
      5505       2                              0  2017-01-30

Each dealer should appear only once, with its maximum date. Thanks in advance for your help.


Solution

  • You can use boolean indexing with groupby and transform:

    df_new = df[df.groupby('dealer').date.transform('max') == df['date']]
    
        invoice_no  dealer  billing_change_previous_month   date
    1   100         1       -41981                          2017-01-30
    2   5505        2       0                               2017-01-30
    

    The solution also works with more than two dealers (in response to a question from Ben Smith):

    import pandas as pd

    df = pd.DataFrame({
        'invoice_no': [110, 100, 5505, 5635, 10000, 10001],
        'dealer': [1, 1, 2, 2, 3, 3],
        'billing_change_previous_month': [0, -41981, 0, 58730, 9000, 100],
        'date': ['2016-12-31', '2017-01-30', '2017-01-30', '2016-12-31', '2019-12-31', '2020-01-31'],
    })

    # Parse the strings so that max() compares real dates
    df['date'] = pd.to_datetime(df['date'])
    df[df.groupby('dealer').date.transform('max') == df['date']]
    
    
        invoice_no  dealer  billing_change_previous_month   date
    1   100         1       -41981                          2017-01-30
    2   5505        2       0                               2017-01-30
    5   10001       3       100                             2020-01-31
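The boolean mask works because transform('max') returns a Series aligned with the original index, so every row is compared against its own dealer's latest date. There is one caveat. If a dealer has two rows that share the same maximum date, both rows survive the mask, so dealers are not strictly distinct. The sketch below is an illustration and not part of the original answer. It adds a hypothetical tied row (invoice 101) to the question's data and compares the mask with groupby(...).idxmax(), an alternative that keeps only the first row holding each dealer's maximum date.

```python
import pandas as pd

# Question's data plus one hypothetical extra row (invoice 101) that ties
# with invoice 100 on dealer 1's latest date
df = pd.DataFrame({
    'invoice_no': [110, 100, 101, 5505, 5635],
    'dealer': [1, 1, 1, 2, 2],
    'billing_change_previous_month': [0, -41981, 500, 0, 58730],
    'date': pd.to_datetime(['2016-12-31', '2017-01-30', '2017-01-30',
                            '2017-01-30', '2016-12-31']),
})

# transform('max') returns a Series aligned with df's index: each row gets
# its own dealer's latest date (agg('max') would return one row per dealer)
group_max = df.groupby('dealer')['date'].transform('max')
mask = group_max == df['date']
print(mask.tolist())  # [False, True, True, True, False]

# The mask keeps every row that hits the maximum, so dealer 1 appears twice
tied = df[mask]
print(tied['dealer'].tolist())  # [1, 1, 2]

# idxmax returns the index label of the first maximum in each group,
# so exactly one row per dealer is selected
latest = df.loc[df.groupby('dealer')['date'].idxmax()]
print(latest)
```

The right choice depends on the data. The mask is useful when every tied row matters. idxmax guarantees one row per dealer and keeps whichever tied row comes first in the frame.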