pythonpandaspandas-groupby

How to find max value by group with conditions? Pandas


Product ID    Date      Sales
15475400    8/27/2019   23858
15475400    7/16/2019   21161
15475400    3/26/2018   31907
17104000    8/24/2019   39170
17104000    7/4/2019    29070
17104000    6/15/2019   41963
17104000    1/21/2019   38783
17169000    8/18/2018   58936
17169000    6/18/2018   47273
17169000    2/26/2018   28845
10842800    8/3/2019    41816
10842800    3/8/2019    41916
14901100    8/23/2019   37616

Greetings! I have above dataframe and I want to find the most recent records for each product before 8/1/2019.

I tried df.groupby('Product ID').timestamp.max() but don't know how to find products before 8/1/2019.

Expected outputs:

Product ID  Date        Sales
15475400    7/16/2019   21161
17104000    7/4/2019    29070
17169000    6/18/2018   47273
10842800    3/8/2019    41916
14901100    8/23/2019   37616

Thank you in advance for the help!


Solution

  • First we need to filter the df then just using drop_duplicates

    df['Date']=pd.to_datetime(df['Date'])
    s=df.loc[df.Date<'2019-08-01'].sort_values('Date').drop_duplicates('ProductID',keep='last')
    s
    Out[277]: 
       ProductID       Date  Sales
    6   17169000 2018-06-18  47273
    8   10842800 2019-03-08  41916
    3   17104000 2019-07-04  29070
    1   15475400 2019-07-16  21161
    

    Or we can do tail with groupby

    df.loc[df.Date<'2019-08-01'].sort_values('Date').groupby('ProductID').tail(1)
    

    Or idxmax

    df.loc[df.loc[df.Date<'2019-08-01'].groupby('ProductID').Date.idxmax()]