pandas dataframe date filtering

Filter Multiple Items on Different Conditions using pandas


I am attempting to filter a dataframe index based on two conditions, ID and DATE, where DATE will vary by ID.

As an example of the per-ID cutoff dates I would be filtering on, I have provided 'filt'.

Is there a better way to do this? I know I could use a for loop, but that does not seem very elegant.

Code to generate results:

import pandas as pd

DT = pd.date_range(start='2021-01-01', periods=10, freq='D')

ID = ['A'] * len(DT) + ['B'] * len(DT)

VAL = list(range(len(DT))) * 2

df = pd.DataFrame({'DT': DT.tolist() + DT.tolist(), 'ID': ID, 'VAL': VAL})
df.set_index(['DT', 'ID'], inplace=True)

filt = pd.Series([DT[2], DT[5]], index=['A', 'B'])

# WANT
DF_A = df.loc[(df.index.get_level_values('ID') == 'A') & (df.index.get_level_values('DT') >= DT[2])]
DF_B = df.loc[(df.index.get_level_values('ID') == 'B') & (df.index.get_level_values('DT') >= DT[5])]
WANT = pd.concat([DF_A, DF_B])

Example of what I have and what I am looking for:

# HAVE 
DATE,ID,VAL
2021-01-01,A,0
2021-01-02,A,1
2021-01-03,A,2
2021-01-04,A,3
2021-01-05,A,4
2021-01-06,A,5
2021-01-07,A,6
2021-01-08,A,7
2021-01-09,A,8
2021-01-10,A,9
2021-01-01,B,0
2021-01-02,B,1
2021-01-03,B,2
2021-01-04,B,3
2021-01-05,B,4
2021-01-06,B,5
2021-01-07,B,6
2021-01-08,B,7
2021-01-09,B,8
2021-01-10,B,9


# WANT 
DATE,ID,VAL
2021-01-03,A,2
2021-01-04,A,3
2021-01-05,A,4
2021-01-06,A,5
2021-01-07,A,6
2021-01-08,A,7
2021-01-09,A,8
2021-01-10,A,9
2021-01-06,B,5
2021-01-07,B,6
2021-01-08,B,7
2021-01-09,B,8
2021-01-10,B,9

Solution

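  • Map each ID's cutoff date onto the data frame as a temporary column first, then filter by it:

    (
        df.reset_index()
          # look up each row's cutoff date from its ID
          .assign(cut_date=lambda x: x.ID.map(filt))
          # keep only rows dated on or after that cutoff
          .loc[lambda x: x.DT >= x.cut_date]
          .drop('cut_date', axis=1)
    )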
    
               DT ID  VAL
    2  2021-01-03  A    2
    3  2021-01-04  A    3
    4  2021-01-05  A    4
    5  2021-01-06  A    5
    6  2021-01-07  A    6
    7  2021-01-08  A    7
    8  2021-01-09  A    8
    9  2021-01-10  A    9
    15 2021-01-06  B    5
    16 2021-01-07  B    6
    17 2021-01-08  B    7
    18 2021-01-09  B    8
    19 2021-01-10  B    9
    

    Or as @wjandrea suggested, avoid the temporary column:

    df.reset_index()[lambda d: d['DT'] >= d['ID'].map(filt)]
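
    Both versions above return a frame with a flat integer index, whereas the WANT in the question keeps the (DT, ID) MultiIndex. If that matters, one option is to build the mask from the index levels directly. This is a minimal sketch, assuming every ID in the index has an entry in filt (an ID missing from filt would map to NaT, compare as False, and be dropped). The names ids, dates, cutoffs and result are illustrative only:

```python
import pandas as pd

# Same setup as in the question
DT = pd.date_range(start='2021-01-01', periods=10, freq='D')
ID = ['A'] * len(DT) + ['B'] * len(DT)
VAL = list(range(len(DT))) * 2
df = pd.DataFrame({'DT': DT.tolist() + DT.tolist(), 'ID': ID, 'VAL': VAL})
df = df.set_index(['DT', 'ID'])
filt = pd.Series([DT[2], DT[5]], index=['A', 'B'])

# Pull both index levels out as flat Index objects
ids = df.index.get_level_values('ID')
dates = df.index.get_level_values('DT')

# Look up each row's cutoff date from its ID (assumes filt covers every ID)
cutoffs = ids.map(filt)

# Boolean mask: keep rows dated on or after their ID's cutoff
result = df[dates >= cutoffs]
```

    Because the frame is never reset, result keeps the original (DT, ID) index and no temporary column has to be dropped afterwards.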