pythonpandasdataframedate

Compare Columns in DF & drop rows if Dates are matching between Columns


I have df as below. If Column Buy Exit Date is having Date (YYYY/MM/DD) matching with Column Buy Date having Date (YYYY/MM/DD) in next row, then drop the Buy Date row from its matching condition.

import pandas as pd
data = {
    'Buy Date': ['2022/11/07 15:00:00', '2022/11/11 12:00:00', '2022/11/24 15:00:00', '2022/12/01 12:00:00', '2022/12/14 09:00:00'],
    'Buy Exit Date': ['2022/11/11 09:00:00', '2022/11/24 12:00:00', '2022/12/01 09:00:00', '2022/12/06 09:00:00', '2022/12/16 09:00:00'],
    'ln_entry': [3232.899902, 3315.000000, 3381.699951, 3476.149902, 3364.699951]
}
df = pd.DataFrame(data)

df:

      Buy Date          Buy Exit Date       ln_entry
0   2022/11/07 15:00:00 2022/11/11 09:00:00 3232.899902
1   2022/11/11 12:00:00 2022/11/24 12:00:00 3315.000000
2   2022/11/24 15:00:00 2022/12/01 09:00:00 3381.699951
3   2022/12/01 12:00:00 2022/12/06 09:00:00 3476.149902
4   2022/12/14 09:00:00 2022/12/16 09:00:00 3364.699951

Expected Output:

    Buy Date            Buy Exit Date       ln_entry
0   2022/11/07 15:00:00 2022/11/11 09:00:00 3232.899902
1   2022/12/14 09:00:00 2022/12/16 09:00:00 3364.699951

My Code:

df['Buy Date'] = pd.to_datetime(df['Buy Date'])
df['Buy Exit Date'] = pd.to_datetime(df['Buy Exit Date'])
df['Buy Date Only'] = df['Buy Date'].dt.date
df['Buy Exit Date Only'] = df['Buy Exit Date'].dt.date
df['Next_Buy_Date_Only'] = df['Buy Date Only'].shift(-1)
to_remove = df['Buy Exit Date Only'] == df['Next_Buy_Date_Only']
df_cleaned = df[~to_remove]
df_cleaned = df_cleaned.drop(columns=['Buy Date Only', 'Buy Exit Date Only', 'Next_Buy_Date_Only'])
print(df_cleaned)

My Output:

  Buy Date            Buy Exit Date        ln_entry
3 2022-12-01 12:00:00 2022-12-06 09:00:00  3476.149902
4 2022-12-14 09:00:00 2022-12-16 09:00:00  3364.699951

Solution

  • With your current code you are shifting 'Buy Date' one row back, then checking for equality with 'Buy Exit Date' and keeping rows where condition is False. I.e.:

    df[['Buy Exit Date Only', 'Next_Buy_Date_Only']]
    
      Buy Exit Date Only Next_Buy_Date_Only
    0         2022-11-11         2022-11-11 # equal: delete (via `~`)
    1         2022-11-24         2022-11-24 # equal: delete
    2         2022-12-01         2022-12-01 # equal: delete
    3         2022-12-06         2022-12-14 # unequal: keep
    4         2022-12-16               None # unequal: keep
    

    But this is not the requirement. You want to delete rows where the 'Buy Exit Date' from the previous row equals 'Buy Date'. Try:

    df['Buy Date'] = pd.to_datetime(df['Buy Date'])
    df['Buy Exit Date'] = pd.to_datetime(df['Buy Exit Date'])
    out = df[df['Buy Date'].dt.date != df['Buy Exit Date'].dt.date.shift(1)]
    
    out
    

    Output:

                 Buy Date       Buy Exit Date     ln_entry
    0 2022-11-07 15:00:00 2022-11-11 09:00:00  3232.899902
    4 2022-12-14 09:00:00 2022-12-16 09:00:00  3364.699951
    

    The comparison being:

         Buy Date Buy Exit Date
    0  2022-11-07          None # unequal: keep
    1  2022-11-11    2022-11-11 # equal: delete
    2  2022-11-24    2022-11-24 # equal: delete
    3  2022-12-01    2022-12-01 # equal: delete
    4  2022-12-14    2022-12-06 # unequal: keep