python pandas

Applying a group-by, then a condition


How can I set 'Same_Month' to 'yes' for rows that share the same "Code" and have an identical "Date"?

Below is an example of the data frame:

import pandas as pd

df = pd.DataFrame([[1,'2023-01-02 00:00:00','01','2023',''],
                   [2,'2023-01-02 00:00:00','01','2023',''],
                   [3,'01/16/2023','01','2023',''],
                   [4,'01/17/2023','01','2023',''],
                   [5,'01/16/2023','01','2023',''],
                   [6,'02/13/2023','02','2023',''],
                   [7,'02/13/2023','02','2023',''],
                   [8,'02/13/2023','02','2023',''],
                   [9,'26/11/2018','11','2018',''],
                   [10,'26/11/2018','11','2018',''],
                   [11,'26/11/2018','11','2018',''],
                   [12,'26/11/2018','11','2018',''],
                   [13,'05/11/2018','11','2018',''],
                   [14,'09/11/2018','11','2018','']],
                  columns=['Number','Date','Code','year','Same_Month'])

df 

The expected result is 'yes' for every entry except numbers 4, 13, and 14.


Solution

  • This should do the trick:

    # format='mixed' needs pandas 2.0+; skip this line if 'Date' is already datetime
    df['Date'] = pd.to_datetime(df['Date'], format='mixed')
    # keep=False flags every row whose (Code, Date) pair occurs more than once
    duplicates = df.duplicated(['Code', 'Date'], keep=False)
    # Set 'Yes' only on the flagged rows
    df.loc[duplicates, 'Same_Month'] = 'Yes'
    

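    This produces the following table for the data provided (the 'Number' column is omitted; note that the ambiguous strings '05/11/2018' and '09/11/2018' are parsed month-first):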

    Date        Code  year  Same_Month
    2023-01-02  01    2023  Yes
    2023-01-02  01    2023  Yes
    2023-01-16  01    2023  Yes
    2023-01-17  01    2023
    2023-01-16  01    2023  Yes
    2023-02-13  02    2023  Yes
    2023-02-13  02    2023  Yes
    2023-02-13  02    2023  Yes
    2018-11-26  11    2018  Yes
    2018-11-26  11    2018  Yes
    2018-11-26  11    2018  Yes
    2018-11-26  11    2018  Yes
    2018-05-11  11    2018
    2018-09-11  11    2018
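
Since the question asks about a group-by, here is a rough sketch of the same check written with groupby, in case that reads more naturally. It is not part of the answer above; the small frame is a hypothetical subset of the question's data with the dates already written as ISO strings, and `group_size` is just an illustrative name.

```python
import pandas as pd

# Hypothetical subset of the question's rows, with dates pre-normalized
# to ISO strings so no mixed-format parsing is needed here.
df = pd.DataFrame({
    "Number": [3, 4, 5, 13, 14],
    "Date": ["2023-01-16", "2023-01-17", "2023-01-16", "2018-11-05", "2018-11-09"],
    "Code": ["01", "01", "01", "11", "11"],
    "Same_Month": "",
})
df["Date"] = pd.to_datetime(df["Date"])

# Size of each (Code, Date) group, broadcast back onto every row.
group_size = df.groupby(["Code", "Date"])["Number"].transform("size")

# A group with more than one member means the date repeats within that code.
df.loc[group_size > 1, "Same_Month"] = "Yes"

print(df)
```

On this subset only rows 3 and 5 get 'Yes'. The mask `group_size > 1` should select the same rows as `df.duplicated(['Code', 'Date'], keep=False)`; the groupby form mainly helps if you later want a different threshold, such as three or more matching dates.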