How can I set Same_Month to 'yes' for entries grouped by "Code" when their "Date" is identical?
Below is an example of the data frame:
import pandas as pd
df = pd.DataFrame([[1,'2023-01-02 00:00:00','01','2023',''],
[2,'2023-01-02 00:00:00','01','2023',''],
[3,'01/16/2023','01','2023',''],
[4,'01/17/2023','01','2023',''],
[5,'01/16/2023','01','2023',''],
[6,'02/13/2023','02','2023',''],
[7,'02/13/2023','02','2023',''],
[8,'02/13/2023','02','2023',''],
[9,'26/11/2018','11','2018',''],
[10,'26/11/2018','11','2018',''],
[11,'26/11/2018','11','2018',''],
[12,'26/11/2018','11','2018',''],
[13,'05/11/2018','11','2018',''],
[14,'09/11/2018','11','2018',''],
],
columns=['Number','Date','Code','year','Same_Month'])
df
The expected result is that every entry is marked 'yes' except numbers 4, 13 and 14.
This should do the trick:
df['Date'] = pd.to_datetime(df['Date'], format='mixed')  # format='mixed' needs pandas >= 2.0; skip this line if Date is already datetime
duplicates = df.duplicated(['Code', 'Date'], keep=False)  # True for every row whose (Code, Date) pair occurs more than once
df.loc[duplicates, 'Same_Month'] = 'Yes'  # mark those rows; all other rows keep ''
Using the data provided, this produces the following table:
Number | Date | Code | year | Same_Month |
---|---|---|---|---|
1 | 2023-01-02 | 01 | 2023 | Yes |
2 | 2023-01-02 | 01 | 2023 | Yes |
3 | 2023-01-16 | 01 | 2023 | Yes |
4 | 2023-01-17 | 01 | 2023 | |
5 | 2023-01-16 | 01 | 2023 | Yes |
6 | 2023-02-13 | 02 | 2023 | Yes |
7 | 2023-02-13 | 02 | 2023 | Yes |
8 | 2023-02-13 | 02 | 2023 | Yes |
9 | 2018-11-26 | 11 | 2018 | Yes |
10 | 2018-11-26 | 11 | 2018 | Yes |
11 | 2018-11-26 | 11 | 2018 | Yes |
12 | 2018-11-26 | 11 | 2018 | Yes |
13 | 2018-05-11 | 11 | 2018 | |
14 | 2018-09-11 | 11 | 2018 | |
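Two details in that table might be worth handling explicitly. With `format='mixed'`, 05/11/2018 and 09/11/2018 are read month-first (May 11 and September 11) even though their Code is 11, and rows without a duplicate are left as an empty string rather than 'no'. Below is a minimal sketch, assuming (as in the sample) that Code always holds the month number and that only the three string formats from the question occur. For each slash date it keeps the reading whose month agrees with Code, then counts rows per (Code, Date) pair with `groupby(...).transform('size')` and writes 'yes'/'no' with `numpy.where`. For this sample the flagged rows come out the same as in the table, because none of the re-parsed dates collides with another date of the same Code.

```python
import numpy as np
import pandas as pd

# Sample data from the question (the empty Same_Month column is rebuilt below)
df = pd.DataFrame(
    [[1, '2023-01-02 00:00:00', '01', '2023'],
     [2, '2023-01-02 00:00:00', '01', '2023'],
     [3, '01/16/2023', '01', '2023'],
     [4, '01/17/2023', '01', '2023'],
     [5, '01/16/2023', '01', '2023'],
     [6, '02/13/2023', '02', '2023'],
     [7, '02/13/2023', '02', '2023'],
     [8, '02/13/2023', '02', '2023'],
     [9, '26/11/2018', '11', '2018'],
     [10, '26/11/2018', '11', '2018'],
     [11, '26/11/2018', '11', '2018'],
     [12, '26/11/2018', '11', '2018'],
     [13, '05/11/2018', '11', '2018'],
     [14, '09/11/2018', '11', '2018']],
    columns=['Number', 'Date', 'Code', 'year'],
)

# Assumption: Code is the month number of Date (true for this sample)
code_month = df['Code'].astype(int)

# Parse each candidate format on its own; strings that don't match become NaT
iso = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S', errors='coerce')
mdy = pd.to_datetime(df['Date'], format='%m/%d/%Y', errors='coerce')
dmy = pd.to_datetime(df['Date'], format='%d/%m/%Y', errors='coerce')

# Keep a slash-date reading only when its month agrees with Code
mdy = mdy.where(mdy.dt.month == code_month)
dmy = dmy.where(dmy.dt.month == code_month)

# Prefer the ISO reading, then month-first, then day-first
df['Date'] = iso.fillna(mdy).fillna(dmy)

# Number of rows sharing each (Code, Date) pair, repeated on every row of that pair
pair_size = df.groupby(['Code', 'Date'])['Number'].transform('size')

# 'yes' where another row with the same Code has the same Date, otherwise 'no'
df['Same_Month'] = np.where(pair_size > 1, 'yes', 'no')

print(df.loc[df['Same_Month'] == 'no', 'Number'].tolist())  # → [4, 13, 14]
```

If Code does not always match the month in your real data, the month check above would reject valid dates, so the format-selection rule would need to change.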