Python: iterate over a pandas DataFrame and replace values that do not contain a string from a list

My DataFrame looks like this:

x  date_from             cleaned_date
1  21 JUNE 23.59          2024-06-23
2  18TH JUN 23:59        2024-06-18   
3  01TH JULY (23.59 HRS) 2024-07-01
4  28th June 2023        2023-06-28
5  5TH MAY 2023          2023-05-05
6  JUNE 27, 2023         2023-06-27

I wrote code that extracts the correct date from the date_from column, but it somehow adds a year out of nowhere. In some cases date_from contains no year information, yet a year is still added. To handle this, I wrote code that replaces the whole value in the cleaned_date column with None when there is no year information. It looks like this:

df.loc[(~df['date_from'].astype('str').str.contains('2025')) &
       (~df['date_from'].astype('str').str.contains('2024')) &
       (df['date_from'].astype('str') != 'nan') &
       (~df['date_from'].astype('str').str.contains('2023')) &
       (~df['date_from'].astype('str').str.contains('2022')) &
       (~df['date_from'].astype('str').str.contains('2021')) &
       (~df['date_from'].astype('str').str.contains('2020')) &
       (~df['date_from'].astype('str').str.contains('2019')), 'cleaned_date'] = None

Unfortunately, I have more years to check. Is there a way to do this with, for instance, a for loop? Any ideas?

Regards

Solution

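You could replace your many conditions with a single regex. Keep astype('str') as in your original code, so that missing values become the string 'nan' and are matched by the first alternative:

df.loc[~df['date_from'].astype('str').str.contains('nan|(?:2019|202[0-5])'),
       'cleaned_date'] = None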

Output:

   x              date_from cleaned_date
0  1          21 JUNE 23.59         None
1  2         18TH JUN 23:59         None
2  3  01TH JULY (23.59 HRS)         None
3  4         28th June 2023   2023-06-28
4  5           5TH MAY 2023   2023-05-05
5  6          JUNE 27, 2023   2023-06-27

Regex breakdown:

nan                # match nan
|                  # or
(?:2019|202[0-5])  # match 2019 or 2020-2025
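
Since the question mentions that there are more years to check, the pattern does not have to be typed by hand. The following is a minimal sketch, not part of the original answer, that builds the alternation from a range of years. The names years, pattern and has_year and the sample rows are illustrative assumptions, and na=True is one possible way to leave rows with a missing date_from untouched instead of matching the string 'nan':

```python
import pandas as pd

# Small illustrative frame (assumed data, modelled on the question)
df = pd.DataFrame({
    "date_from": ["21 JUNE 23.59", "28th June 2023", "JUNE 27, 2023", None],
    "cleaned_date": ["2024-06-23", "2023-06-28", "2023-06-27", None],
})

# Assumed range of accepted years: 2019 to 2025 inclusive
years = range(2019, 2026)

# Build "2019|2020|...|2025" instead of writing one condition per year
pattern = "|".join(str(y) for y in years)

# na=True marks missing values as "matched", so they are not overwritten
has_year = df["date_from"].str.contains(pattern, na=True)

# Clear cleaned_date wherever date_from contains none of the years
df.loc[~has_year, "cleaned_date"] = None
```

Extending the check to more years then only requires changing the bounds of the range.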

If you insist on the many conditions, this is doable with numpy.logical_or.reduce:

import numpy as np

vals = ['nan', '2019', '2020', '2021', '2022', '2023', '2024', '2025']

# True where date_from contains none of the values in vals
cond = ~np.logical_or.reduce([df['date_from'].astype('str').str.contains(val)
                              for val in vals])

df.loc[cond, 'cleaned_date'] = None
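
If any four-digit year should count, rather than a fixed list, a generic year pattern avoids maintaining the list at all. This is a hedged sketch under the assumption that a year is a standalone four-digit number starting with 19 or 20. The pattern, the variable names and the sample rows are illustrative and do not come from the original answer:

```python
import pandas as pd

# Illustrative rows (assumed data)
df = pd.DataFrame({
    "date_from": ["21 JUNE 23.59", "5TH MAY 2023", "1 JAN 2031"],
    "cleaned_date": ["2024-06-21", "2023-05-05", "2031-01-01"],
})

# Assumption: a year is a standalone 4-digit number beginning with 19 or 20
year_pattern = r"\b(?:19|20)\d{2}\b"

# astype('str') turns missing values into 'nan', which never matches the pattern
has_year = df["date_from"].astype("str").str.contains(year_pattern)

# Clear cleaned_date where no year-like number was found
df.loc[~has_year, "cleaned_date"] = None
```

Note that, unlike the code above, this version also clears cleaned_date for rows where date_from is missing, and it would treat a time written as a bare number such as 2005 as a year, so it only fits data where that cannot happen.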