I have a function that returns a list of holidays. The list looks like this:
['30 May 2024','1 May 2024', '29 Aug 2024', '14 Aug 2024', '19 May 2024']
When I try to run
print(pd.to_datetime(['30 May 2024','1 May 2024', '29 Aug 2024', '14 Aug 2024', '19 May 2024']))
I get this error:
Traceback (most recent call last):
File "C:\Users\Bhargav\Downloads\gapi.py", line 2, in <module>
print(pd.to_datetime(['30 May 2024','1 May 2024', '29 Aug 2024', '14 Aug 2024', '19 May 2024']))
File "C:\Users\Bhargav\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\tools\datetimes.py", line 1099, in to_datetime
result = convert_listlike(argc, format)
File "C:\Users\Bhargav\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\tools\datetimes.py", line 433, in _convert_listlike_datetimes
return _array_strptime_with_fallback(arg, name, utc, format, exact, errors)
File "C:\Users\Bhargav\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\tools\datetimes.py", line 467, in _array_strptime_with_fallback
result, tz_out = array_strptime(arg, fmt, exact=exact, errors=errors, utc=utc)
File "strptime.pyx", line 501, in pandas._libs.tslibs.strptime.array_strptime
File "strptime.pyx", line 451, in pandas._libs.tslibs.strptime.array_strptime
File "strptime.pyx", line 583, in pandas._libs.tslibs.strptime._parse_with_format
ValueError: time data "29 Aug 2024" doesn't match format "%d %B %Y", at position 2. You might want to try:
- passing `format` if your strings have a consistent format;
- passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
- passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
But it works perfectly with this list:
print(pd.to_datetime(['30 Dec 2024','1 May 2024', '29 Aug 2024', '14 Aug 2024', '19 May 2024']))
and I get:
DatetimeIndex(['2024-12-30', '2024-05-01', '2024-08-29', '2024-08-14',
               '2024-05-19'],
              dtype='datetime64[ns]', freq=None)
Am I missing something here?
The issue is caused by May appearing in the first date. May is both the full (%B) and the abbreviated (%b) name of the month.
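To see this asymmetry without pandas, here is a minimal sketch using only the standard library's datetime.strptime. It assumes the default C/English locale for month names; the helper matches() is hypothetical and exists only for this illustration.

```python
from datetime import datetime


def matches(text: str, fmt: str) -> bool:
    """Hypothetical helper: True if `text` parses with the strptime format `fmt`."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


# "May" is both the full and the abbreviated month name, so both directives accept it.
print(matches("30 May 2024", "%d %B %Y"))  # True  (%B = full month name)
print(matches("30 May 2024", "%d %b %Y"))  # True  (%b = abbreviated month name)

# "Aug" is only an abbreviation; %B expects the full name "August".
print(matches("29 Aug 2024", "%d %B %Y"))  # False
print(matches("29 Aug 2024", "%d %b %Y"))  # True
```

So a format guessed from "30 May 2024" can legitimately be either one, and only %b also covers "Aug".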
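When pandas tried to infer the format from that first element, it chose %d %B %Y, which is correct for May but not for Aug (that would need %b). Because pandas 2.x applies the single format inferred from the first non-NaN element to every element, parsing fails at "29 Aug 2024" (position 2 in the error message). As described here, the inference is done by pandas.tseries.api.guess_datetime_format: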
from pandas.tseries.api import guess_datetime_format
guess_datetime_format('30 May 2024') # '%d %B %Y'
When the first date uses Dec or Aug instead, the problem doesn't occur:
guess_datetime_format('30 Dec 2024') # '%d %b %Y'
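The two guess_datetime_format results above can be mimicked with a deliberately simplified model in plain Python: guess one format from the first string, then apply it to all of them. This is a sketch under stated assumptions, not pandas' actual implementation (which considers many more directives). CANDIDATES, guess_format and parse_like_first_element are hypothetical names, and the full-name-before-abbreviation order is an assumption chosen to reproduce the results shown above.

```python
from datetime import datetime

# Hypothetical candidate list: full month name (%B) is tried before the
# abbreviation (%b), reproducing the May/Dec guesses shown above.
CANDIDATES = ["%d %B %Y", "%d %b %Y"]


def guess_format(text):
    """Return the first candidate format that parses `text`, or None."""
    for fmt in CANDIDATES:
        try:
            datetime.strptime(text, fmt)
            return fmt
        except ValueError:
            continue
    return None


def parse_like_first_element(dates):
    """Simplified model: infer ONE format from dates[0], then apply it to every element."""
    fmt = guess_format(dates[0])
    return [datetime.strptime(d, fmt) for d in dates]


print(guess_format("30 May 2024"))  # %d %B %Y
print(guess_format("30 Dec 2024"))  # %d %b %Y

try:
    # The format is guessed from "30 May 2024", so "29 Aug 2024" raises ValueError.
    parse_like_first_element(["30 May 2024", "1 May 2024", "29 Aug 2024"])
except ValueError as exc:
    print("fails:", exc)

# With "30 Dec 2024" first, %d %b %Y is chosen and every element parses.
print(parse_like_first_element(["30 Dec 2024", "1 May 2024", "29 Aug 2024"]))
```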
If you use format='mixed' or format='%d %b %Y', this will work fine:
pd.to_datetime(['30 May 2024','1 May 2024', '29 Aug 2024', '14 Aug 2024', '19 May 2024'], format='mixed')
DatetimeIndex(['2024-05-30', '2024-05-01', '2024-08-29', '2024-08-14',
               '2024-05-19'],
              dtype='datetime64[ns]', freq=None)
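For the explicit-format route, a minimal sketch looks like this. It assumes all holiday strings share the "day abbreviated-month year" shape; the variable names holidays and parsed are illustrative only.

```python
import pandas as pd

holidays = ['30 May 2024', '1 May 2024', '29 Aug 2024', '14 Aug 2024', '19 May 2024']

# One explicit format for every element: %b (abbreviated month name) matches
# both "May" and "Aug", so the result no longer depends on which string comes first.
parsed = pd.to_datetime(holidays, format='%d %b %Y')
print(parsed)
```

Unlike format='mixed', which infers a format for each element individually, a fixed format is stricter: a string in some other shape raises an error instead of being parsed by a different guessed format.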