python datetime timezone valueerror strptime

ValueError: time data 'Tue 28 Feb 2023 11:27:38 AM CET' does not match format '%a %d %b %Y %I:%M:%S %p %Z'


I've got a strange bug.

I have two identical servers, both Ubuntu 22.04 and both running Python 3.10.6.

On the first server my code runs fine:

Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> date_time_str = 'Tue 28 Feb 2023 11:27:38 AM CET'
>>> date_time_obj = datetime.strptime(date_time_str, '%a %d %b %Y %I:%M:%S %p %Z')
>>> print ("The type of the date is now",  type(date_time_obj))
The type of the date is now <class 'datetime.datetime'>
>>> print ("The date is", date_time_obj)
The date is 2023-02-28 11:27:38
>>>

On the second server I do the same:

Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> date_time_str = 'Tue 28 Feb 2023 11:27:38 AM CET'
>>> date_time_obj = datetime.strptime(date_time_str, '%a %d %b %Y %I:%M:%S %p %Z')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.10/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/lib/python3.10/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data 'Tue 28 Feb 2023 11:27:38 AM CET' does not match format '%a %d %b %Y %I:%M:%S %p %Z'
>>>

What could be causing this issue? It's clearly not down to the format, as it's correct.


Solution

  • The Python strptime/strftime documentation is a bit secretive about %Z: it does not parse arbitrary time zone abbreviations [1]. If you scroll down to the technical detail section, you can find:

    1. [...]
      %Z [...] strptime() only accepts certain values for %Z:
      • any value in time.tzname for your machine’s locale
      • the hard-coded values UTC and GMT

    The first point explains why your attempt works on some systems but not on others.
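
To see what a given machine will accept, you can inspect time.tzname directly. This is a minimal sketch; the `accepted` set is only an approximation of what strptime() builds internally (the real matching is case-insensitive and locale-aware), not the library's actual code:

```python
import time

# (standard, daylight) abbreviations of this machine's local time zone,
# e.g. ('CET', 'CEST') on a server set to Europe/Berlin, ('UTC', 'UTC') on one set to UTC
print(time.tzname)

# Approximation of the names %Z will match here: the two hard-coded ones plus the local ones
accepted = {"UTC", "GMT", *time.tzname}
print("CET" in accepted)  # False on a server whose local zone is not CET
```

If the two servers print different tuples, their configured system time zones differ, which would explain the behavior in the question.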


    How to parse reliably

    "CET" is an abbreviated tz name. Many of those are ambiguous, so parsers likely refuse to parse them [2]. A way around this is to define which abbreviation maps to which IANA time zone name, using dateutil's parser:

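    import dateutil.parser  # pip install python-dateutil
    import dateutil.tz      # import the submodules explicitly; a bare "import dateutil" may not load them
    
    # map the ambiguous abbreviation to a concrete IANA time zone
    tzmapping = {"CET": dateutil.tz.gettz("Europe/Berlin")}
    
    print(dateutil.parser.parse('Tue 28 Feb 2023 11:27:38 AM CET', tzinfos=tzmapping))
    # 2023-02-28 11:27:38+01:00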
    

    If you want to have more control over the parsing process, you can implement something similar yourself, e.g.

    from datetime import datetime
    from zoneinfo import ZoneInfo # Python 3.9+ standard library
    
    tzmapping = {"CET": ZoneInfo("Europe/Berlin")}
    
    date_time_str = 'Tue 28 Feb 2023 11:27:38 AM CET'
    
    # separate datetime part and timezone part:
    dt, tz = date_time_str.rsplit(" ", maxsplit=1)
    
    # now parse datetime part and set timezone.
    date_time_obj = datetime.strptime(dt, '%a %d %b %Y %I:%M:%S %p').replace(tzinfo=tzmapping[tz])
    
    print(date_time_obj)
    # 2023-02-28 11:27:38+01:00
    
    print(repr(date_time_obj))
    # datetime.datetime(2023, 2, 28, 11, 27, 38, tzinfo=zoneinfo.ZoneInfo(key='Europe/Berlin'))
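
The same idea can be wrapped in a small helper that fails loudly on abbreviations it has not been told about. This is only a sketch: `parse_with_tz_abbreviation` and `TZ_MAPPING` are hypothetical names, and mapping CET/CEST to Europe/Berlin is an assumption you should adapt to your data.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+ standard library

# Assumption: these abbreviations refer to Central European (Summer) Time in Germany
TZ_MAPPING = {"CET": ZoneInfo("Europe/Berlin"), "CEST": ZoneInfo("Europe/Berlin")}

def parse_with_tz_abbreviation(text, fmt, mapping=TZ_MAPPING):
    """Parse 'text' whose last space-separated token is a time zone abbreviation."""
    dt_part, abbreviation = text.rsplit(" ", maxsplit=1)
    try:
        tz = mapping[abbreviation]
    except KeyError:
        raise ValueError(f"unknown time zone abbreviation: {abbreviation!r}") from None
    # fmt must describe only the date/time part, without %Z
    return datetime.strptime(dt_part, fmt).replace(tzinfo=tz)

parsed = parse_with_tz_abbreviation('Tue 28 Feb 2023 11:27:38 AM CET', '%a %d %b %Y %I:%M:%S %p')
print(parsed.isoformat())
# 2023-02-28T11:27:38+01:00
```

Raising on an unmapped abbreviation is a deliberate choice here: silently returning a naive datetime would reintroduce the ambiguity the mapping is meant to remove.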
    

    [1] In fact, %Z doesn't parse anything in a strict sense; it just makes the parser ignore strings like "GMT" or "UTC". The resulting datetime object will still be naive!

    [2] Besides, CET specifies a UTC offset, not a time zone in a geographical sense. For instance, "Europe/Berlin" and "Europe/Paris" both experience CET but are different time zones.
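
Both footnotes can be checked in a few lines. This is a quick sketch using only the standard library; it assumes time zone data is available to zoneinfo (on Windows you may need `pip install tzdata`).

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Footnote 1: %Z matches "UTC", but the result carries no tzinfo (it is naive)
naive = datetime.strptime("2023-02-28 11:27:38 UTC", "%Y-%m-%d %H:%M:%S %Z")
print(naive.tzinfo)  # None

# Footnote 2: two distinct IANA zones share the same abbreviation and offset in winter
winter = datetime(2023, 2, 28, 11, 27, 38)
for key in ("Europe/Berlin", "Europe/Paris"):
    aware = winter.replace(tzinfo=ZoneInfo(key))
    print(key, aware.tzname(), aware.utcoffset())
```

Both zones report CET with an offset of one hour for this date, which is why the abbreviation alone cannot tell you which zone was meant.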