I am trying to read a CSV file using pandas and get a warning I do not understand:
Lib\site-packages\dateutil\parser\_parser.py:1207: UnknownTimezoneWarning: tzname B identified but not understood. Pass `tzinfos` argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
warnings.warn("tzname {tzname} identified but not understood. "
I do nothing special, just pd.read_csv with parse_dates=True. I see no B that looks like a timezone anywhere in my data. What does the warning mean?
A minimal reproducible example is the following:
import io
import pandas as pd
pd.read_csv(io.StringIO('x\n1A2B'), index_col=0, parse_dates=True)
Why does pandas think 1A2B is a datetime?!
To solve this, I tried adding dtype={'x': str} to force the column to be read as a string, but I keep getting the warning regardless.
It turns out 1A2B is being interpreted as "1 AM on day 2 of the current month, timezone B". By default, read_csv uses dateutil to detect datetime values (date_parser=):
import dateutil.parser
dateutil.parser.parse('1A2B')
Apart from emitting the warning, this returns (when run today):
datetime.datetime(2023, 1, 2, 1, 0)
And indeed, B is not a timezone name that dateutil recognizes without a tzinfos mapping, hence the warning.
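To make this interpretation reproducible regardless of the day it is run, here is a minimal sketch that pins the default argument of dateutil.parser.parse and records the warning instead of letting it print. The name fixed_default is only illustrative, and the behavior is assumed to match the dateutil version that produced the warning above.

```python
import warnings
from datetime import datetime

import dateutil.parser

# Illustrative fixed default: dateutil fills every field that is missing
# from the string (here: year and month) from this value.
fixed_default = datetime(2000, 1, 1)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    result = dateutil.parser.parse("1A2B", default=fixed_default)

# Hour 1 (from "1A") and day 2 (from "2") come from the string itself.
print(result)
for w in caught:
    print(w.category.__name__, "-", w.message)
```

The recorded warning should be the same UnknownTimezoneWarning about tzname B, which shows that the trailing B is what gets treated as the timezone.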
Why adding dtype doesn't help remains to be investigated.
I did find a simple hack that works:
import dateutil.parser
def dateparse(self, timestr, default=None, ignoretz=False, tzinfos=None, **kwargs):
    # Return the raw result of the internal _parse step; no datetime is built from defaults.
    return self._parse(timestr, **kwargs)
dateutil.parser.parser.parse = dateparse  # Monkey patch; hack!
This prevents using the current day/month/year as defaults, rendering the value invalid as a datetime, as expected.
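A less invasive alternative, if monkey patching dateutil is undesirable, might be to skip parse_dates entirely and convert explicitly with a strict format afterwards. This is only a sketch: ASSUMED_FORMAT is a hypothetical format, chosen on the assumption that real timestamps in the data would look like 2023-01-02.

```python
import io

import pandas as pd

# Hypothetical: adjust to whatever format the real timestamps use.
ASSUMED_FORMAT = "%Y-%m-%d"

# Without parse_dates, the index is read as plain strings and dateutil
# is never asked to guess.
df = pd.read_csv(io.StringIO("x\n1A2B"), index_col=0)

# Explicit, strict conversion: values that do not match the format become NaT.
parsed = pd.to_datetime(df.index, format=ASSUMED_FORMAT, errors="coerce")

print(df.index[0])
print(parsed.isna().all())
```

With an explicit format, pandas does not have to guess how to read each value, so a string such as 1A2B is simply rejected instead of being read as a time with a timezone.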