I have a pandas.DataFrame df
with df.index
which yields something like this:
DatetimeIndex(['2014-10-06 00:55:11.357899904',
'2014-10-06 00:56:39.046799898',
'2014-10-06 00:56:39.057499886',
'2014-10-06 00:56:40.684299946',
'2014-10-06 00:56:41.115299940',
'2014-10-06 01:03:52.764300108',
'2014-10-06 01:21:18.448499918',
'2014-10-06 01:21:18.457200050',
'2014-10-06 01:21:18.584199905',
'2014-10-06 01:21:18.594700098',
...
'2014-11-05 00:25:47.996000051',
'2014-11-05 00:56:45.081799984',
'2014-11-05 00:56:45.096899986',
'2014-11-05 05:50:57.639699936',
'2014-11-05 06:08:56.365000010',
'2014-11-05 06:11:20.519099950',
'2014-11-05 06:15:03.470400095',
'2014-11-05 06:15:03.981600046',
'2014-11-05 06:25:31.514300108',
'2014-11-05 06:25:59.310400009'],
dtype='datetime64[ns]', name='time', length=1000, freq=None)
I am running a DAG on Airflow, which stops at the following line df.loc[start_date:end_date]
, saying that:
AttributeError: 'Pendulum' object has no attribute 'nanosecond'
I cannot reproduce the error without running the code in Airflow. The same code runs just fine without Airflow.
The start_date
is the Airflow macro execution_date
and end_date
is the next_execution_date
.
I guess the issue has to do with the date-time dtype
of the df
not being compatible with the ones from the start_date
& end_date
, but I have no idea how to address it.
I tried removing time zones, changing the dtype
but nothing worked.
After some searching, I found the source of the problem and a solution.
the problem
The issue is caused by the two macros passed down from Airflow:
start_date
, which is the execution_date
macro
end_date
, which is the next_execution_date
macro
Their type is pendulum.datetime
, whereas the Airflow documentation says they are datetime.datetime
. This causes the clash with pandas.DataFrame
.
pandas
and pendulum
currently don't work well together, and the problem is well described in this StackOverflow answer.
the solution
The solution seems to be to convert the start_date
and end_date
from pendulum.datetime
to datetime.datetime
.
For this I wrote a simple function, which converts the pendulum date to a string before parsing that string back into a datetime.datetime
. I am sure there are better ways to do it, but this was quite simple and safe, which is why I used it.
Here is the function itself:
from datetime import datetime


def pendulum_to_datetime(pendulum_date):
    """
    Convert pendulum to datetime format.

    The conversion is done from pendulum -> string -> datetime.

    Args:
        pendulum_date (pendulum): The date you wish to convert.

    Returns:
        (datetime) The converted date.
    """
    # The format has second resolution, so microseconds are dropped.
    fmt = '%Y-%m-%dT%H:%M:%S%z'
    string_date = pendulum_date.strftime(fmt)
    return datetime.strptime(string_date, fmt)
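One limitation of the string round trip is that the format string stops at seconds, so any microseconds are lost. As a possible alternative, here is a minimal sketch that copies the fields directly instead. It assumes the pendulum object exposes the usual datetime attributes (year through microsecond, plus tzinfo), which should hold if it subclasses datetime.datetime. The names to_plain_datetime, drop_tz and FakePendulum are hypothetical and only here for illustration; FakePendulum is a stand-in so the snippet runs without pendulum or Airflow installed.

```python
from datetime import datetime, timezone


def to_plain_datetime(dt_like, drop_tz=False):
    """Copy the fields of a datetime-like object into a plain datetime.datetime.

    Hypothetical helper, not part of pendulum, pandas, or Airflow.
    """
    plain = datetime(
        dt_like.year, dt_like.month, dt_like.day,
        dt_like.hour, dt_like.minute, dt_like.second,
        dt_like.microsecond, tzinfo=dt_like.tzinfo,
    )
    # Optionally drop the time zone, e.g. to compare against a tz-naive index.
    return plain.replace(tzinfo=None) if drop_tz else plain


class FakePendulum(datetime):
    """Stand-in for a pendulum object so the sketch runs without pendulum."""


# Assumed example value; in the DAG this would be the execution_date macro.
start_date = FakePendulum(2014, 10, 6, 0, 55, 11, 357899, tzinfo=timezone.utc)
converted = to_plain_datetime(start_date)
print(type(converted).__name__, converted.isoformat())
```

In the DAG, the slice would then look something like df.loc[to_plain_datetime(start_date):to_plain_datetime(end_date)]. Whether drop_tz=True is needed depends on whether the index is tz-aware, which I have not checked against every pandas version.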