python pandas dataframe timezone

Why do I get NonExistentTimeError in Python for timestamps between 12 and 1 am on 2017-03-12?


I get this error when trying to append two Pandas DFs together in a for loop:

Aggdata=Aggdata.append(Newdata)

This is the full error:

File "pandas\tslib.pyx", line 4096, in pandas.tslib.tz_localize_to_utc (pandas\tslib.c:69713)
pytz.exceptions.NonExistentTimeError: 2017-03-12 02:01:24

However, my files contain no such timestamp. They do contain ones like 03/12/17 00:45:26 or 03/12/17 00:01:24, which fall about 2 hours before the daylight saving time change. If I manually delete the offending row, I get the same error for the next row with a time between 12 and 1 am on the 12th of March.

My original date/time column has no TZ info, but before the concatenation I calculate another column that is localized to EST (US/Eastern) and therefore carries TZ information:

`data['EST_DateTimeStamp']=pd.DatetimeIndex(pd.to_datetime(data['myDate'])).tz_localize('US/Eastern').tz_convert('US/Eastern')`

From some research here, I understand that times between 2 and 3 am on the 12th should raise this error, but why midnight to 1 am? Am I localizing it incorrectly? And why is the error raised on the append line, and not on the localization line?
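As a sanity check on which wall-clock times are actually invalid on that date, here is a minimal sketch. It assumes a pandas version (0.24 or later) whose `tz_localize` accepts the `nonexistent` argument, and `is_nonexistent` is a hypothetical helper name used only for illustration. It is not taken from the linked MCVE.

```python
import pandas as pd

# Naive wall-clock times around the US/Eastern spring-forward on 2017-03-12.
naive = pd.DatetimeIndex(pd.to_datetime([
    "2017-03-12 00:45:26",  # before the gap
    "2017-03-12 02:01:24",  # inside the skipped 02:00-03:00 hour
    "2017-03-12 03:15:00",  # after the gap
]))


def is_nonexistent(ts, tz="US/Eastern"):
    """Hypothetical helper: True if the naive timestamp falls in a DST gap."""
    # nonexistent="NaT" asks pandas to return NaT instead of raising.
    return bool(pd.isna(ts.tz_localize(tz, nonexistent="NaT")))


flags = [is_nonexistent(ts) for ts in naive]

# Alternative to raising: move gap times forward to the first valid instant.
shifted = naive.tz_localize("US/Eastern", nonexistent="shift_forward")

print(flags)
print(shifted)
```

Only the 02:01:24 value should be flagged here; the 00:45:26 value localizes without complaint, which is consistent with the midnight to 1 am rows not being invalid in themselves.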

I was able to reproduce this behavior in a very simple MCVE, saved here: https://codeshare.io/GLjrLe

It absolutely boggles my mind that the error is raised on the third append, and only if the next 3 appends follow. In other words, if I comment out the last 3 appends, it works fine. I can't imagine what is happening.

Thank you for reading.


Solution

  • In case someone else still finds this helpful:

    After discussing it with @hashcode55, the solution was to upgrade pandas on my server; this was likely a bug in the older version of the module I had installed.
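One thing to keep in mind when upgrading: `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so on recent versions the loop from the question has to use `pd.concat` instead. Below is a minimal sketch of that pattern with a timezone-aware column. `load_chunk` is a hypothetical stand-in for however each file is read, and the explicit `format` string is an assumption based on the sample timestamps in the question.

```python
import pandas as pd


def load_chunk(times):
    """Hypothetical stand-in for reading one file into a DataFrame."""
    df = pd.DataFrame({"myDate": times})
    # Parse the naive strings, then attach the US/Eastern timezone.
    parsed = pd.to_datetime(df["myDate"], format="%m/%d/%y %H:%M:%S")
    df["EST_DateTimeStamp"] = parsed.dt.tz_localize("US/Eastern")
    return df


# Collect the pieces in a list and concatenate once, rather than
# appending inside the loop.
chunks = [
    load_chunk(["03/12/17 00:45:26", "03/12/17 00:01:24"]),
    load_chunk(["03/12/17 03:30:00"]),
]
agg = pd.concat(chunks, ignore_index=True)

print(agg)
```

Building a list and calling `pd.concat` once also avoids copying the accumulated frame on every iteration.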