pythonpandastimezonedst

tz_localize AmbiguousTimeError: Cannot infer dst time with non DST dates


I'm trying to import some timeseries data and convert it into UTC so I can merge it with another dataset. This data seems to have 24 hour data and doesn't have DST adjustments. Python pandas tz_localize throws NonExistentTimeError, then unable to drop erroneous times gives a similar answer, but they simply drop the line. I need to shift it so I can merge it with my other data.

When I run my code:

df = pd.read_csv('http://rredc.nrel.gov/solar/old_data/nsrdb/1991-2010/data/hourly/{}/{}_{}_solar.csv'.format(723898,723898,1998), usecols=["YYYY-MM-DD", "HH:MM (LST)","Meas Glo (Wh/m^2)","Meas Dir (Wh/m^2)","Meas Dif (Wh/m^2)"])

def clean_time(obj):
    hour = int(obj[0:-3])
    hour = str(hour - 1)
    if len(str(hour)) == 2:
        return hour+":00"
    else:
        return "0" + hour + ":00"

df['HH:MM (LST)'] = df['HH:MM (LST)'].apply(clean_time)
df['DateTime'] = df['YYYY-MM-DD'] + " " + df['HH:MM (LST)']
df = df.set_index(pd.DatetimeIndex(df['DateTime']))
df.drop(["YYYY-MM-DD", "HH:MM (LST)",'DateTime'],axis=1,inplace=True)
df.index = df.index.tz_localize('US/Pacific', ambiguous='infer')

I get:

pytz.exceptions.AmbiguousTimeError: Cannot infer dst time from 1998-10-25 01:00:00 as there are no repeated times

If I leave ambiguous='raise' (the default), it gives me:

pytz.exceptions.NonExistentTimeError: 1998-04-05 02:00:00

So I'm stuck on either the start, or end of daylight savings time.

There's quite a few of these datasets (multiple sites over multiple years) I need to merge, so I'd prefer not to hand code specific hours to shift.


Solution

  • Minimal reproduction scenario:

    from datetime import datetime, timedelta
    
    import pandas as pd
    
    
    df = pd.DataFrame([[datetime(2019, 10, 27, 0) + timedelta(hours=i), i] for i in range(24)], columns=['dt', 'i']).set_index('dt')
    df.index.tz_localize('Europe/Amsterdam', ambiguous='infer')
    

    pytz.exceptions.AmbiguousTimeError: Cannot infer dst time from 2019-10-27 02:00:00 as there are no repeated times

    Solution: manually specify which datetime objects must be considered DT (Daylight Time) or DST (Daylight Savings Time). See documentation.

    from datetime import datetime, timedelta
    import numpy as np
    import pandas as pd
    
    
    df = pd.DataFrame([[datetime(2019, 10, 27, 0) + timedelta(hours=i), i] for i in range(24)], columns=['dt', 'i']).set_index('dt')
    infer_dst = np.array([False] * df.shape[0])  # all False -> every row considered DT, alternative is True to indicate DST. The array must correspond to the iloc of df.index
    df.index.tz_localize('Europe/Amsterdam', ambiguous=infer_dst)  # no error