pythonpandaspytzzoneinfo

Localize time zone based on column in pandas


I am trying to set timezone to a datetime column, based on another column containing the time zone.

Example data:

  DATETIME                VALUE          TIME_ZONE
0   2021-05-01 00:00:00    1.00      Europe/Athens
1   2021-05-01 00:00:00    2.13      Europe/London
2   2021-05-01 00:00:00    5.13      Europe/London
3   2021-05-01 01:00:00    4.25      Europe/Dublin
4   2021-05-01 01:00:00    4.25       Europe/Paris

I am trying to assign a time zone to the DATETIME column, but using the tz_localize method, I cannot avoid using an apply call, which will be very slow on my large dataset. Is there some way to do this without using apply?

What I have now (which is slow):

df['DATETIME_WITH_TZ'] = df.apply(lambda row: row['DATETIME'].tz_localize(row['TIME_ZONE']), axis=1)

Solution

  • I'm not sure but a listcomp seems to be x17 faster than apply in your case :

    df["DATETIME_WITH_TZ"] = [dt.tz_localize(tz)
                              for dt,tz in zip(df["DATETIME"], df["TIME_ZONE"])]
    

    Another variant, with tz_convert :

    df["DATETIME_WITH_TZ"] = [dt.tz_localize("UTC").tz_convert(tz)
                              for dt,tz in zip(df["DATETIME"], df["TIME_ZONE"])]
    

    Timing :

    #%%timeit #listcomp1
    47.4 µs ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
    
    #%%timeit #listcomp2
    25.7 µs ± 1.94 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
    
    #%%timeit #apply
    457 µs ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
    

    Output :

    print(df)
                 DATETIME  VALUE      TIME_ZONE           DATETIME_WITH_TZ
    0 2021-05-01 00:00:00   1.00  Europe/Athens  2021-05-01 03:00:00+03:00
    1 2021-05-01 00:00:00   2.13  Europe/London  2021-05-01 01:00:00+01:00
    2 2021-05-01 00:00:00   5.13  Europe/London  2021-05-01 01:00:00+01:00
    3 2021-05-01 01:00:00   4.25  Europe/Dublin  2021-05-01 02:00:00+01:00
    4 2021-05-01 01:00:00   4.25   Europe/Paris  2021-05-01 03:00:00+02:00