
Polars replace_time_zone function throws error of "non-existent in time zone"


Here's the test data to work with:

import polars as pl
from datetime import date

df = pl.DataFrame(
    pl.datetime_range(
        start=date(2022, 1, 3),
        end=date(2022, 9, 30),
        interval="5m",
        time_unit="ns",
        time_zone="UTC",
        eager=True
    ).alias("UTC")
)

I specifically need replace_time_zone to actually change the underlying timestamp.

It works with convert_time_zone:

df.select(
    pl.col("UTC").dt.convert_time_zone(time_zone="America/New_York").alias("US")
)
shape: (77_761, 1)
┌────────────────────────────────┐
│ US                             │
│ ---                            │
│ datetime[ns, America/New_York] │
╞════════════════════════════════╡
│ 2022-01-02 19:00:00 EST        │
│ 2022-01-02 19:05:00 EST        │
│ 2022-01-02 19:10:00 EST        │
│ 2022-01-02 19:15:00 EST        │
│ 2022-01-02 19:20:00 EST        │
│ …                              │
│ 2022-09-29 19:40:00 EDT        │
│ 2022-09-29 19:45:00 EDT        │
│ 2022-09-29 19:50:00 EDT        │
│ 2022-09-29 19:55:00 EDT        │
│ 2022-09-29 20:00:00 EDT        │
└────────────────────────────────┘

But fails with replace_time_zone:

df.select(
    pl.col("UTC").dt.replace_time_zone(time_zone="America/New_York").alias("US")
)
# ComputeError: datetime '2022-03-13 02:00:00' is non-existent in time zone 'America/New_York'. 
# You may be able to use `non_existent='null'` to return `null` in this case.

Solution

  • You cannot replace the time zone of a UTC time series with a time zone that has DST transitions: you'll end up with non-existent and/or ambiguous datetimes. The error message could be a bit more informative, but I do not think this is specific to Polars.

    Here's an illustration. "America/New_York" had a DST transition on March 13, 2022, so 02:00 did not exist on that day. A range that ends before that time works fine:

    import polars as pl
    from datetime import date
    
    df = pl.DataFrame(
        pl.datetime_range(
            start=date(2022, 3, 11),
            end=date(2022, 3, 13),
            interval="5m",
            time_unit="ns",
            time_zone="UTC",
            eager=True,
        ).alias("UTC")
    )
    
    print(
        df.select(
            pl.col("UTC").dt.replace_time_zone(time_zone="America/New_York").alias("US")
        )
    )
    # shape: (577, 1)
    # ┌────────────────────────────────┐
    # │ US                             │
    # │ ---                            │
    # │ datetime[ns, America/New_York] │
    # ╞════════════════════════════════╡
    # │ 2022-03-11 00:00:00 EST        │
    # │ 2022-03-11 00:05:00 EST        │
    # │ 2022-03-11 00:10:00 EST        │
    # │ 2022-03-11 00:15:00 EST        │
    # │ …                              │
    

    while this doesn't:

    df = pl.DataFrame(
        pl.datetime_range(
            start=date(2022, 3, 13),
            end=date(2022, 3, 15),
            interval="5m",
            time_unit="ns",
            time_zone="UTC",
            eager=True,
        ).alias("UTC")
    )
    
    print(
        df.select(
            pl.col("UTC").dt.replace_time_zone(time_zone="America/New_York").alias("US")
        )
    )
    # ComputeError: datetime '2022-03-13 02:00:00' is non-existent in time zone
    

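    A workaround is to convert UTC to the desired time zone, then shift each value by the difference between its UTC wall-clock time and its local wall-clock time (in effect undoing the UTC offset). Example: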

    df = pl.DataFrame(
        pl.datetime_range(
            start=date(2022, 1, 3),
            end=date(2022, 9, 30),
            interval="5m",
            time_unit="ns",
            time_zone="UTC",
            eager=True,
        ).alias("UTC")
    )
    
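    # Convert to New York time (same instants, different wall-clock readings).
    df = df.with_columns(
        pl.col("UTC").dt.convert_time_zone(time_zone="America/New_York").alias("US")
    )
    
    # Shift each value by (UTC wall clock - local wall clock) so that the
    # New York wall clock reads what the UTC wall clock read.
    df = df.with_columns(
        (
            pl.col("US")
            + (pl.col("UTC") - pl.col("US").dt.replace_time_zone(time_zone="UTC"))
        ).alias("US_fakeUTC")
    )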
    
    print(df.select(pl.col("US_fakeUTC")))
    # shape: (77_761, 1)
    # ┌────────────────────────────────┐
    # │ US_fakeUTC                     │
    # │ ---                            │
    # │ datetime[ns, America/New_York] │
    # ╞════════════════════════════════╡
    # │ 2022-01-03 00:00:00 EST        │
    # │ 2022-01-03 00:05:00 EST        │
    # │ 2022-01-03 00:10:00 EST        │
    # │ 2022-01-03 00:15:00 EST        │
    # │ …                              │