
Converting a timezone-aware datetime column to a string with UTC time offset


Update: This was fixed by pull/6673


I have the following dataframe:

import polars as pl

df = (
    pl.DataFrame(
        {
            "int": [1, 2, 3],
            "date": ["2010-01-31T23:00:00+00:00","2010-02-01T00:00:00+00:00","2010-02-01T01:00:00+00:00"]
        }
    )
    .with_columns(
        pl.col("date").str.to_datetime()
        .dt.convert_time_zone("Europe/Amsterdam")
    )
)

which gives:

┌─────┬────────────────────────────────┐
│ int ┆ date                           │
│ --- ┆ ---                            │
│ i64 ┆ datetime[μs, Europe/Amsterdam] │
╞═════╪════════════════════════════════╡
│ 1   ┆ 2010-02-01 00:00:00 CET        │
│ 2   ┆ 2010-02-01 01:00:00 CET        │
│ 3   ┆ 2010-02-01 02:00:00 CET        │
└─────┴────────────────────────────────┘

I would like to convert this datetime column to a string that includes the UTC offset, e.g. 2010-02-01 00:00:00+01:00

I tried the following:

df.with_columns(pl.col("date").dt.to_string("%Y-%m-%d %H:%M:%S%z"))

which gives the following error:

pyo3_runtime.PanicException: a formatting trait implementation returned an error: Error

My desired output is shown below. It is what pandas produces when a datetime column is converted to strings with the format "%Y-%m-%d %H:%M:%S%z":

┌─────┬──────────────────────────┐
│ int ┆ date                     │
│ --- ┆ ---                      │
│ i64 ┆ str                      │
╞═════╪══════════════════════════╡
│ 1   ┆ 2010-02-01 00:00:00+0100 │
│ 2   ┆ 2010-02-01 01:00:00+0100 │
│ 3   ┆ 2010-02-01 02:00:00+0100 │
└─────┴──────────────────────────┘

Is there any way to achieve this result? Leaving out the %z at the end of the format works, but I need the UTC time offset.
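For comparison, the target strings can be reproduced with Python's standard library alone. This is only a sketch: the fixed +01:00 offset named CET is an assumption standing in for Europe/Amsterdam in winter (it ignores daylight saving time), and the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +01:00 offset stands in for Europe/Amsterdam in winter (CET).
CET = timezone(timedelta(hours=1), name="CET")

raw = [
    "2010-01-31T23:00:00+00:00",
    "2010-02-01T00:00:00+00:00",
    "2010-02-01T01:00:00+00:00",
]

# Parse the ISO 8601 strings and shift them to the +01:00 offset.
local = [datetime.fromisoformat(s).astimezone(CET) for s in raw]

# %z renders the offset without a colon.
compact = [dt.strftime("%Y-%m-%d %H:%M:%S%z") for dt in local]

# isoformat(sep=" ") renders the offset with a colon.
with_colon = [dt.isoformat(sep=" ") for dt in local]

print(compact)
print(with_colon)
```

The first list matches the desired table above (+0100); the second shows the colon-separated variant (+01:00).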


Solution

  • py-polars v0.16.3 fixes the issue:

    import polars as pl
    
    df = (
        pl.DataFrame(
            {
                "int": [1, 2, 3],
                "date": ["2010-01-31T23:00:00+00:00","2010-02-01T00:00:00+00:00","2010-02-01T01:00:00+00:00"]
            }
        )
        .with_columns(
            pl.col("date").str.to_datetime()
            .dt.convert_time_zone("Europe/Amsterdam")
        )
    )
    
    print(
        df.with_columns(pl.col("date").dt.to_string("%Y-%m-%d %H:%M:%S%z"))
    )
    
    shape: (3, 2)
    ┌─────┬──────────────────────────┐
    │ int ┆ date                     │
    │ --- ┆ ---                      │
    │ i64 ┆ str                      │
    ╞═════╪══════════════════════════╡
    │ 1   ┆ 2010-02-01 00:00:00+0100 │
    │ 2   ┆ 2010-02-01 01:00:00+0100 │
    │ 3   ┆ 2010-02-01 02:00:00+0100 │
    └─────┴──────────────────────────┘
    

    Notes

    1. To get a colon-separated UTC offset (e.g. +01:00), use %:z. See also the Rust / chrono formatting directives.
    2. convert_time_zone is the new name for with_time_zone. I hope it stays that way ;-)