I'm trying to convert a pl.Date column to a UNIX epoch timestamp as is, without any time zone offset:
import datetime
import polars as pl
df = pl.DataFrame(
    {'Date': [datetime.datetime.now().date()]}
)
Correct time (00:00:00) when converted to Datetime:
df.with_columns(
    pl.col("Date").cast(pl.Datetime)
)
┌─────────────────────┐
│ Date                │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2023-06-10 00:00:00 │
└─────────────────────┘
Incorrect time when casting to timestamp:
datetime.datetime.fromtimestamp(
    df.with_columns(
        pl.col("Date").cast(pl.Datetime).dt.timestamp("ms").truediv(1_000)
    ).item()
)
datetime.datetime(2023, 6, 10, 8, 0) # (08:00:00)
As suggested, skipping the cast to Datetime also produces the incorrect time (08:00:00):
pl.col("Date").dt.timestamp("ms").truediv(1_000)
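Note that Python's standard datetime treats a naive datetime (one without a time zone) as local time, so datetime.fromtimestamp() without a tz argument returns local wall-clock time. That is why the result is shifted by your UTC offset (8 hours in the question). In contrast, polars, like pandas, treats naive datetimes as UTC.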
Keep it consistent by setting the time zone, e.g. UTC:
from datetime import datetime, timezone
import polars as pl
df = pl.DataFrame(
    {'Date': [datetime.now(timezone.utc).date()]}
)
df = df.with_columns(
    pl.col("Date").cast(pl.Datetime).dt.timestamp("ms").truediv(1_000).alias("Unix")
)
print(df)
# shape: (1, 2)
# ┌────────────┬──────────┐
# │ Date       ┆ Unix     │
# │ ---        ┆ ---      │
# │ date       ┆ f64      │
# ╞════════════╪══════════╡
# │ 2023-06-10 ┆ 1.6864e9 │
# └────────────┴──────────┘
print(datetime.fromtimestamp(df["Unix"][0], timezone.utc))
# 2023-06-10 00:00:00+00:00
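To see the same effect without polars, here is a minimal standard-library sketch. The helper name `date_to_unix_utc` is made up for illustration, and the date is hard-coded to match the example above. It treats midnight of the date as UTC, which is assumed to mirror what polars does for a naive value, and then decodes the timestamp both ways:

```python
from datetime import date, datetime, time, timezone


def date_to_unix_utc(d: date) -> float:
    """Treat midnight of `d` as UTC and return seconds since the UNIX epoch.

    Hypothetical helper, not part of polars or the standard library.
    """
    return datetime.combine(d, time.min, tzinfo=timezone.utc).timestamp()


ts = date_to_unix_utc(date(2023, 6, 10))
print(ts)
# 1686355200.0

# Decoding with an explicit UTC time zone recovers midnight.
print(datetime.fromtimestamp(ts, timezone.utc))
# 2023-06-10 00:00:00+00:00

# Decoding without a time zone yields local wall-clock time,
# so the result depends on the machine's UTC offset.
print(datetime.fromtimestamp(ts))
```

The last call is the one from the question: on a machine set to UTC+8 it would be expected to show 08:00:00, while the timestamp itself is the same in all three cases.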