I'm trying to convert the string timestamps my camera writes into its RAW file metadata to polars datetimes, but polars throws this error when the timestamps include both summer time and winter time:
ComputeError: Different timezones found during 'strptime' operation.
How do I persuade it to convert these successfully? (ideally handling different timezones as well as the change from summer to winter time)
And then how do I convert these timestamps back to the proper local clock time for display?
Note that while the timestamp strings only show the UTC offset, the metadata also has an EXIF field "Time Zone City", as well as fields with just the local (naive) timestamp.
import polars as plr
testdata=[
{'name': 'BST 11:06', 'ts': '2022:06:27 11:06:12.16+01:00'},
{'name': 'GMT 7:06', 'ts': '2022:12:27 12:06:12.16+00:00'},
]
pdf = plr.DataFrame(testdata)
pdfts = pdf.with_column(plr.col('ts').str.strptime(plr.Datetime, fmt = "%Y:%m:%d %H:%M:%S.%f%z"))
print(pdf)
print(pdfts)
It looks like I need to use tz_convert, but I cannot see how to add it to the conversion expression, and what looks like the relevant doc page (dt_namespace) is a broken link that just returns a 404.
Since PR 6496 was merged, you can parse mixed offsets to UTC, then convert to the time zone you want:
import polars as pl
pdf = pl.DataFrame([
{'name': 'BST 11:06', 'ts': '2022:06:27 11:06:12.16+01:00'},
{'name': 'GMT 7:06', 'ts': '2022:12:27 12:06:12.16+00:00'},
])
pdfts = pdf.with_columns(
pl.col('ts').str.to_datetime("%Y:%m:%d %H:%M:%S%.f%z")
.dt.convert_time_zone("Europe/London")
)
print(pdfts)
shape: (2, 2)
┌───────────┬─────────────────────────────┐
│ name ┆ ts │
│ --- ┆ --- │
│ str ┆ datetime[μs, Europe/London] │
╞═══════════╪═════════════════════════════╡
│ BST 11:06 ┆ 2022-06-27 11:06:12.160 BST │
│ GMT 7:06 ┆ 2022-12-27 12:06:12.160 GMT │
└───────────┴─────────────────────────────┘
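To make the "parse to UTC, then convert" idea concrete outside polars, here is a minimal sketch using only the Python standard library. It assumes Python 3.9+ with time zone data available for zoneinfo; the helper names to_utc and to_local_display are made up for illustration and are not part of polars.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Same layout as the camera timestamps in the question.
FMT = "%Y:%m:%d %H:%M:%S.%f%z"

def to_utc(ts: str) -> datetime:
    """Parse a timestamp with a UTC offset and normalise it to UTC."""
    return datetime.strptime(ts, FMT).astimezone(timezone.utc)

def to_local_display(dt_utc: datetime, zone: str) -> str:
    """Convert a UTC datetime to local clock time in `zone` for display."""
    return dt_utc.astimezone(ZoneInfo(zone)).strftime("%Y-%m-%d %H:%M:%S %Z")

stamps = [
    "2022:06:27 11:06:12.16+01:00",
    "2022:12:27 12:06:12.16+00:00",
]
utc_values = [to_utc(s) for s in stamps]
local_strings = [to_local_display(u, "Europe/London") for u in utc_values]
print(local_strings)
```

Because every value is first reduced to a single reference (UTC), the mixed offsets no longer conflict, and the named time zone is only needed at the end to get the local clock time back.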
Here's a work-around you could use: remove the UTC offset and localize to a pre-defined time zone. Note: the result will only be correct if UTC offsets and time zone agree.
timezone = "Europe/London"
pdfts = pdf.with_column(
plr.col('ts')
.str.replace("[+-][0-9]{2}:[0-9]{2}$", "")
.str.strptime(plr.Datetime, fmt="%Y:%m:%d %H:%M:%S%.f")
.dt.tz_localize(timezone)
)
print(pdf)
┌───────────┬──────────────────────────────┐
│ name ┆ ts │
│ --- ┆ --- │
│ str ┆ str │
╞═══════════╪══════════════════════════════╡
│ BST 11:06 ┆ 2022:06:27 11:06:12.16+01:00 │
│ GMT 7:06 ┆ 2022:12:27 12:06:12.16+00:00 │
└───────────┴──────────────────────────────┘
print(pdfts)
┌───────────┬─────────────────────────────┐
│ name ┆ ts │
│ --- ┆ --- │
│ str ┆ datetime[ns, Europe/London] │
╞═══════════╪═════════════════════════════╡
│ BST 11:06 ┆ 2022-06-27 11:06:12.160 BST │
│ GMT 7:06 ┆ 2022-12-27 12:06:12.160 GMT │
└───────────┴─────────────────────────────┘
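Since this work-around is only correct when the UTC offsets and the time zone agree, it can be worth checking that before discarding the offset. A rough sketch with the standard library (Python 3.9+ with time zone data assumed; offset_matches_zone is a hypothetical helper, not a polars function):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

FMT = "%Y:%m:%d %H:%M:%S.%f%z"

def offset_matches_zone(ts: str, zone: str) -> bool:
    """True if the offset written in `ts` is the one `zone` uses at that local time."""
    aware = datetime.strptime(ts, FMT)
    # Drop the offset, then attach the named zone to the same wall-clock time.
    localized = aware.replace(tzinfo=ZoneInfo(zone))
    return localized.utcoffset() == aware.utcoffset()

print(offset_matches_zone("2022:06:27 11:06:12.16+01:00", "Europe/London"))
print(offset_matches_zone("2022:06:27 11:06:12.16+00:00", "Europe/London"))
```

Rows for which the check fails were presumably taken with the camera set to a different zone, so localizing them to the pre-defined time zone would shift them silently.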
Side-Note: to be fair, pandas does not handle mixed UTC offsets either, unless you parse to UTC straight away (keyword utc=True in pd.to_datetime). With mixed UTC offsets, it falls back to using a series of native Python datetime objects. That makes a lot of the pandas time series functionality, like the dt accessor, unavailable.