I have a dataframe like this:
lat lon year month day hour minute second millisecond
0.0 0.0 2023.0 11.0 22.0 10.0 15.0 34.0 345.0
0.0 0.0 2023.0 11.0 22.0 10.0 23.0 53.0 0.0
...
I want to create a DatetimeIndex from the date/time columns, keeping the millisecond precision. The time is in UTC.
What I did was extract those columns and create a DatetimeIndex using to_datetime. The code is:
utc_df = df.iloc[:, 2:]
datetimeindex = pd.to_datetime(utc_df, utc=True)
The result is:
>>> datetimeindex
0 2023-11-22 10:15:34.345000+00:00
1 2023-11-22 10:23:53+00:00
...
Length: 23179, dtype: datetime64[ns, UTC]
The problem is with the millisecond precision.
If the millisecond column contains a non-zero value, it is displayed with microsecond precision; if it is zero, the fractional part is omitted.
I tried adding unit="ms" to to_datetime, but the result is the same.
If I remove utc=True, the display is what I would like:
>>> pd.to_datetime(utc_df)
0 2023-11-22 10:15:34.345
1 2023-11-22 10:23:53.000
...
Length: 23179, dtype: datetime64[ns]
but if I print out just one element:
pd.to_datetime(utc_df)[0]
Timestamp('2023-11-22 10:15:34.345000')
the microseconds are back.
I tried to modify the format in this way:
datetimeindex = datetimeindex.map(
lambda x: x.isoformat(timespec="milliseconds")
)
but this also changes the type of the elements to string, and I want Timestamp.
Is there a way to show the milliseconds with just three digits while keeping the Timestamp type?
IMPORTANT NOTE:
As this is an exercise, the only libraries I can use are pandas and numpy; I cannot import anything else.
Since you don't have any timezone information, why use utc=True?
Just go with:
pd.to_datetime(df.iloc[:, 2:])
Or
pd.to_datetime(df.drop(columns=['lat', 'lon']))
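Both selections pick the same seven date/time columns, so they should give the same result. Here is a minimal sketch to check that, assuming a two-row frame rebuilt from the sample in the question (the variable names by_position and by_name are only illustrative):

```python
import pandas as pd

# Two sample rows copied from the question; the frame is rebuilt here for illustration.
df = pd.DataFrame({
    "lat": [0.0, 0.0],
    "lon": [0.0, 0.0],
    "year": [2023.0, 2023.0],
    "month": [11.0, 11.0],
    "day": [22.0, 22.0],
    "hour": [10.0, 10.0],
    "minute": [15.0, 23.0],
    "second": [34.0, 53.0],
    "millisecond": [345.0, 0.0],
})

# Select the date/time columns by position ...
by_position = pd.to_datetime(df.iloc[:, 2:])
# ... or by dropping the columns that are not part of the timestamp.
by_name = pd.to_datetime(df.drop(columns=["lat", "lon"]))

# Both Series hold the same timezone-naive timestamps.
print(by_position.equals(by_name))
print(by_name)
```

Dropping by name is a bit more robust if the column order ever changes.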
If you want to convert from a timezone-aware timestamp to a timezone-naive one:
pd.to_datetime(df.drop(columns=['lat', 'lon']), utc=True).dt.tz_localize(None)
Output:
0 2023-11-22 10:15:34.345
1 2023-11-22 10:23:53.000
dtype: datetime64[ns]
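Note that the stored value is the same in both displays; the extra zeros only come from how a single Timestamp prints itself. One possible approach, sketched here under the assumption that the three-digit form is only needed for display, is to keep the Timestamp objects in the Series and format an element only when printing it. The sample values are copied from the question, and the names naive, first and label are only illustrative:

```python
import pandas as pd

# Date/time columns only, rebuilt from the two sample rows in the question.
utc_df = pd.DataFrame({
    "year": [2023.0, 2023.0],
    "month": [11.0, 11.0],
    "day": [22.0, 22.0],
    "hour": [10.0, 10.0],
    "minute": [15.0, 23.0],
    "second": [34.0, 53.0],
    "millisecond": [345.0, 0.0],
})

# Parse as UTC, then drop the timezone so the values become naive timestamps.
naive = pd.to_datetime(utc_df, utc=True).dt.tz_localize(None)

# The elements are still Timestamp objects; nothing was converted to str.
first = naive[0]
print(type(first).__name__)

# Format only at display time; the Series itself is left untouched.
label = first.isoformat(sep=" ", timespec="milliseconds")
print(label)
```

This uses only pandas, so it stays within the constraint in the question.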