python-3.x pandas datetimeindex

Create a pandas DatetimeIndex with 3-digit millisecond precision


I have a dataframe like this:

lat  lon  year    month  day   hour  minute  second  millisecond
0.0  0.0  2023.0  11.0   22.0  10.0  15.0    34.0    345.0
0.0  0.0  2023.0  11.0   22.0  10.0  23.0    53.0    0.0
...

I want to create a DatetimeIndex from the date/time columns while keeping the millisecond precision. The times are in UTC.

I extracted those columns and passed them to to_datetime:

utc_df = df.iloc[:, 2:]
datetimeindex = pd.to_datetime(utc_df, utc=True)

the result is:

>>> datetimeindex
0        2023-11-22 10:15:34.345000+00:00
1        2023-11-22 10:23:53+00:00
...
Length: 23179, dtype: datetime64[ns, UTC]

The problem is how the milliseconds are displayed. If the millisecond column contains a non-zero value, it is shown with microsecond precision (six digits); if it is zero, the fractional part is omitted entirely.

I tried adding unit="ms" to to_datetime, but the result is the same.

If I remove utc=True, the visualisation is what I would like:

>>> pd.to_datetime(utc_df)
0        2023-11-22 10:15:34.345
1        2023-11-22 10:23:53.000
...
Length: 23179, dtype: datetime64[ns]

but if I print out just one element:

pd.to_datetime(utc_df)[0]
Timestamp('2023-11-22 10:15:34.345000')

the microseconds are back.

I tried to modify the format in this way:

datetimeindex = datetimeindex.map(
    lambda x: x.isoformat(timespec="milliseconds")
)

but this also converts the elements to strings, and I want them to remain Timestamp objects.

Is there a way to show the milliseconds with just three digits while keeping the Timestamp type?

IMPORTANT NOTE:

As this is an exercise, the only libraries I can use are pandas and numpy; I cannot import anything else.


Solution

  • Since you don't have any timezone information, why use utc=True?

    Just go with:

    pd.to_datetime(df.iloc[:, 2:])
    

    Or

    pd.to_datetime(df.drop(columns=['lat', 'lon']))
    

    If you want to convert timezone-aware timestamps to timezone-naive ones:

    pd.to_datetime(df.drop(columns=['lat', 'lon']), utc=True).dt.tz_localize(None)
    

    Output:

    0   2023-11-22 10:15:34.345
    1   2023-11-22 10:23:53.000
    dtype: datetime64[ns]
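
To make the point above concrete, here is a minimal sketch using only pandas. It rebuilds the two sample rows from the question (integer values are an assumption made here for simplicity; the question's columns hold floats), checks that the timezone-naive and timezone-stripped results are identical, and shows that the three-digit versus six-digit difference is only in how the values are printed. The names `naive`, `aware`, `shown` and `indexed` are illustrative, not from the original post.

```python
import pandas as pd

# Two sample rows mirroring the question's dataframe.
# Assumption: integer values are used here; the question's columns are floats.
df = pd.DataFrame(
    {
        "lat": [0.0, 0.0],
        "lon": [0.0, 0.0],
        "year": [2023, 2023],
        "month": [11, 11],
        "day": [22, 22],
        "hour": [10, 10],
        "minute": [15, 23],
        "second": [34, 53],
        "millisecond": [345, 0],
    }
)

# Assemble datetimes from the component columns; the result is a Series.
parts = df.drop(columns=["lat", "lon"])
naive = pd.to_datetime(parts)

# Building with utc=True and then dropping the timezone gives the same values.
aware = pd.to_datetime(parts, utc=True)
assert aware.dt.tz_localize(None).equals(naive)

# Each element is a pandas Timestamp holding 345 ms exactly,
# however many fractional digits its repr happens to print.
first = naive.iloc[0]
assert isinstance(first, pd.Timestamp)
assert first.microsecond == 345_000

# Display-only formatting: %f yields six digits, so trim the last three.
# This produces strings and leaves the datetime values untouched.
shown = naive.dt.strftime("%Y-%m-%d %H:%M:%S.%f").str[:-3]
print(shown.tolist())

# Use the datetimes as the index of the remaining columns.
indexed = df[["lat", "lon"]].set_index(pd.DatetimeIndex(naive))
print(type(indexed.index).__name__)
```

The design choice in this sketch is to keep the stored values as datetimes and apply the three-digit formatting only at the point of display, since the formatted strings are no longer Timestamp objects.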