python datetime timezone utc iso

Convert a timestamp with an offset to a UTC timestamp in Python?


I have a DataFrame with this column:

date_col
2024-05-12T17:46:50.746922-07:00
2024-05-12T17:31:35.438304-07:00
2024-05-12T17:46:50.642095-07:00
2024-05-12T17:02:02.299320-07:00

I tried the code below:

df['updated'] = datetime.fromisoformat(str(df['date_col'])).astimezone(timezone.utc).isoformat(timespec="milliseconds")

But it's giving an error:

TypeError: fromisoformat: argument must be str
print(type(df['date_col'])) gives <class 'pandas.core.series.Series'>
print(df.dtypes) gives date_col object

Expected output is in form of - 2024-05-13T00:46:50.746Z

Any help is appreciated.


Solution

  • I'd try something like this:

    import pandas as pd
    
    # Parse the ISO 8601 strings; utc=True honors each offset and converts to UTC
    df['date_col'] = pd.to_datetime(df['date_col'], utc=True)
    
    # Format with microseconds, trim to milliseconds, then append the 'Z' suffix
    df['updated'] = df['date_col'].dt.strftime('%Y-%m-%dT%H:%M:%S.%f').str[:-3] + 'Z'
    

    Some explanation:

    1. pd.to_datetime(df['date_col'], utc=True): This parses the strings in date_col into timezone-aware datetime values. The offset in each string (here -07:00) is honored, and utc=True converts the result to UTC, so no separate tz_convert('UTC') call is needed. Note that this overwrites date_col with datetime values; assign to a new column if you want to keep the original strings.
    2. df['date_col'].dt.strftime('%Y-%m-%dT%H:%M:%S.%f'): The dt accessor exposes datetime operations on a Series, and strftime turns each value into a string. %f always produces six digits (microseconds), for example 746922.
    3. .str[:-3] + 'Z': The expected output has millisecond precision, so the last three digits are cut off (746922 becomes 746) and a literal Z is appended to mark UTC. This truncates rather than rounds.

    After running this code, the updated column in your DataFrame should contain the expected output format.
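
As a quick check, here is a minimal, self-contained sketch that runs the same steps on the four sample values from the question. The helper name to_utc_iso_millis is made up for illustration; it is not part of pandas.

```python
import pandas as pd


def to_utc_iso_millis(series: pd.Series) -> pd.Series:
    """Hypothetical helper: ISO 8601 strings with offsets -> UTC strings, ms precision."""
    # utc=True honors the offset in each string and converts the values to UTC.
    utc = pd.to_datetime(series, utc=True)
    # %f yields six digits; drop the last three to keep milliseconds, then add 'Z'.
    return utc.dt.strftime('%Y-%m-%dT%H:%M:%S.%f').str[:-3] + 'Z'


# Sample values copied from the question.
df = pd.DataFrame({
    'date_col': [
        '2024-05-12T17:46:50.746922-07:00',
        '2024-05-12T17:31:35.438304-07:00',
        '2024-05-12T17:46:50.642095-07:00',
        '2024-05-12T17:02:02.299320-07:00',
    ]
})

df['updated'] = to_utc_iso_millis(df['date_col'])
print(df['updated'].tolist())
```

Keeping the result in a separate column leaves the original strings in date_col untouched, which makes it easy to compare input and output side by side.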

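    Note: with utc=True, the offset in each string is already taken into account, so values such as 2024-05-12T17:46:50.746922-07:00 are converted correctly. Strings that carry no offset at all are treated as UTC. If such strings actually represent local times in another zone, parse them without utc=True, attach the zone with dt.tz_localize, and then convert with dt.tz_convert('UTC').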
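
For comparison, the attempt from the question can also be made to work with the standard library alone. datetime.fromisoformat parses one string at a time, while df['date_col'] is a whole Series (and str() of a Series is its printed representation, not a timestamp), so the conversion has to be applied per element. The sketch below assumes every value carries a UTC offset; the helper name iso_to_utc_millis is hypothetical.

```python
from datetime import datetime, timezone

import pandas as pd


def iso_to_utc_millis(value: str) -> str:
    """Hypothetical helper: one ISO 8601 string with an offset -> UTC string with 'Z'."""
    # Assumption: value includes an offset. A naive value would be interpreted
    # as local time by astimezone(), which is probably not what you want.
    parsed = datetime.fromisoformat(value)
    as_utc = parsed.astimezone(timezone.utc)
    # timespec='milliseconds' truncates the fraction to three digits.
    return as_utc.isoformat(timespec='milliseconds').replace('+00:00', 'Z')


df = pd.DataFrame({
    'date_col': [
        '2024-05-12T17:46:50.746922-07:00',
        '2024-05-12T17:02:02.299320-07:00',
    ]
})

# Series.map calls the helper once per element.
df['updated'] = df['date_col'].map(iso_to_utc_millis)
print(df['updated'].tolist())
```

This runs a Python function per row, so on large columns the vectorized pd.to_datetime approach is likely to be faster.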