pythonpandas

How to convert UNIX time to Datetime when using Pandas.DataFrame.from_dict?


I am reading from a json data file and loading it into a dictionary. It as key:value like below.

"1707195600000":1,"1707282000000":18,"1707368400000":1,"1707454800000":13,"1707714000000":18,"1707800400000":12,"1707886800000":155,"1707973200000":1"

Code Snippet:

with open('data.json', 'r') as json_file:
    data_pairs = json.load(json_file)

dataframe = pd.DataFrame.from_dict(data_pairs, orient='index')

Can it be done with Pandas.DataFrame.from_dict? Or I should convert it all the keys in the dictionary before using from_dict?


Solution

  • You can convert the keys first. They seem to be in milliseconds:

    import pandas as pd
    import datetime as dt
    
    data = {"1707195600000":1,"1707282000000":18,"1707368400000":1,"1707454800000":13,"1707714000000":18,"1707800400000":12,"1707886800000":155,"1707973200000":1}
    cvt_data = {dt.datetime.fromtimestamp(int(k) / 1000):v for k, v in data.items()}
    df = pd.DataFrame.from_dict(cvt_data, orient='index')
    print(df)
    

    Output:

                           0
    2024-02-05 21:00:00    1
    2024-02-06 21:00:00   18
    2024-02-07 21:00:00    1
    2024-02-08 21:00:00   13
    2024-02-11 21:00:00   18
    2024-02-12 21:00:00   12
    2024-02-13 21:00:00  155
    2024-02-14 21:00:00    1
    

    The above converts the timestamps to time zone-unaware local time which was Pacific Standard Time(UTC-8) in my case. Below converts to to time zone-aware datetimes in UTC. Note the +00:00 indicating the zone offset in the output:

    cvt_data = {dt.datetime.fromtimestamp(int(k) / 1000, tz=dt.UTC):v for k, v in data.items()}
    

    It is much faster to use pd.to_datetime (about 6-7x in my testing), and defaults to a tz-unaware UTC conversion for this data. Use utc=True to make it tz-aware:

    df = pd.DataFrame.from_dict(data, orient='index')
    df.index = pd.to_datetime(df.index.astype(dtype='int64'), utc=True, unit='ms')
    

    Output (both UTC versions):

                                 0
    2024-02-06 05:00:00+00:00    1
    2024-02-07 05:00:00+00:00   18
    2024-02-08 05:00:00+00:00    1
    2024-02-09 05:00:00+00:00   13
    2024-02-12 05:00:00+00:00   18
    2024-02-13 05:00:00+00:00   12
    2024-02-14 05:00:00+00:00  155
    2024-02-15 05:00:00+00:00    1