I am reading from a JSON data file and loading it into a dictionary. It has key:value pairs like below:
{"1707195600000":1,"1707282000000":18,"1707368400000":1,"1707454800000":13,"1707714000000":18,"1707800400000":12,"1707886800000":155,"1707973200000":1}
Code snippet:
import json
import pandas as pd

with open('data.json', 'r') as json_file:
    data_pairs = json.load(json_file)
dataframe = pd.DataFrame.from_dict(data_pairs, orient='index')
Can it be done with pd.DataFrame.from_dict, or should I convert all the keys in the dictionary before using from_dict?
You can convert the keys first. They appear to be Unix timestamps in milliseconds:
import pandas as pd
import datetime as dt
data = {"1707195600000":1,"1707282000000":18,"1707368400000":1,"1707454800000":13,"1707714000000":18,"1707800400000":12,"1707886800000":155,"1707973200000":1}
cvt_data = {dt.datetime.fromtimestamp(int(k) / 1000): v for k, v in data.items()}
df = pd.DataFrame.from_dict(cvt_data, orient='index')
print(df)
Output:
                       0
2024-02-05 21:00:00    1
2024-02-06 21:00:00   18
2024-02-07 21:00:00    1
2024-02-08 21:00:00   13
2024-02-11 21:00:00   18
2024-02-12 21:00:00   12
2024-02-13 21:00:00  155
2024-02-14 21:00:00    1
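If you'd rather not get the default 0 column label, from_dict also accepts a columns argument when orient='index'. A minimal sketch with a short slice of the data; the column name value is my own choice:

```python
import datetime as dt

import pandas as pd

data = {"1707195600000": 1, "1707282000000": 18}

# Convert the millisecond-string keys to datetimes, then build the frame
# with an explicit column name instead of the default 0.
cvt_data = {dt.datetime.fromtimestamp(int(k) / 1000): v for k, v in data.items()}
df = pd.DataFrame.from_dict(cvt_data, orient='index', columns=['value'])
print(df)
```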
The above converts the timestamps to time zone-unaware local time, which was Pacific Standard Time (UTC-8) in my case. The line below converts to time zone-aware datetimes in UTC. Note the +00:00 indicating the zone offset in the output:
cvt_data = {dt.datetime.fromtimestamp(int(k) / 1000, tz=dt.UTC): v for k, v in data.items()}
(dt.UTC is an alias for dt.timezone.utc, added in Python 3.11; use dt.timezone.utc on older versions.)
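If you want local wall-clock time but still tz-aware, you can pass a named zoneinfo timezone instead of UTC. A small sketch; America/Los_Angeles here stands in for whatever zone you actually need:

```python
import datetime as dt
from zoneinfo import ZoneInfo

data = {"1707195600000": 1, "1707282000000": 18}

# tz-aware conversion to a named zone instead of UTC; each datetime
# carries its offset (-08:00 for PST dates).
tz = ZoneInfo("America/Los_Angeles")
cvt_data = {dt.datetime.fromtimestamp(int(k) / 1000, tz=tz): v for k, v in data.items()}
for k, v in cvt_data.items():
    print(k.isoformat(), v)
```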
It is much faster to use pd.to_datetime (about 6-7x in my testing), which defaults to a tz-unaware UTC conversion for this data. Pass utc=True to make it tz-aware:
df = pd.DataFrame.from_dict(data, orient='index')
df.index = pd.to_datetime(df.index.astype(dtype='int64'), utc=True, unit='ms')
Output (both UTC versions):
                             0
2024-02-06 05:00:00+00:00    1
2024-02-07 05:00:00+00:00   18
2024-02-08 05:00:00+00:00    1
2024-02-09 05:00:00+00:00   13
2024-02-12 05:00:00+00:00   18
2024-02-13 05:00:00+00:00   12
2024-02-14 05:00:00+00:00  155
2024-02-15 05:00:00+00:00    1
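Putting the pieces together, the full flow from file to tz-aware frame can be sketched like this. StringIO stands in for your data.json file, with a short slice of the data:

```python
import io
import json

import pandas as pd

# Stand-in for open('data.json', 'r'): same key/value shape as above.
json_file = io.StringIO('{"1707195600000": 1, "1707282000000": 18, "1707368400000": 1}')

data_pairs = json.load(json_file)
df = pd.DataFrame.from_dict(data_pairs, orient='index')

# String keys -> int64 milliseconds -> tz-aware UTC DatetimeIndex.
df.index = pd.to_datetime(df.index.astype('int64'), utc=True, unit='ms')
print(df)
```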