I have a dict of hourly measurement data in which some entries are missing (gaps). My current approach is to create a dataframe with an hourly datetime index, prefilled with NaN, and then replace the values in the dataframe with those from gasDict (see below). The dataframe is later interpolated to get rid of the NaNs.
import pandas as pd
import numpy as np
dataRange = pd.date_range(pd.to_datetime('2023-01-01 01:00:00'), pd.to_datetime('2023-01-01 05:00:00'), freq='H')
df = pd.DataFrame(np.nan, index=dataRange, columns=['gas'])
df['gas'] = pd.to_numeric(df['gas'], errors='coerce')
gasDict = {'2023-01-01 01:00:00' : 40,
'2023-01-01 03:00:00' : 20
}
# these 3 methods do not work here
# methods from stackoverflow remap-values-in-pandas-column-with-a-dict-preserve-nans
df1 = df['gas'].map(gasDict).fillna(df['gas'])
print(df1)
df2 = df['gas'].map(gasDict)
print(df2)
df3 = df.replace({'gas': gasDict})
print(df3)
# this code is correct but slow:
for key, value in gasDict.items():
    df.at[pd.to_datetime(key), 'gas'] = value
print(df)
result (only the last one is correct!):
2023-01-01 01:00:00 NaN
2023-01-01 02:00:00 NaN
2023-01-01 03:00:00 NaN
2023-01-01 04:00:00 NaN
2023-01-01 05:00:00 NaN
Freq: H, Name: gas, dtype: float64
2023-01-01 01:00:00 NaN
2023-01-01 02:00:00 NaN
2023-01-01 03:00:00 NaN
2023-01-01 04:00:00 NaN
2023-01-01 05:00:00 NaN
Freq: H, Name: gas, dtype: float64
gas
2023-01-01 01:00:00 NaN
2023-01-01 02:00:00 NaN
2023-01-01 03:00:00 NaN
2023-01-01 04:00:00 NaN
2023-01-01 05:00:00 NaN
gas
2023-01-01 01:00:00 40.0
2023-01-01 02:00:00 NaN
2023-01-01 03:00:00 20.0
2023-01-01 04:00:00 NaN
2023-01-01 05:00:00 NaN
But that last method is awfully slow (gasDict has about 10000 entries). What is the correct approach?
Answer
Since the timestamps are in the index, you need to map on the index, not on the gas column. Methods 1-3 look up the values of the gas column (all NaN) in gasDict, so nothing matches. In addition, your index is of datetime type but the keys of gasDict are strings, so you need to unify the data types before mapping.
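A quick way to see both problems, sketched here under the assumption that df and gasDict are set up as in the question (the names mapped_values, mapped_raw_index and mapped_str_index are only illustrative):

```python
import numpy as np
import pandas as pd

# Same hourly spacing as freq='H' in the question, written as a Timedelta.
index = pd.date_range('2023-01-01 01:00:00', '2023-01-01 05:00:00',
                      freq=pd.Timedelta(hours=1))
df = pd.DataFrame(np.nan, index=index, columns=['gas'])
gasDict = {'2023-01-01 01:00:00': 40, '2023-01-01 03:00:00': 20}

# Series.map uses the column *values* as lookup keys. They are all NaN,
# so no key matches and the result stays NaN.
mapped_values = df['gas'].map(gasDict)

# The index holds Timestamps while the dict keys are str,
# so mapping the raw index finds nothing either.
mapped_raw_index = df.index.map(gasDict)

# Once the index labels are converted to strings, the keys match.
mapped_str_index = df.index.astype(str).map(gasDict)

print(mapped_values.isna().all())
print(pd.isna(mapped_raw_index).all())
print(mapped_str_index)
```

Only the last lookup finds the two entries from the dict, which is why the answer maps the string version of the index.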
Code
df['gas'] = df.index.astype('str').map(gasDict)
df:
gas
2023-01-01 01:00:00 40.0
2023-01-01 02:00:00 NaN
2023-01-01 03:00:00 20.0
2023-01-01 04:00:00 NaN
2023-01-01 05:00:00 NaN
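An alternative sketch, not part of the original answer, goes the other way: instead of turning the index into strings, it parses the dict keys into timestamps once and lets pandas align by label with reindex. This does not depend on how pandas formats timestamps as text. The names measured, hourly and filled are assumptions made for illustration.

```python
import pandas as pd

gasDict = {'2023-01-01 01:00:00': 40, '2023-01-01 03:00:00': 20}

# Build a Series from the dict and parse its string keys into Timestamps.
measured = pd.Series(gasDict, dtype='float64')
measured.index = pd.to_datetime(measured.index)

# Hourly target index covering the same range as in the question.
hourly = pd.date_range('2023-01-01 01:00:00', '2023-01-01 05:00:00',
                       freq=pd.Timedelta(hours=1))

# reindex aligns by label; hours missing from the dict become NaN.
df = measured.reindex(hourly).to_frame('gas')

# Linear interpolation restricted to gaps that lie between two
# valid measurements; leading and trailing NaNs are left untouched.
filled = df['gas'].interpolate(limit_area='inside')

print(df)
print(filled)
```

Both steps are vectorized, so there is no Python-level loop over the roughly 10000 dict entries.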
If, unlike in your example, the gas column of your df already contains non-NaN values and you want to keep them by using fillna, use the following code:
df['gas'] = df['gas'].fillna(pd.Series(df.index.astype('str').map(gasDict), index=df.index))
For the example above, this gives the same result.
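A variant sketch of the fillna case, assuming a hypothetical df in which 02:00 already holds a reading of 35 and gasDict also has an entry for that hour. Here the dict keys are parsed into timestamps so that fillna can align on the datetime index directly; the name from_dict is illustrative.

```python
import numpy as np
import pandas as pd

hourly = pd.date_range('2023-01-01 01:00:00', '2023-01-01 05:00:00',
                       freq=pd.Timedelta(hours=1))

# Hypothetical starting point: 02:00 already has a measured value.
df = pd.DataFrame({'gas': [np.nan, 35.0, np.nan, np.nan, np.nan]},
                  index=hourly)
gasDict = {'2023-01-01 01:00:00': 40,
           '2023-01-01 02:00:00': 99,
           '2023-01-01 03:00:00': 20}

# Series keyed by Timestamp instead of str.
from_dict = pd.Series(gasDict, dtype='float64')
from_dict.index = pd.to_datetime(from_dict.index)

# fillna aligns on the index and only fills NaN positions, so the
# existing 35.0 at 02:00 is kept and the 99 from the dict is ignored.
df['gas'] = df['gas'].fillna(from_dict)

print(df)
```

If the dict should win over existing values instead, the roles can be swapped, for example with from_dict.reindex(df.index).fillna(df['gas']).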
Either way, the important point is to map on the index and assign the result to the gas column of df.