
Replacing NaN in a dataframe with a datetime index from a dict doesn't work


I have a dict with hourly measurement data in which some entries are missing (gaps). My current approach is to create a dataframe with an hourly datetime index, prefilled with NaN, and then replace its values with those from gasDict (see below). The dataframe is later interpolated to get rid of the remaining NaNs.

import pandas as pd
import numpy as np

dataRange = pd.date_range(pd.to_datetime('2023-01-01 01:00:00'), pd.to_datetime('2023-01-01 05:00:00'), freq='H')
df = pd.DataFrame(np.nan, index=dataRange, columns=['gas'])
df['gas'] = pd.to_numeric(df['gas'], errors='coerce')

gasDict = {'2023-01-01 01:00:00' : 40,
           '2023-01-01 03:00:00' : 20  
          }

# these 3 methods do not work here
# methods from stackoverflow remap-values-in-pandas-column-with-a-dict-preserve-nans
df1 = df['gas'].map(gasDict).fillna(df['gas']) 
print(df1)

df2 = df['gas'].map(gasDict)
print(df2)

df3 = df.replace({'gas': gasDict})
print(df3)

# this code is correct but slow:
for key, value in gasDict.items():
    df.at[pd.to_datetime(key), 'gas'] = value  # .at takes a row label and a column label

print(df) 

Output (only the last result is correct!):

2023-01-01 01:00:00   NaN
2023-01-01 02:00:00   NaN
2023-01-01 03:00:00   NaN
2023-01-01 04:00:00   NaN
2023-01-01 05:00:00   NaN
Freq: H, Name: gas, dtype: float64
2023-01-01 01:00:00   NaN
2023-01-01 02:00:00   NaN
2023-01-01 03:00:00   NaN
2023-01-01 04:00:00   NaN
2023-01-01 05:00:00   NaN
Freq: H, Name: gas, dtype: float64
                     gas
2023-01-01 01:00:00  NaN
2023-01-01 02:00:00  NaN
2023-01-01 03:00:00  NaN
2023-01-01 04:00:00  NaN
2023-01-01 05:00:00  NaN
                      gas
2023-01-01 01:00:00  40.0
2023-01-01 02:00:00   NaN
2023-01-01 03:00:00  20.0
2023-01-01 04:00:00   NaN
2023-01-01 05:00:00   NaN

But that last method is awfully slow (gasDict has about 10000 entries). What is the correct approach?


Solution

  • Answer

    Since the dates live in the index, you need to map the index, not the column. Methods 1-3 all look up the values of the gas column, which are all NaN, so none of them can ever match a key of gasDict. Also, your index has a datetime dtype while the keys of gasDict are strings, so you need to convert one side to the type of the other before mapping.
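A minimal sketch of both points, rebuilt on the question's five-row example (the name tsDict is illustrative): mapping the column finds nothing, a Timestamp is not found among string keys, and converting the dict keys to Timestamps lets the DatetimeIndex be mapped directly. This is the mirror image of converting the index to strings.

```python
import numpy as np
import pandas as pd

# Same five-row setup as in the question ('h' is the hourly alias).
idx = pd.date_range('2023-01-01 01:00:00', '2023-01-01 05:00:00', freq='h')
df = pd.DataFrame(np.nan, index=idx, columns=['gas'])

gasDict = {'2023-01-01 01:00:00': 40,
           '2023-01-01 03:00:00': 20}

# Series.map looks up the column *values*; they are all NaN, so nothing matches.
print(df['gas'].map(gasDict).isna().all())  # True

# The index holds Timestamps but the dict keys are strings, so even an
# index lookup finds nothing without a conversion.
print(idx[0] in gasDict)  # False

# Convert the keys to Timestamps once, then map the DatetimeIndex directly.
tsDict = {pd.Timestamp(k): v for k, v in gasDict.items()}
df['gas'] = df.index.map(tsDict)
print(df)
```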

    Code

    df['gas'] = df.index.astype('str').map(gasDict)
    

    df:

                        gas
    2023-01-01 01:00:00 40.0
    2023-01-01 02:00:00 NaN
    2023-01-01 03:00:00 20.0
    2023-01-01 04:00:00 NaN
    2023-01-01 05:00:00 NaN
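As an alternative that avoids formatting timestamps as strings, here is a minimal sketch: turn gasDict into a Series, parse its keys once with pd.to_datetime, and reindex it onto the hourly range. It also includes the later interpolation step mentioned in the question; method='time' and the gas_interp column are illustrative choices, not part of the answer above.

```python
import pandas as pd

gasDict = {'2023-01-01 01:00:00': 40,
           '2023-01-01 03:00:00': 20}

# Hourly target index, as in the question.
dataRange = pd.date_range('2023-01-01 01:00:00', '2023-01-01 05:00:00', freq='h')

# Build a Series straight from the dict and parse its string keys once.
gas = pd.Series(gasDict, dtype='float64')
gas.index = pd.to_datetime(gas.index)

# reindex() places each measurement at its timestamp and inserts NaN for
# the gaps, so no prefilled NaN frame is needed.
df = gas.reindex(dataRange).to_frame('gas')

# Interpolation weighted by elapsed time between known points; trailing
# gaps receive the last known value (there is no extrapolation).
df['gas_interp'] = df['gas'].interpolate(method='time')
print(df)
```

Note that reindex silently drops any key that does not fall exactly on the hourly grid, and it raises an error if two keys parse to the same timestamp.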
    

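    If, unlike in your example, the gas column already contains some non-NaN values that you want to keep, use fillna instead. Because fillna aligns on the index, the mapped values must keep df's DatetimeIndex rather than string labels (a Series built with to_frame()[0] on the string index would not align, and nothing would be filled):

    df['gas'] = df['gas'].fillna(df.index.to_series().astype(str).map(gasDict))
    

    This gives the same result for your example.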
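A small sketch of why index alignment matters for fillna, using a hypothetical frame in which 05:00 already holds a value (55.0): mapped values that carry string labels fill nothing, while mapped values indexed by the original DatetimeIndex fill only the gaps and keep the existing value.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2023-01-01 01:00:00', '2023-01-01 05:00:00', freq='h')
# Hypothetical data: unlike the question, 05:00 already holds a value (55.0).
df = pd.DataFrame({'gas': [np.nan, np.nan, np.nan, np.nan, 55.0]}, index=idx)

gasDict = {'2023-01-01 01:00:00': 40,
           '2023-01-01 03:00:00': 20}

# Mapped values indexed by *strings*: fillna() reindexes them onto the
# DatetimeIndex, no label matches, and nothing gets filled.
by_string = df.index.astype(str).to_series().map(gasDict)
print(df['gas'].fillna(by_string).isna().sum())  # 4

# Mapped values indexed by the original DatetimeIndex: rows align, only
# the NaN rows are filled, and the existing 55.0 is kept.
by_stamp = df.index.to_series().astype(str).map(gasDict)
df['gas'] = df['gas'].fillna(by_stamp)
print(df)
```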

    In short: map the index (not the column values) through the dict and assign the result to the gas column of df.
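For completeness, a sketch of how the question's per-key df.at loop could itself be vectorized (the names stamps and on_grid are hypothetical): parse all keys in one pd.to_datetime call, keep only keys that lie on the hourly index, and assign everything with a single .loc call.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2023-01-01 01:00:00', '2023-01-01 05:00:00', freq='h')
df = pd.DataFrame(np.nan, index=idx, columns=['gas'])

gasDict = {'2023-01-01 01:00:00': 40,
           '2023-01-01 03:00:00': 20}

# Parse all keys in one call instead of calling pd.to_datetime per entry.
stamps = pd.to_datetime(list(gasDict.keys()))

# Keep only keys that lie on the hourly index; assigning to labels that
# are missing from the index would not behave like the original loop.
on_grid = stamps.isin(df.index)

# One vectorized assignment replaces the per-key df.at loop.
values = np.asarray(list(gasDict.values()), dtype=float)
df.loc[stamps[on_grid], 'gas'] = values[on_grid]
print(df)
```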