pythonpandaspandas-datareader

Replacing the missing value Using Pandas


{'Country': 'USA', 'Age': '52', 'Sal': '12345', 'OnWork': 'No'}
{'Country': 'UK', 'Age': '23', 'Sal': '1142', 'OnWork': 'Yes'}
{'Country': 'MAL', 'Age': '25', 'Sal': '4456', 'OnWork': 'No'}
{'Country': 'MAL', 'Age': '25', 'Sal': '4456', 'OnWork': 'No'}
{'Country': 'MAL', 'Age': '?', 'Sal': '2345', 'OnWork': 'Yes'}
{'Country': 'MAL', 'Age': '25', 'Sal': '3342', 'OnWork': 'Yes'}
{'Country': 'MAL', 'Age': '25', 'Sal': '3452', 'OnWork': 'No'}
{'Country': 'MAL', 'Age': '?', 'Sal': '3562', 'OnWork': 'No'}

Here I have to replace the missing mean value bases on "OnWork" value. Group Yes and its mean value go to Row5 Age. Group NO and its value should go to the Last row.

df = pd.read_csv("Mycal.csv", na_values = missing_values, nrows=50)

Find and replace the Mean value (This is working)

df["F8"].fillna(df['F8'].mean(), inplace=True)

here I am able to find the Mean value, However I am not able to replace it.

df[df["Class"]=="Yes"]["F8"].mean()

I am expecting the Yes values should group and fill Missing value the Mean to fill same for NO. Kindly help me with this


Solution

  • Use mask and fillna as:

    df['Age'] = df['Age'].mask(df['Age'].eq('?'), np.nan).astype(float)
    df['Age'] = (df['Age'].fillna(df.groupby('OnWork')['Age'].transform(np.nanmean))
                          .astype(int))
    
    print(df)
      Country  Age    Sal OnWork
    0     USA   52  12345     No
    1      UK   23   1142    Yes
    2     MAL   25   4456     No
    3     MAL   25   4456     No
    4     MAL   24   2345    Yes
    5     MAL   25   3342    Yes
    6     MAL   25   3452     No
    7     MAL   31   3562     No
    

    If you want to replace multiple column values at once use:

    df = df.fillna(df.groupby('OnWork').transform('mean'))