python pandas missing-data

How can I fill NaN values based on a different categorical column?


I would like to fill the NaN values of column Partner_salary with 0 where Partner_working is 'No', and fill the remaining NaN values with the mean of the Partner_salary column.

import pandas as pd
import numpy as np

data = pd.DataFrame({
    'Partner_working': ['Yes','No','Yes','Yes','No'],
    'Partner_salary': [np.nan,np.nan,1500,1000,0]})

I have tried using .loc to slice the data, but I am not able to continue to the next step:

data.loc[data['Partner_salary'].isnull(), 'Partner_working'].value_counts()

Output:

No     90
Yes    16

Solution

  • @Rishabh_KT's way of applying a function might be easier to read. If you want to stay with .loc logic, here is another way:

    # create example df
    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({
        'Partner_working': ['Yes', 'No', 'Yes', 'Yes', 'No'],
        'Partner_salary': [np.nan, np.nan, 1500, 1000, 0]
    })
    
    
    # Set NaN 'Partner_salary' to 0 where 'Partner_working' is "No"
    # (existing non-null salaries are left untouched)
    df.loc[(df['Partner_working'] == "No") & df['Partner_salary'].isna(), 'Partner_salary'] = 0
    
    
    # Calculate the mean of 'Partner_salary'; mean() skips NaN by default
    mean = df['Partner_salary'].mean()
    
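    # Fill the remaining NaN 'Partner_salary' values with the mean
    # (assign back instead of calling inplace=True on a column selection)
    df['Partner_salary'] = df['Partner_salary'].fillna(mean)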
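For comparison, the same two steps can be wrapped in a small function. This is only a sketch, not necessarily what the other answer does: `fill_partner_salary` is a hypothetical helper name, and it assumes the mean should be taken after the zeros are filled in, so those zeros count toward the mean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Partner_working': ['Yes', 'No', 'Yes', 'Yes', 'No'],
    'Partner_salary': [np.nan, np.nan, 1500, 1000, 0],
})


def fill_partner_salary(frame):
    """Hypothetical helper: return a filled copy of the Partner_salary column."""
    salary = frame['Partner_salary']
    not_working = frame['Partner_working'] == 'No'
    # Step 1: replace NaN with 0 only where the partner is not working
    salary = salary.mask(not_working & salary.isna(), 0)
    # Step 2: replace the remaining NaN with the column mean
    # (assumption: the mean is computed after step 1, so the zeros are included)
    return salary.fillna(salary.mean())


df['Partner_salary'] = fill_partner_salary(df)
print(df)
```

On the five-row example this gives 625.0 for the first row (the mean of 0, 1500, 1000 and 0) and 0.0 for the second. If the mean should exclude the newly filled zeros, compute `salary.mean()` before step 1 and pass that value to `fillna` instead.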