python pandas dataframe null

Add pandas dataframe column based on whether two columns are null


I am trying to add a column data_status to a dataframe that contains 'data' if either total_pounds_entered or total_pounds_estimated is not null and 'no data' if both of these columns are null. The following code isn't working for me. I know this because I see two rows that have values in the total_pounds_estimated column that end up in df instead of df_total_pounds_estimated_or_entered. Does anyone know what I'm missing here? Thanks!

df['data_status'] = ''
df[df['total_pounds_entered'].notnull()]['data_status'] = 'data'
df[df['total_pounds_estimated'].notnull()]['data_status'] = 'data'
df_total_pounds_estimated_or_entered = df[df['data_status'] == 'data']
df = df[df['data_status'] != 'data']

Solution

  • The expression df[df['...'].notnull()] returns a copy of the selected rows, so the chained assignment df[...]['data_status'] = 'data' writes to that temporary copy and leaves df itself unchanged. This is why your code is not working. Use a single .loc assignment instead. The following two lines (i) pre-fill a data_status column with 'no data' and (ii) overwrite the entries where there is data in either of your given columns:

    df['data_status'] = 'no data'
    df.loc[df['total_pounds_entered'].notnull() | df['total_pounds_estimated'].notnull(), 'data_status'] = 'data'
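
To check the behaviour end to end, here is a minimal sketch of the same approach on a small made-up frame. Only the column names come from the question; the sample values and the names has_data and df_no_data are assumptions for illustration (the question reassigns df itself instead of using a separate name).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "total_pounds_entered": [10.0, np.nan, np.nan, 5.0],
    "total_pounds_estimated": [np.nan, 7.5, np.nan, 2.0],
})

# True for rows where at least one of the two columns holds a value.
has_data = df[["total_pounds_entered", "total_pounds_estimated"]].notnull().any(axis=1)

# Pre-fill the column, then overwrite the matching rows with one .loc assignment
# so the write goes to df itself rather than to a temporary copy.
df["data_status"] = "no data"
df.loc[has_data, "data_status"] = "data"

# Split into the two frames the question asks for.
df_total_pounds_estimated_or_entered = df[df["data_status"] == "data"]
df_no_data = df[df["data_status"] != "data"]  # hypothetical name

print(df)
```

With this sample, only the third row (both columns null) should end up labelled 'no data'. Building the mask with notnull().any(axis=1) is equivalent to combining the two notnull() masks with |, and it scales more easily if more columns need to be checked.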