I am trying to add a column data_status to a dataframe that contains 'data' if either total_pounds_entered or total_pounds_estimated is not null and 'no data' if both of these columns are null. The following code isn't working for me. I know this because I see two rows that have values in the total_pounds_estimated column that end up in df instead of df_total_pounds_estimated_or_entered. Does anyone know what I'm missing here? Thanks!
df['data_status'] = ''
df[df['total_pounds_entered'].notnull()]['data_status'] = 'data'
df[df['total_pounds_estimated'].notnull()]['data_status'] = 'data'
df_total_pounds_estimated_or_entered = df[df['data_status'] == 'data']
df = df[df['data_status'] != 'data']
Your line df[df['...'].notnull()]['data_status'] = 'data' uses chained indexing: df[df['...'].notnull()] creates a copy, so the assignment changes that temporary copy rather than modifying df in place. This is why your code is not working. The following two lines (i) pre-fill a data_status column with 'no data' and (ii) overwrite the entries where there is data in either of your given columns.
df['data_status'] = 'no data'
df.loc[df['total_pounds_entered'].notnull() | df['total_pounds_estimated'].notnull(), 'data_status'] = 'data'
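For a quick check, here is a minimal, self-contained sketch of the same approach. The sample values and the name df_no_data are made up for illustration; only the column names and df_total_pounds_estimated_or_entered come from your question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "total_pounds_entered": [10.0, np.nan, np.nan, 5.0],
    "total_pounds_estimated": [np.nan, 7.5, np.nan, 2.0],
})

# True where at least one of the two columns holds a value.
has_data = (
    df["total_pounds_entered"].notnull()
    | df["total_pounds_estimated"].notnull()
)

# Pre-fill the column, then overwrite through a single .loc call,
# which assigns into df itself rather than into a temporary copy.
df["data_status"] = "no data"
df.loc[has_data, "data_status"] = "data"

# Split into the two frames from the question.
# df_no_data is a hypothetical name used here instead of reassigning df.
df_total_pounds_estimated_or_entered = df[has_data]
df_no_data = df[~has_data]

print(df["data_status"].tolist())
# ['data', 'data', 'no data', 'data']
```

Since the mask already tells you which rows have data, you could also split on has_data directly, as done here, instead of comparing data_status to a string.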