I have two dataframes: one contains data, and the other contains exclusions that need to be merged onto the data and marked as included (True or False). For a couple of years I have done this by adding a new column to the exclusions dataframe and setting every value to True, then left-merging it onto the main dataframe, which leaves the new column containing either True or NaN. Finally, I run fillna to replace the NaN values with False and I'm good to go.
import pandas as pd
MainData = {'name': ['apple', 'pear', 'orange', 'watermelon'],
            'other': ['blah', 'blah', 'blah', 'blah']}
dfMainData = pd.DataFrame(MainData)

Exclusions = {'name': ['pear', 'watermelon'],
              'reason': ['pears suck', 'too messy!']}
dfExclusions = pd.DataFrame(Exclusions)
dfExclusions['excluded'] = True
dfMainData = pd.merge(dfMainData, dfExclusions, how='left', on='name')
dfMainData['excluded'] = dfMainData['excluded'].fillna(False)
I was previously running pandas 1.2.4, but I am making code updates and migrating to 2.2.1 as part of this, and the fillna line now raises the following warning:

FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set pd.set_option('future.no_silent_downcasting', True)
It still technically works, but this is apparently no longer the pandas-esque way of doing things, so I am curious how I should go about this now to avoid compatibility issues in the future.
With pandas 2.2.1 you can use:
dfMainData['excluded'] = dfMainData['excluded'].fillna(0).astype('bool')
which gives
name other reason excluded
0 apple blah NaN False
1 pear blah pears suck True
2 orange blah NaN False
3 watermelon blah too messy! True
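Putting it all together, here is a minimal sketch of the full flow from the question with the suggested fix applied. Filling with 0 leaves the object column as a mix of True and 0 (so no silent downcasting occurs), and astype('bool') then converts it to a proper boolean dtype without the FutureWarning:

```python
import pandas as pd

# Build the two dataframes from the question.
dfMainData = pd.DataFrame({'name': ['apple', 'pear', 'orange', 'watermelon'],
                           'other': ['blah', 'blah', 'blah', 'blah']})
dfExclusions = pd.DataFrame({'name': ['pear', 'watermelon'],
                             'reason': ['pears suck', 'too messy!']})

# Mark every exclusion row, then left-merge onto the main data;
# unmatched rows get NaN in the 'excluded' column.
dfExclusions['excluded'] = True
dfMainData = pd.merge(dfMainData, dfExclusions, how='left', on='name')

# fillna(0) avoids the object-dtype downcasting warning;
# astype('bool') then yields a clean boolean column (0 -> False).
dfMainData['excluded'] = dfMainData['excluded'].fillna(0).astype('bool')
print(dfMainData)
```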