I have a DataFrame (20k rows) with two columns, `latitude` and `longitude`, that I would like to update only in rows where `latitude` is NaN. I wanted to use the code below because it might be a fast way of doing it, but I'm not sure how to change the line `msk = [isinstance(row, float) for row in df['latitude'].tolist()]` so that it selects only the NaN rows. The `latitude` column is of float type, so this check is true for every row and the mask selects the whole DataFrame.

    def boolean_mask_loop(df):
        msk = [isinstance(row, float) for row in df['latitude'].tolist()]
        out = []
        for target in df.loc[msk, 'address'].tolist():
            dict_temp = geocoding(target)
            out.append([dict_temp['lat'], dict_temp['long']])
        df.loc[msk, ['latitude', 'longitude']] = out
        return df

id | address | latitude | longitude |
---|---|---|---|
1 | addr1 | NaN | NaN |
2 | addr2 | NaN | NaN |
3 | addr3 | 40.7526 | -74.0016 |
I replaced the line `msk = [isinstance(row, float) for row in df['latitude'].tolist()]` with `msk = df['latitude'].isnull().tolist()`. NaN is itself a float, so the `isinstance` check matched every row, whereas `isnull()` is true only for the rows with NaN in the `latitude` column, which are the ones I wanted to update. Thanks to @Anerdw for the advice.
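
For anyone landing here later, this is a minimal sketch of the whole function with that change applied, run against the sample rows from the question. The `geocoding` stub and the coordinates it returns are made up for illustration, since the real geocoder isn't shown in the post. I also pass the boolean Series straight to `.loc` instead of converting it to a list, and guard the assignment so nothing is written when no row needs geocoding; both are my own choices and not required for the fix.

```python
import numpy as np
import pandas as pd


def geocoding(address):
    # Hypothetical stand-in for the real geocoder; the coordinates are invented.
    fake = {"addr1": (40.0, -75.0), "addr2": (41.0, -73.0)}
    lat, long = fake[address]
    return {"lat": lat, "long": long}


def boolean_mask_loop(df):
    # isnull() is True only where latitude is missing. NaN is itself a float,
    # so the earlier isinstance(row, float) check matched every row.
    msk = df["latitude"].isnull()
    out = []
    for target in df.loc[msk, "address"].tolist():
        dict_temp = geocoding(target)
        out.append([dict_temp["lat"], dict_temp["long"]])
    if out:  # skip the assignment when no row needs geocoding
        df.loc[msk, ["latitude", "longitude"]] = out
    return df


# Sample rows from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "address": ["addr1", "addr2", "addr3"],
    "latitude": [np.nan, np.nan, 40.7526],
    "longitude": [np.nan, np.nan, -74.0016],
})
result = boolean_mask_loop(df)
print(result)
```

Only rows 1 and 2 are sent to `geocoding`; row 3 keeps its existing coordinates.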