python dataframe for-loop

Python: How to Select All DataFrame Rows Where a Column Has a NaN Entry


I have a DataFrame (20k rows) with two columns (latitude and longitude) that I would like to update wherever the latitude entry is NaN. I wanted to use the code below because it might be a fast way of doing it, but I'm not sure how to change the line msk = [isinstance(row, float) for row in df['latitude'].tolist()] so that it picks out only the rows that are NaN. The latitude column is float, so this check is True for every row and the mask selects the whole DataFrame.

def boolean_mask_loop(df):

    msk = [isinstance(row, float) for row in df['latitude'].tolist()]

    out = []
    for target in df.loc[msk, 'address'].tolist():
        dict_temp = geocoding(target)
        out.append([dict_temp['lat'], dict_temp['long']])

    df.loc[msk, ['latitude', 'longitude']] = out
    
    return df

id  address  latitude  longitude
1   addr1    NaN       NaN
2   addr2    NaN       NaN
3   addr3    40.7526   -74.0016

Solution

  • I changed the line msk = [isinstance(row, float) for row in df['latitude'].tolist()] to msk = df['latitude'].isnull().tolist(), which flags only the rows that have NaN in the latitude column, i.e. the rows I wanted to update. Thanks to @Anerdw for the advice.
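
For anyone who wants to try the fix end to end, here is a minimal sketch using the sample rows from the question. The geocoding() function is not shown in the post, so the version below is a hypothetical stub that returns fixed coordinates; substitute the real lookup. The sketch uses isna() (isnull() is an alias for it) and passes the boolean Series straight to .loc, which should behave the same as the .tolist() version. The original check matched every row because NaN is itself a float.

```python
import numpy as np
import pandas as pd


def geocoding(address):
    # Hypothetical stand-in for the geocoding() helper from the question,
    # which is not shown; it returns fixed coordinates for illustration only.
    return {"lat": 40.0, "long": -74.0}


def boolean_mask_loop(df):
    # NaN is a float, so isinstance(value, float) is True for every entry
    # of a float column. isna() flags only the missing entries.
    msk = df["latitude"].isna()

    out = []
    for target in df.loc[msk, "address"].tolist():
        dict_temp = geocoding(target)
        out.append([dict_temp["lat"], dict_temp["long"]])

    # Only assign when at least one row needs filling.
    if out:
        df.loc[msk, ["latitude", "longitude"]] = out

    return df


# Sample data from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "address": ["addr1", "addr2", "addr3"],
    "latitude": [np.nan, np.nan, 40.7526],
    "longitude": [np.nan, np.nan, -74.0016],
})

result = boolean_mask_loop(df)
print(result)
```

Only rows 1 and 2 are sent to the geocoder here; row 3 keeps its existing coordinates.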