
Find nulls and fill them with the corresponding value based on another column's value


Column A holds people's ID numbers and column B holds their names, but in many rows the ID is entered and the name is missing. Assume the same person always has the same ID and the same name.

How can I locate the nulls in column B, take the corresponding value in A (the ID), find another row with the same ID, and use that row's name (the value in B) to fill the null? For example:

A B
56 Michael
34 Paula
79 Davi
80 Luna
56 NaN

So I want code that identifies the NaN, takes the column A value 56, searches for another 56 in the same column, and fills the NaN with the corresponding name. Imagine this dataframe is gigantic, so I can't just group by A and replace the values one by one.


Solution

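  • As said in the comment, ffill can work. But if a NaN appears first in its group, ffill has nothing before it to copy, so that NaN stays unchanged. Sorting df by 'B' first avoids this, because sort_values places NaN last by default (note that this reorders the rows and resets the index):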

    df.sort_values('B', ignore_index=True, inplace=True)
    df['B'] = df.groupby('A')['B'].ffill()
    

    Or you can find the real name for each ID, ignoring NaN, and map it onto the 'A' column:

    # first() skips NaN by default, so this yields one name per ID
    df['B'] = df['A'].map(df.groupby('A')['B'].first())
    

    The ffill approach may be faster, provided you don't count the cost of the sort.
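
For reference, here is a minimal sketch of the lookup idea on the question's sample data. It assumes each ID has at most one distinct name, uses `GroupBy.first()` (which skips NaN by default) to build the ID-to-name lookup, and fills only the missing entries with `fillna`, so the original row order is preserved. The variable names `names` and `filled` are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question: ID 56 appears twice, once without a name.
df = pd.DataFrame({
    "A": [56, 34, 79, 80, 56],
    "B": ["Michael", "Paula", "Davi", "Luna", np.nan],
})

# Lookup table indexed by ID: the first non-null name seen for each ID.
names = df.groupby("A")["B"].first()

# Fill only the missing names; existing values and row order are untouched.
filled = df.assign(B=df["B"].fillna(df["A"].map(names)))

print(filled)
```

An ID whose name is missing in every row has nothing to copy from, so it stays null in this sketch.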