Column A holds people's ID numbers and column B holds their names, but in many rows the ID is entered and the name is missing. Assume the same person always has the same ID and the same name.
How can I locate the nulls in column B, take the corresponding value in A (the ID), find another row with that ID, and use its name (the value in B) to fill the null? For example:
A | B |
---|---|
56 | Michael |
34 | Paula |
79 | Davi |
80 | Luna |
56 | NaN |
So I want code that identifies the NaN, takes the column A value 56, searches for another 56 in the same column, and fills the NaN with the corresponding name. Imagine this dataframe is gigantic, so I can't just group by A and replace the values one by one.
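As said in the comment, ffill can work. But if a NaN value appears first in its group, ffill won't change that NaN. So sort df by 'B' first (sort_values places NaN last by default, so each known name comes before the gaps for its ID) and then do the ffill, like: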
df.sort_values('B', ignore_index=True, inplace=True)
df['B'] = df.groupby('A')['B'].ffill()
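To make the caveat concrete, here is a minimal sketch using the question's sample data plus one extra leading row (56, NaN) that is added here purely for illustration. The names `plain` and `df_sorted` are assumptions of this sketch, and it uses a non-inplace sort so the original frame stays untouched.

```python
import numpy as np
import pandas as pd

# Question's sample data, with a hypothetical leading (56, NaN) row added
# so that a NaN appears before any known name for ID 56.
df = pd.DataFrame({
    "A": [56, 56, 34, 79, 80, 56],
    "B": [np.nan, "Michael", "Paula", "Davi", "Luna", np.nan],
})

# Plain group-wise ffill: the leading NaN has nothing before it to copy from.
plain = df.groupby("A")["B"].ffill()

# Sorting by B moves NaN rows to the end (na_position="last" is the default),
# so every known name precedes the gaps within its ID group.
df_sorted = df.sort_values("B", ignore_index=True)
df_sorted["B"] = df_sorted.groupby("A")["B"].ffill()
```

Here `plain` still contains the leading gap, while `df_sorted` has every name filled, at the cost of a reordered frame.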
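Or you can look up the real name for each ID, ignoring NaN, and map it onto the 'A' column. Aggregating with x.dropna().unique() returns an array per ID rather than a single name, so first(), which returns the first non-null value in each group, is the safer choice:
df['B'] = df['A'].map(df.groupby('A')['B'].first())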
The ffill approach may be faster, not counting the cost of the sort. Note that the sort also reorders the rows, and ignore_index=True discards the original index.
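If the original row order matters, a lookup-based fill avoids the sort entirely. The following is a minimal sketch on the question's sample data, assuming each ID has exactly one correct name; `names` is a variable introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question: the second 56 has no name.
df = pd.DataFrame({
    "A": [56, 34, 79, 80, 56],
    "B": ["Michael", "Paula", "Davi", "Luna", np.nan],
})

# Build an ID -> name lookup; first() skips NaN within each group.
names = df.groupby("A")["B"].first()

# Fill only the missing names; existing values, row order, and index are unchanged.
df["B"] = df["B"].fillna(df["A"].map(names))
```

Using fillna rather than overwriting the whole column leaves existing names alone, which helps if some ID turns out to have inconsistent spellings that you want to inspect later.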