I have a DataFrame with a set of columns X. I want to update, using .loc (or .iloc), the content of one row for a subset of those columns (call it Y), but the updated row ends up filled with NaN and I'm trying to understand why.
import pandas as pd

a = pd.DataFrame({'id': [1, 2, 10, 12],
                  'val1': ['a', 'b', 'c', 'd'],
                  'val2': ['e', 'f', 'g', 'h']})
my_row = pd.DataFrame({'id': [7],
                       'val1': ['z']})
index = a[a.id == 2].index
a.loc[index, ['id','val1']] = my_row
I also tried:
a.iloc[index, Y_index] = my_row
with Y_index containing the positions of the columns ['id','val1']. my_row is a DataFrame with the new content I want to assign and contains only the columns in Y.
Neither version raises an error, but the updated row is then filled with NaN.
Assigning a single value (an int rather than a DataFrame) works fine, so I think there must be a way to assign each column its corresponding value, but I cannot find how. Does anyone have an idea?
EDIT: It seems to be related to the index. If I change my code to this:
index = a[a.id == 1].index
then the operation succeeds. The only difference I can see is that, in this case, my_row and a.loc[index, ['id','val1']] have exactly the same index. But this doesn't really help me understand why and how this is happening.
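When you assign a DataFrame through loc, pandas aligns it with the target on index and column labels. Here my_row has index 0 while the selected row has index 1, so nothing lines up and the row is filled with NaN. Update the index of the new row to match the original index and then use loc: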
a.loc[index, my_row.columns] = my_row.set_index(index)
>>> a
   id val1 val2
0   1    a    e
1   7    z    f
2  10    c    g
3  12    d    h
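If you would rather not touch the index of my_row, another option is to hand loc a plain NumPy array instead of a DataFrame. This is only a sketch of that alternative, reusing the frames from the question: an array has no labels, so the values are assigned by position. It assumes my_row has exactly one row and that its columns are in the order you want written.

```python
import pandas as pd

a = pd.DataFrame({'id': [1, 2, 10, 12],
                  'val1': ['a', 'b', 'c', 'd'],
                  'val2': ['e', 'f', 'g', 'h']})
my_row = pd.DataFrame({'id': [7],
                       'val1': ['z']})

index = a[a.id == 2].index

# The two objects carry different row labels, which is why
# assigning my_row directly cannot be aligned.
print(my_row.index.tolist())   # label of the new row
print(index.tolist())          # label of the row being replaced

# to_numpy() drops the labels, so the assignment is positional.
a.loc[index, my_row.columns] = my_row.to_numpy()
print(a)
```

The set_index(index) version above keeps label alignment, which is safer when my_row might hold several rows or columns in a different order. The array version is shorter but relies entirely on position.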