python pandas pandas-loc

Assign a subset of rows to a DataFrame using loc / iloc


I have a DataFrame with a set of columns X. I want to update, using .loc (or .iloc), the content of one row for a subset of those columns (call it Y), but the updated row ends up filled with NaN and I'm trying to understand why.

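import pandas as pd

a = pd.DataFrame({'id': [1, 2, 10, 12],
                  'val1': ['a', 'b', 'c', 'd'],
                  'val2': ['e', 'f', 'g', 'h']})

# New content for the row, covering only the columns in Y
my_row = pd.DataFrame({'id': [7],
                       'val1': ['z']})

# Index label of the row to update (here: label 1)
index = a[a.id == 2].index

# The row at `index` ends up filled with NaN after this assignment
a.loc[index, ['id', 'val1']] = my_row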

I also tried:

a.iloc[index, Y_index] = my_row

where Y_index contains the positions of ['id', 'val1'], and my_row is a DataFrame holding the new content I want to assign; it contains only the columns in Y.

Neither attempt raises an error, but in both cases the updated row is filled with NaN.

I tried assigning a single value (an int rather than a DataFrame) and it worked fine. So I think there must be a way to assign each column its corresponding value, but I cannot find how. Does anyone have an idea?

EDIT: It seems to be related to the index. If I change my code to this:

index = a[a.id == 1].index

then the operation succeeds. The only difference I can see is that in this case my_row and a.loc[index, ['id','val1']] have exactly the same index. But this doesn't really help me understand why and how this is happening.


Solution

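  • When the right-hand side is a DataFrame, loc aligns it with the target on index labels before assigning. Here my_row has label 0 while the target row has label 1, so nothing matches and NaN is written. Update the index of the new row to match the original index and then use loc: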

    a.loc[index, my_row.columns] = my_row.set_index(index)
    
    >>> a
       id val1 val2
    0   1    a    e
    1   7    z    f
    2  10    c    g
    3  12    d    h
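
For comparison, here is a minimal sketch of an alternative that avoids relabelling the new row: assigning the underlying values instead of the DataFrame, so that nothing is aligned by label and the values are placed by position. It reuses the `a`, `my_row` and `index` names from the question; `cols`, `labels_match` and `aligned` are names introduced here for illustration only, and the `reindex` call is just a way to visualise the label mismatch, not a claim about how pandas implements the assignment internally.

```python
import pandas as pd

a = pd.DataFrame({'id': [1, 2, 10, 12],
                  'val1': ['a', 'b', 'c', 'd'],
                  'val2': ['e', 'f', 'g', 'h']})
my_row = pd.DataFrame({'id': [7], 'val1': ['z']})

index = a[a.id == 2].index       # label of the target row: 1
cols = list(my_row.columns)      # ['id', 'val1']

# my_row carries label 0, the target row carries label 1: the labels differ.
labels_match = my_row.index.equals(index)

# Looking up label 1 in my_row finds nothing, hence a row of NaN.
# (Illustration of the mismatch only.)
aligned = my_row.reindex(index)

# Passing the raw values drops the labels, so the assignment is positional.
a.loc[index, cols] = my_row.to_numpy()
```

This relies on the columns of my_row being in the same order as `cols`, since no labels are left to check against. The set_index approach above keeps the label check and is the safer choice when the column order is not guaranteed.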