
Assigning an object to a single entry of a pandas DataFrame with two methods


I would like to share a strange behavior of pandas and find out the reason for it. I assign a NumPy array, as an object, to a single element (cell, entry) of a pandas DataFrame in two different ways.

First, create a sample DataFrame:


import numpy as np
import pandas as pd

rn = np.random.randint(1, 100, size=(4, 2))  # random integers in [1, 100)

df = pd.DataFrame(data=rn, columns=['a', 'b'])

df['b'] = df['b'].astype(object)  # set one column's dtype to 'object'

c = np.array([1, 4, 4])  # the array I want to put in a single entry of the DataFrame

Method 1:

df['b'].loc[0] = c

This succeeds, but there is a warning:

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

Method 2:

df.loc[0 , 'b'] = c

This fails with the following error:

ValueError: Must have equal len keys and value when setting with an iterable

Why is that?


Solution

  • This is a quirk (not to say an inconsistency) in pandas indexing. It sees a single slot on the left-hand side of the assignment and an iterable of 3 values on the right-hand side, and chokes on that.

    The only workaround I have found is to select the cell as a one-row slice and assign a one-element Series with the same index, which forces pandas to align on the index:

    df.loc[0:0, 'b' ] = pd.Series([c], [0])
    

    It raises neither an error nor a warning, and you can check the result:

    print(type(df.loc[0, 'b']), df, sep='\n')
    

    which gives, as expected:

    <class 'numpy.ndarray'>
        a          b
    0  45  [1, 4, 4]
    1  65         68
    2  68         10
    3  84         22
    

    Rather hacky, but at least it provides the expected behaviour...
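
For comparison, here is a minimal sketch of another route that is not part of the answer above: the scalar accessor `DataFrame.at`, which addresses exactly one cell and so should not try to spread the iterable over the selection. It assumes the column already has `object` dtype, as in the question. The fixed numbers replace the random ones only to make the run reproducible, and the `loc_error` name is purely illustrative. Whether method 2 raises may depend on the pandas version, so the sketch catches the error rather than relying on it.

```python
import numpy as np
import pandas as pd

# Fixed sample values (assumed for illustration; the question uses random ones).
df = pd.DataFrame({"a": [45, 65, 68, 84], "b": [90, 68, 10, 22]})
df["b"] = df["b"].astype(object)  # the column must be able to hold arbitrary objects

c = np.array([1, 4, 4])

# Method 2 from the question: .loc with scalar labels treats c as an
# iterable of values to distribute over the selection.
loc_error = None
try:
    df.loc[0, "b"] = c
except ValueError as exc:
    loc_error = exc  # keep the exception around for inspection

# Scalar accessor: stores c as a single object in the cell at row 0, column 'b'.
df.at[0, "b"] = c

print(loc_error)
print(type(df.at[0, "b"]))
print(df)
```

Because `.at` works on the DataFrame itself rather than on an intermediate `df['b']` selection, it also avoids the chained indexing that triggers the `SettingWithCopyWarning` in method 1.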