Tags: arrays, pandas, dataframe, numpy, numpy-ndarray

Pandas dataframe assign nested list not working


I'm trying to assign a nested list to a single DataFrame cell:

df.loc['y','A'] = [[2]]

However, the actual assigned value is [2].

It works as expected for [2], [[[2]]] and [[[[2]]]], but not for [[2]].

See the following code:

import pandas as pd
import numpy as np

df = pd.DataFrame({"A": [[1], [[2]], [[[3]]], [[[[4]]]], np.array([[2]]), np.array([[[2]]]), [[1],[2]]], 
                   "B": [[1], [[2]], [[[3]]], [[[[4]]]], np.array([[2]]), np.array([[[2]]]), [[1],[2]]],
                   "C": [1,2,3,4,5,6,7]
                   }, 
                   index=["x", "y", "z", "w","a","b","c"])


# the initial assignment in the constructor works
print(df)

df.loc['x','A'] = [1] # good
df.loc['y','A'] = [[2]] # buggy, actual assigned value [2]
df.loc['z','A'] = [[[3]]] # good
df.loc['w','A'] = [[[[4]]]] # good

df.loc['a','A'] = np.array([[2]], dtype=object) # buggy, actual assigned value [2]
df.loc['b','A'] = np.array([[[2]]], dtype=object) # good


#df.loc['b','A'] = [1,2] # error: Must have equal len keys and value when setting with an iterable
df.loc['c','A'] = [[1],[2]] # buggy, actual assigned value [1,2]

print(df)

The output:

            A           B  C
x         [1]         [1]  1
y       [[2]]       [[2]]  2
z     [[[3]]]     [[[3]]]  3
w   [[[[4]]]]   [[[[4]]]]  4
a       [[2]]       [[2]]  5
b     [[[2]]]     [[[2]]]  6
c  [[1], [2]]  [[1], [2]]  7
           A           B  C
x        [1]         [1]  1
y        [2]       [[2]]  2
z    [[[3]]]     [[[3]]]  3
w  [[[[4]]]]   [[[[4]]]]  4
a        [2]       [[2]]  5
b    [[[2]]]     [[[2]]]  6
c     [1, 2]  [[1], [2]]  7

Even stranger: if column "C" is removed, none of the assignments marked "buggy" above misbehave, and the commented-out assignment no longer raises an error.


Solution

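  • With .loc, pandas treats a list-like value as data to be aligned with the selection rather than as one opaque cell value, so a level of nesting can be lost ([[2]] is stored as [2], and [[1],[2]] as [1, 2]). The behavior depends on the frame's dtypes: it appears when the integer column "C" is present and disappears when "C" is removed, which suggests that mixed-dtype frames take a different internal assignment path.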

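    Casting column "A" to object is sometimes suggested:

    df["A"] = df["A"].astype(object)
    

    Note that a column holding Python lists already has dtype object, so this cast is unlikely to change the result by itself. Check df.dtypes before relying on it.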

    Use .at[...] instead of .loc[...] when assigning a single cell:

    df.at['y', 'A'] = [[2]]
    

    .at[...] addresses exactly one cell, so the value is stored as a single object instead of being interpreted as a sequence of values.

    Remove column "C" if it is not needed. As noted in the question, the problem does not occur without it.
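
    The .at workaround can be checked with a minimal sketch like the one below. It uses a reduced version of the question's frame (three rows instead of seven, which is a simplification made here, not part of the original post) and keeps the integer column "C" so the frame still has mixed dtypes. It also prints the dtype of "A" to show that the column is object from the start.

```python
import pandas as pd

# Reduced version of the question's frame (assumption: three rows are enough
# to illustrate the point). "A" holds lists, "C" holds integers.
df = pd.DataFrame(
    {"A": [[1], [[2]], [[1], [2]]], "C": [1, 2, 3]},
    index=["x", "y", "c"],
)

# A column of Python lists is stored with dtype object.
print(df["A"].dtype)

# .at targets exactly one cell, so each list is stored as one object.
df.at["y", "A"] = [[2]]
df.at["c", "A"] = [[1], [2]]

print(df.at["y", "A"])
print(df.at["c", "A"])
```

    If the nesting is preserved, the two final prints show the lists exactly as they were assigned. Results of the equivalent .loc assignments may differ between pandas versions, so they are worth re-checking after an upgrade.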