
How to identify whether a pandas column contains lists


I want to identify whether a pandas column holds a list in each row.

import pandas as pd
df = pd.DataFrame({'X': [1, 2, 3], 'Y': [[34], [37, 45], [48, 50, 57]], 'Z': ['A', 'B', 'C']})

df
Out[160]: 
   X             Y  Z
0  1          [34]  A
1  2      [37, 45]  B
2  3  [48, 50, 57]  C

df.dtypes
Out[161]: 
X     int64
Y    object
Z    object
dtype: object

Since both string columns and list columns have dtype "object", I'm unable to distinguish columns of strings from columns of lists (of integers or strings).

How do I identify that column "Y" is a list of int?


Solution

  • If your dataset is big, take a sample before applying the type function. You can then check:

    If the most common type is list:

    (df
     .sample(min(100, len(df)))  # sample() raises if n exceeds the row count
     .map(type)                  # use .applymap(type) prior to v2.1.0
     .mode(0)
     .astype(str) == "<class 'list'>")
    

    If all values are list:

    (df
     .sample(min(100, len(df)))  # sample() raises if n exceeds the row count
     .map(type)                  # use .applymap(type) prior to v2.1.0
     .astype(str) == "<class 'list'>").all(0)
    

    If any values are list:

    (df
     .sample(min(100, len(df)))  # sample() raises if n exceeds the row count
     .map(type)                  # use .applymap(type) prior to v2.1.0
     .astype(str) == "<class 'list'>").any(0)
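
The checks above tell you whether a column holds lists, but not what the lists contain. Since the question asks about lists of int specifically, here is a minimal sketch of one way to test both conditions per column with `isinstance`. The helper names `is_list_column` and `is_list_of_int_column` are made up for this example and are not part of pandas. It assumes the list elements are plain Python `int` values (NumPy integers inside the lists would not pass this check), and it scans every row, so on a big dataset you may want to pass a sample instead.

```python
import pandas as pd

df = pd.DataFrame({
    "X": [1, 2, 3],
    "Y": [[34], [37, 45], [48, 50, 57]],
    "Z": ["A", "B", "C"],
})


def is_list_column(col: pd.Series) -> bool:
    """Hypothetical helper: True if the column is non-empty and every value is a list."""
    return len(col) > 0 and all(isinstance(value, list) for value in col)


def is_list_of_int_column(col: pd.Series) -> bool:
    """Hypothetical helper: True if every value is a list whose elements are all int."""
    if not is_list_column(col):
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    return all(
        isinstance(item, int) and not isinstance(item, bool)
        for value in col
        for item in value
    )


# Collect the names of the columns that satisfy each check.
list_columns = [name for name in df.columns if is_list_column(df[name])]
int_list_columns = [name for name in df.columns if is_list_of_int_column(df[name])]

print(list_columns)
print(int_list_columns)
```

Using `isinstance` rather than comparing the string form of the type also accepts subclasses of `list`, which may or may not be what you want.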