I want to identify whether a column in a pandas DataFrame holds a list in each row.
df=pd.DataFrame({'X': [1, 2, 3], 'Y': [[34],[37,45],[48,50,57]],'Z':['A','B','C']})
df
Out[160]:
X Y Z
0 1 [34] A
1 2 [37, 45] B
2 3 [48, 50, 57] C
df.dtypes
Out[161]:
X int64
Y object
Z object
dtype: object
Since the dtype of strings is "object", I'm unable to distinguish between columns that are strings and lists (of integer or strings).
How do I identify that column "Y" is a list of int?
If your dataset is big, take a sample before applying the type function. Then you can check:
If the most common type is list:
(df
 .sample(100)  # requires at least 100 rows; use a smaller n for small frames
 .map(type)  # use .applymap(type) prior to v2.1.0
 .mode(0)
 .astype(str) == "<class 'list'>")
If all values are lists:
(df
 .sample(100)
 .map(type)  # use .applymap(type) prior to v2.1.0
 .astype(str) == "<class 'list'>")\
.all(0)
If any values are lists:
(df
 .sample(100)
 .map(type)  # use .applymap(type) prior to v2.1.0
 .astype(str) == "<class 'list'>")\
.any(0)
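The checks above only tell you that a column holds lists, not what is inside them. To also answer the "list of int" part of the question, something like the following might work. It is a minimal sketch: the helper name `list_columns` and its parameters `sample_size` and `element_type` are made up for illustration and are not part of pandas. It uses `isinstance` instead of comparing type names as strings, and it caps the sample size so it also runs on the small example frame.

```python
import pandas as pd


def list_columns(df, sample_size=100, element_type=None):
    """Return the names of columns whose sampled values are all lists.

    Hypothetical helper, not a pandas API. If element_type is given,
    every element of every sampled list must be an instance of it.
    """
    # Cap the sample size so frames with fewer rows do not raise in .sample().
    sample = df.sample(min(sample_size, len(df)))
    result = []
    for name in sample.columns:
        col = sample[name]
        # Lists can only live in object-dtype columns, so skip the rest.
        if col.dtype != object:
            continue
        values = col.tolist()
        # An empty sample gives no evidence either way; treat it as "not a list column".
        if not values:
            continue
        if not all(isinstance(v, list) for v in values):
            continue
        if element_type is not None and not all(
            isinstance(x, element_type) for v in values for x in v
        ):
            continue
        result.append(name)
    return result


df = pd.DataFrame({'X': [1, 2, 3],
                   'Y': [[34], [37, 45], [48, 50, 57]],
                   'Z': ['A', 'B', 'C']})

print(list_columns(df))                    # columns holding lists
print(list_columns(df, element_type=int))  # columns holding lists of int
print(list_columns(df, element_type=str))  # columns holding lists of str
```

With the example frame from the question, the first two calls should report only `Y` and the last one should report nothing. Keep in mind that this looks at a sample only, so a stray non-list value outside the sample would go unnoticed, and that `isinstance(True, int)` is true in Python, so a list of booleans would also pass the `int` check.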