I have a DataFrame column consisting of lists of strings and one NaN value. I am trying to join the lists of strings while ignoring the NaN with df.loc, Series.notnull(), and Series.apply(). I expect this to join each of the lists while skipping over the NaNs, but I'm receiving "TypeError: can only join an iterable".
I'm setting up my DataFrame like this:
import numpy as np
import pandas as pd
data = {'id': [['54930058LIMFSJIOLQ48'], np.nan, ['5493006B6WMKNQ8QNP51 254900425JAG3QVRMM28']]}
df = pd.DataFrame(data)
id
0 [54930058LIMFSJIOLQ48]
1 NaN
2 [5493006B6WMKNQ8QNP51 254900425JAG3QVRMM28]
This is the line I'm using to join the strings. Why is it throwing an error?
df.loc[df['id'].notnull(), 'id'] = df['id'].apply(lambda x: ', '.join(x))
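The problem is the right-hand side of the assignment: df['id'].apply(...) runs the lambda on every row, including the NaN row, before df.loc selects the non-null rows. NaN is a float, which is not iterable, so Python raises the TypeError when it evaluates ', '.join(x) on NaN.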
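To make that concrete, here is a minimal sketch that reuses the question's data. It first reproduces the failure on NaN alone, then applies the same notnull() filter on the right-hand side so the lambda only ever sees lists. The names mask and error_message are my own, not from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [['54930058LIMFSJIOLQ48'], np.nan,
                          ['5493006B6WMKNQ8QNP51 254900425JAG3QVRMM28']]})

# NaN is a float, so str.join cannot iterate over it.
try:
    ', '.join(np.nan)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Filter on the right-hand side as well, so apply() skips the NaN row.
mask = df['id'].notnull()
df.loc[mask, 'id'] = df.loc[mask, 'id'].apply(lambda x: ', '.join(x))
```

With the filter on both sides, the NaN row is never passed to the lambda and is left untouched in the column.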
If you want to keep calling apply on the whole column, you can use isinstance to check that x is a list before joining it. Try this:
df['id'] = df['id'].apply(lambda x: ', '.join(x) if isinstance(x, list) else x)
The lambda checks whether x is a list. If it is, the strings are joined; if it is not (as with NaN), x is returned unchanged.
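If you want to check the behaviour on a list with more than one string, a small sketch like the following may help. The function name join_if_list and the sample values are made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

def join_if_list(x):
    """Join a list of strings with ', '; return anything else unchanged."""
    return ', '.join(x) if isinstance(x, list) else x

# Made-up sample data: a two-element list, a NaN, and a one-element list.
s = pd.Series([['a', 'b'], np.nan, ['c']])
result = s.apply(join_if_list)
```

Pulling the check into a named function does the same thing as the lambda, but it is easier to test on its own.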