How to do this in pandas:
I have a function extract_text_features that operates on a single text column and returns multiple output columns. Specifically, the function returns 6 values.
The function works; however, there doesn't seem to be any proper return type (pandas DataFrame / numpy array / Python list) such that the output gets correctly assigned with df.ix[:, 10:16] = df.textcol.map(extract_text_features)
So I think I need to drop back to iterating with df.iterrows(), as per this?
UPDATE:
Iterating with df.iterrows() is at least 20x slower, so I surrendered and split out the function into six distinct .map(lambda ...) calls.
UPDATE 2: this question was asked back around v0.11.0, before the usability of df.apply was improved or df.assign() was added in v0.16. Hence much of the question and many of the answers are no longer very relevant.
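For readers on a recent pandas, here is a rough sketch of how df.apply and df.assign() can be combined for this. The extract_text_features below is a hypothetical stand-in that returns three values rather than six, and the column names are made up for illustration; result_type="expand" requires a pandas version that supports that argument.

```python
import pandas as pd

def extract_text_features(text):
    # Hypothetical stand-in for the real function (3 values instead of 6).
    n_upper = sum(ch.isupper() for ch in text)
    return len(text), len(text.split()), n_upper

df = pd.DataFrame({"textcol": ["hello world", "FOO", "a b c"]})

# Row-wise apply; result_type="expand" spreads each returned tuple over columns.
expanded = df.apply(
    lambda row: extract_text_features(row["textcol"]),
    axis=1,
    result_type="expand",
)
expanded.columns = ["n_chars", "n_words", "n_upper"]  # hypothetical names

# assign() takes the new columns as keyword arguments and returns a new frame.
result = df.assign(**{col: expanded[col] for col in expanded.columns})
print(result)
```

Note that this still calls the function once per row, so it is a convenience rather than a speed-up.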
Building off of user1827356's answer, you can do the assignment in one pass using df.merge:
df.merge(df.textcol.apply(lambda s: pd.Series({'feature1':s+1, 'feature2':s-1})),
left_index=True, right_index=True)
textcol feature1 feature2
0 0.772692 1.772692 -0.227308
1 0.857210 1.857210 -0.142790
2 0.065639 1.065639 -0.934361
3 0.819160 1.819160 -0.180840
4 0.088212 1.088212 -0.911788
EDIT: Please be aware of the huge memory consumption and low speed of this approach: https://ys-l.github.io/posts/2015/08/28/how-not-to-use-pandas-apply/
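One way to sidestep building a pd.Series for every row is to have the function return a plain tuple and construct the extra columns in a single DataFrame call. The following is only a sketch under that assumption; two_features is a hypothetical helper mirroring the lambda above, and I have not benchmarked it against your data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"textcol": np.random.rand(5)})

def two_features(s):
    # Hypothetical helper: returns a plain tuple rather than a pd.Series.
    return s + 1, s - 1

# map() yields a Series of tuples; tolist() hands them to one DataFrame constructor.
features = pd.DataFrame(
    df["textcol"].map(two_features).tolist(),
    index=df.index,
    columns=["feature1", "feature2"],
)

# Index-aligned join, equivalent in effect to the merge on left_index/right_index.
result = df.join(features)
print(result)
```

Passing index=df.index keeps the new columns aligned even if df does not have a default RangeIndex.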