I have a data frame of POS tags in which each document is split into single tokens, one per row. How can I merge the individual POS values for each document into a single cell, separated by commas? Right now I have something like:
  doc_id sentence_id token_id   token  pos entity
1  text1           1        1  xxxxxx PRON
2  text1           1        2    xxxx  AUX
3  text1           1        3     xxx  AUX
4  text1           1        4 xxxxxxx VERB
5  text2           1        5    xxxx  DET
6  text2           1        6     xxx NOUN
How can I make it into
  doc_id pos                      entity
1  text1 PRON, AUX, AUX, VERB...
2  text2 AUX, NOUN, PRON, ADJ...
3  text3 ...
4  text4 ...
5  text5 ...
6  text6 ...
Do I need to create a new data frame, or is there a spaCy function that can do this directly? Thank you.
You can collapse it like so:
aggregate(pos ~ doc_id, doc_df, paste, collapse = ", ")
You can store the result in a separate data frame and merge in any other columns you want from the original. If you only need these two columns, you can use the result directly.
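As a minimal sketch of both steps, here is the collapse followed by a merge, using a toy data frame built to mirror the layout in the question. The names pos_by_doc, n_tokens and result are made up for illustration, and the per-document token count is just an assumed example of an extra column you might want to merge in.

```r
# Toy data mirroring the question's layout (tokens are placeholders).
doc_df <- data.frame(
  doc_id      = c("text1", "text1", "text1", "text1", "text2", "text2"),
  sentence_id = c(1, 1, 1, 1, 1, 1),
  token_id    = c(1, 2, 3, 4, 5, 6),
  token       = c("xxxxxx", "xxxx", "xxx", "xxxxxxx", "xxxx", "xxx"),
  pos         = c("PRON", "AUX", "AUX", "VERB", "DET", "NOUN"),
  stringsAsFactors = FALSE
)

# One row per document; the pos tags are pasted together in row order.
pos_by_doc <- aggregate(pos ~ doc_id, doc_df, paste, collapse = ", ")

# Hypothetical extra per-document column: the number of tokens.
n_tokens <- aggregate(token ~ doc_id, doc_df, length)
names(n_tokens)[2] <- "n_tokens"

# Join the two per-document tables on doc_id.
result <- merge(pos_by_doc, n_tokens, by = "doc_id")
print(result)
```

Since aggregate() already returns a new data frame, there is nothing spaCy-specific needed here; any column that has one value per doc_id can be merged in the same way.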