I have a pandas dataframe like this:
Text          start  end  entity     value
I love apple      7   11  fruit      apple
I ate potato      6   11  vegetable  potato
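For anyone who wants to reproduce this, the sample frame can be built roughly as follows. This is only a sketch: the integer dtype of start and end is an assumption, and the sample values suggest (but the post does not state) that start and end are inclusive character offsets of the entity word within Text.

```python
import pandas as pd

# Sample data copied from the table above; dtypes are assumed.
df = pd.DataFrame({
    "Text": ["I love apple", "I ate potato"],
    "start": [7, 6],
    "end": [11, 11],
    "entity": ["fruit", "vegetable"],
    "value": ["apple", "potato"],
})

# Assumption: start/end are inclusive offsets, so the slice needs end + 1.
first = df.iloc[0]
entity_word = first["Text"][first["start"]:first["end"] + 1]
print(df)
print(entity_word)
```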
I have tried a for loop, but it runs slowly, and I don't think that is the idiomatic way to do this with pandas.
I want to create another pandas DataFrame based on this one, like:
Sentence#  Word    Tag
1          I       Object
1          love    Object
1          apple   fruit
2          I       Object
2          ate     Object
2          potato  vegetable
That is, split the Text column into words and number each sentence. Every word other than the entity word should be tagged as Object.
Use split, stack and map:
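# One row per word; the first index level identifies the source row (sentence).
u = df.Text.str.split(expand=True).stack()

pd.DataFrame({
    'Sentence': u.index.get_level_values(0) + 1,
    'Word': u.values,
    # Look up each word's entity; words without an entry become 'Object'.
    'Entity': u.map(dict(zip(df.value, df.entity))).fillna('Object').values
})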
   Sentence    Word     Entity
0         1       I     Object
1         1    love     Object
2         1   apple      fruit
3         2       I     Object
4         2     ate     Object
5         2  potato  vegetable
Side note: if you are running pandas v0.24 or later, use .to_numpy() instead of .values.
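Putting the pieces together, here is a self-contained sketch of the same approach using .to_numpy(). The function name to_token_frame is hypothetical, and the code assumes df has the default 0-based integer index, so that adding 1 yields the sentence number.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Text": ["I love apple", "I ate potato"],
    "start": [7, 6],
    "end": [11, 11],
    "entity": ["fruit", "vegetable"],
    "value": ["apple", "potato"],
})

def to_token_frame(df):
    """Hypothetical helper: one row per word, tagged with its entity or 'Object'."""
    # Split on whitespace, then stack into a Series indexed by (row, word position).
    u = df["Text"].str.split(expand=True).stack()
    # Map each word to its entity; unmatched words come back as NaN.
    tags = u.map(dict(zip(df["value"], df["entity"]))).fillna("Object")
    return pd.DataFrame({
        # Assumes a default RangeIndex on df, so row label + 1 is the sentence number.
        "Sentence": u.index.get_level_values(0).to_numpy() + 1,
        "Word": u.to_numpy(),
        "Entity": tags.to_numpy(),
    })

result = to_token_frame(df)
print(result)
```

One thing to keep in mind: the lookup dictionary is built from every row, so a word that is an entity in one sentence gets the same tag wherever it appears in any other sentence. If that matters for your data, you would need to match per row (for example using start and end) instead.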