python nlp tagging huggingface flair

Unable to POS-tag the sentences in a text file


I want to tag the parts of speech of a sentence, and for this task I am using the pos-english-fast model. When I pass it a single sentence, the model identifies the POS tags correctly. I then created a data file, 'data1.txt', that holds all my sentences, but when I try to tag the sentences from that file, it does not work.

My code

from flair.models import SequenceTagger
model = SequenceTagger.load("flair/pos-english")
# Read the sentences from data1.txt (one per line)
with open('data1.txt') as f:
  data = f.read().splitlines()
# Create a list of tokenized sentences from the data
sentences = [sentence.split() for sentence in data]
# Tag each sentence using the model
tagged_sentences = []
for sentence in sentences:
  tagged_sentences.append(model.predict(sentence))
for sentence in tagged_sentences:
  print(sentence)

The error I received

AttributeError                            Traceback (most recent call last)
<ipython-input-16-03268ee0d9c9> in <cell line: 10>()
      9 tagged_sentences = []
     10 for sentence in sentences:
---> 11   tagged_sentences.append(model.predict(sentence))
     12 for sentence in tagged_sentences:
     13   print(sentence)

1 frames
/usr/local/lib/python3.10/dist-packages/flair/data.py in set_context_for_sentences(cls, sentences)
   1116         previous_sentence = None
   1117         for sentence in sentences:
-> 1118             if sentence.is_context_set():
   1119                 continue
   1120             sentence._previous_sentence = previous_sentence

AttributeError: 'str' object has no attribute 'is_context_set'


How could I resolve it?


Solution

  • Let's say this is your data:

    ['Not My Responsibility is a 2020 American short film written and produced by singer-songwriter Billie Eilish.',
     "A commentary on body shaming and double standards placed upon young women's appearances, it features a monologue from Eilish about the media scrutiny surrounding her body.",
     'The film is spoken-word and stars Eilish in a dark room, where she gradually undresses before submerging herself in a black substance.']
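
If data1.txt holds one sentence per line, as in the question, a minimal sketch for loading it into a list shaped like the one above could look like this. The helper name read_sentences, the UTF-8 encoding, and the choice to skip blank lines are assumptions for illustration; none of this is part of Flair.

```python
import tempfile
from pathlib import Path


def read_sentences(path):
    """Return one string per non-blank line of a UTF-8 text file.

    Hypothetical helper; assumes one sentence per line, as in the question's data1.txt.
    """
    text = Path(path).read_text(encoding="utf-8")
    # splitlines() drops the line breaks; blank or whitespace-only lines are skipped
    return [line.strip() for line in text.splitlines() if line.strip()]


# Demo with a throwaway stand-in for data1.txt (the sample text is made up)
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "data1.txt"
    sample.write_text("First sentence here.\n\nSecond sentence here.\n", encoding="utf-8")
    data = read_sentences(sample)

print(data)  # ['First sentence here.', 'Second sentence here.']
```

Skipping blank lines avoids handing Flair empty sentences; drop that filter if your file never contains them.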
    

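    model.predict() expects flair Sentence objects. In your loop, each item passed to predict() is a list of words, so predict() treats every word (a plain str) as a sentence, hence 'str' object has no attribute 'is_context_set'. Instead, wrap each line in a Sentence (which also tokenizes it) and pass the whole list to predict():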

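    from flair.data import Sentence
    from flair.models import SequenceTagger
    
    model = SequenceTagger.load("flair/pos-english")
    
    # Wrap each raw string in a Sentence; Sentence tokenizes it for you
    sentences = list(map(Sentence, data))
    # predict() writes the tags onto the Sentence objects in place
    _ = model.predict(sentences)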
    

    All sentences are now tagged. To inspect the tags of, for example, the first sentence, use print(sentences[0]), which outputs:

    Sentence[17]: "Not My Responsibility is a 2020 American short film written and produced by singer-songwriter Billie Eilish." →
    ["Not"/RB, "My"/PRP$, "Responsibility"/NN, "is"/VBZ, "a"/DT, "2020"/CD, "American"/JJ, "short"/JJ, "film"/NN, "written"/VBN, "and"/CC, "produced"/VBN, "by"/IN, "singer-songwriter"/NN, "Billie"/NNP, "Eilish"/NNP, "."/.]