I have the URLs of multiple websites in an xlsx file. I loop over the file and pass each URL as an argument to the sentiment analysis code below. The code currently analyzes the whole website (the websites contain only text and numbers), but I want to run the analysis only on the paragraph that starts with "Managerial function". How can I do that? Here's my code:
article = Article(j)
article.download()
article.parse()
#nltk.download('punkt')
article.nlp()
text = article.summary
obj = TextBlob(text)
sentiment = obj.sentiment.polarity
print(round(sentiment,2))
if sentiment == 0:
    print("neutral")
elif sentiment > 0:
    print("positive")
elif sentiment < 0:
    print("negative")
Using a regex, something like the following would match a paragraph starting with "Managerial function":
found = re.search(r'^Managerial function.*$', full_text, re.MULTILINE)
my_paragraph = found.group(0) if found else None  # None when no paragraph matches
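Here, full_text is your whole article text, for example article.text after article.parse(). Note that article.summary holds only a shortened summary, so the paragraph may be missing from it.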
Remember to add this import first:
import re
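
To make the idea above easier to reuse inside the loop over URLs, here is a minimal sketch that wraps the search in a function. The names extract_paragraph and sample_text are made up for illustration, and it assumes paragraphs in the article text are separated by blank lines (or that the paragraph runs to the end of the text); if your pages are laid out differently, the pattern will need adjusting.

```python
import re


def extract_paragraph(full_text, prefix="Managerial function"):
    """Return the first paragraph starting with `prefix`, or None if absent.

    Assumption: a paragraph ends at the next blank line or at the end
    of the text.
    """
    # re.escape guards against regex metacharacters in the prefix.
    pattern = r'^[ \t]*' + re.escape(prefix) + r'.*?(?=\n[ \t]*\n|\Z)'
    found = re.search(pattern, full_text, re.MULTILINE | re.DOTALL)
    return found.group(0).strip() if found else None


# Hypothetical sample text standing in for the parsed article text.
sample_text = (
    "Company overview. Revenue grew in 2020.\n"
    "\n"
    "Managerial function was handled well by the new team.\n"
    "\n"
    "Other notes follow here.\n"
)

paragraph = extract_paragraph(sample_text)
print(paragraph)
```

If the function returns a string, you could pass it to TextBlob in place of article.summary, e.g. TextBlob(paragraph).sentiment.polarity; if it returns None, the page has no such paragraph and can be skipped.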