I have code that fetches the titles of news articles from several news websites. I use a for loop to get the titles from each website, and I have also implemented a word search that counts the articles whose titles contain the word "coronavirus". I want the word search to tell me the number of articles with the word "coronavirus" for each website separately. Right now the output seems to be the number of such articles across all the websites put together. Please help me, I have to submit this project shortly. Here is the code:
from bs4 import BeautifulSoup
from bs4.dammit import EncodingDetector
from newspaper import Article
import requests
URL=["https://www.timesnownews.com/coronavirus","https://www.indiatoday.in/coronavirus", "https://www.ndtv.com/coronavirus?pfrom=home-mainnavigation"]
for url in URL:
    parser = 'html.parser'
    resp = requests.get(url)
    http_encoding = resp.encoding if 'charset' in resp.headers.get('content-type', '').lower() else None
    html_encoding = EncodingDetector.find_declared_encoding(resp.content, is_html=True)
    encoding = html_encoding or http_encoding
    soup = BeautifulSoup(resp.content, parser, from_encoding=encoding)
    links = []
    for link in soup.find_all('a', href=True):
        if "javascript" in link["href"]:
            continue
        links.append(link['href'])
    count = 0
    for link in links:
        try:
            article = Article(link)
            article.download()
            article.parse()
            print(article.title)
            if "COVID" in article.title or "coronavirus" in article.title or "Coronavirus" in article.title or "Covid-19" in article.title or "COVID-19" in article.title:
                count += 1
        except:
            pass
print(" number of articles with the word COVID:")
print(count)
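One more thing worth checking in the link-collection loop above: if some href values are relative paths (for example /india/story/...), Article(link) has no scheme or host to download from, and the bare except silently skips them, so those articles never reach the count. Below is a minimal sketch, assuming you only want absolute http(s) links; collect_links is a hypothetical helper name and the example hrefs are made up. It uses urllib.parse.urljoin to resolve each href against the page URL, and it also drops duplicate links, which would otherwise be counted twice.

```python
from urllib.parse import urljoin, urlparse


def collect_links(base_url, hrefs):
    """Turn raw href values into absolute, de-duplicated http(s) URLs."""
    links = []
    seen = set()
    for href in hrefs:
        # "/a/b" on https://site/x becomes https://site/a/b
        absolute = urljoin(base_url, href)
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skips javascript:, mailto:, tel: and similar links
        if absolute in seen:
            continue  # the same article linked twice would otherwise be counted twice
        seen.add(absolute)
        links.append(absolute)
    return links


# Hypothetical href values, for illustration only.
page = "https://www.indiatoday.in/coronavirus"
hrefs = [
    "/india/story/example-1",
    "javascript:void(0)",
    "https://www.indiatoday.in/india/story/example-1",
    "mailto:someone@example.com",
]
print(collect_links(page, hrefs))
```

In the real script you would call it inside the outer loop with something like collect_links(url, [a["href"] for a in soup.find_all("a", href=True)]) instead of building links by hand.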
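Actually, you are only getting the count for the last site: count is reset to 0 for every URL, and it is printed only once, after the loop has finished. If you want the counts for all sites, append each one to a list; then you can print the count for each site.
First create an empty list, then append the final count at the end of each iteration: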
URL = ["https://www.timesnownews.com/coronavirus", "https://www.indiatoday.in/coronavirus",
       "https://www.ndtv.com/coronavirus?pfrom=home-mainnavigation"]
Url_count = []
for url in URL:
    parser = 'html.parser'
    ...
    ...
        except:
            pass
    Url_count.append(count)
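To see the whole per-site loop in one place without hitting the network, here is a minimal sketch that separates fetching from counting. fetch_titles is a hypothetical callable that returns the article titles for one URL (in the real script it would wrap the requests/BeautifulSoup/newspaper code from the question); mentions_covid and count_per_site are illustrative names too. Lowercasing the title replaces the chain of in checks, which makes the match slightly broader than the original (it would also match, for example, "covid" or "CORONAVIRUS").

```python
# Hypothetical keyword list: lowercase substrings to look for in each title.
KEYWORDS = ("covid", "coronavirus")


def mentions_covid(title):
    """Return True if the title contains any keyword, ignoring case."""
    lowered = title.lower()
    return any(word in lowered for word in KEYWORDS)


def count_per_site(urls, fetch_titles):
    """Return one count per URL, in the same order as urls."""
    url_count = []
    for url in urls:
        count = 0  # reset for every site so each site gets its own total
        for title in fetch_titles(url):
            if mentions_covid(title):
                count += 1
        url_count.append(count)  # save this site's total before moving on
    return url_count


# Offline usage: a fake fetcher standing in for the requests/newspaper code.
fake_titles = {
    "site-a": ["COVID-19 cases rise", "Cricket score", "Coronavirus vaccine update"],
    "site-b": ["Election results", "Monsoon forecast"],
}
print(count_per_site(["site-a", "site-b"], fake_titles.get))  # [2, 0]
```

The key point is the same as in the answer: count = 0 sits inside the outer loop, and the append happens after the inner loop but still inside the outer one.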
Then you can use zip() to print the results:
for url, count in zip(URL, Url_count):
    print("Site:", url, "Count:", count)
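If you also want to look up a single site's count later, or get back the combined figure across all sites, a dict and sum() cover both. The counts below are made up purely to show the shape of the output.

```python
# Hypothetical counts, for illustration only.
URL = ["https://www.timesnownews.com/coronavirus", "https://www.indiatoday.in/coronavirus",
       "https://www.ndtv.com/coronavirus?pfrom=home-mainnavigation"]
Url_count = [4, 7, 5]

# Map each URL to its count so a single site can be looked up directly.
counts_by_site = dict(zip(URL, Url_count))
for url, count in counts_by_site.items():
    print("Site:", url, "Count:", count)

# The combined figure across all sites is just the sum of the per-site counts.
print("Total:", sum(Url_count))
```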