python web-scraping google-news

I only want to print one story from Google News


I am trying to figure out how to print a single story from Google News. I am web scraping the feed, which I suspect makes it harder. I searched online but could not find anything useful. Here is my code:

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen

news_url = "https://news.google.com/rss?hl=de&gl=DE&ceid=DE:de"
Client = urlopen(news_url)
xml_page = Client.read()
Client.close()

soup_page = soup(xml_page, "xml")
news_list = soup_page.findAll("item")

for news in news_list:
    print(news.title.text)
    print(news.link.text)
    print(news.pubDate.text)

When I run this code, it prints many of today's stories, but I only want to print one. Is there a way to do that?


Solution

  • You can do that with the find method, which returns only the first matching element:

    from bs4 import BeautifulSoup as soup
    from urllib.request import urlopen
    
    news_url = "https://news.google.com/rss?hl=de&gl=DE&ceid=DE:de"
    Client = urlopen(news_url)
    xml_page = Client.read()
    Client.close()
    
    soup_page = soup(xml_page, "xml")
    news = soup_page.find("item")
    
    # find returns a single element (or None if nothing matches), so no loop is needed
    print(news.title.text)
    print(news.link.text)
    print(news.pubDate.text)
    

    Or you can keep findAll and use list slicing to take only the first element:

    from bs4 import BeautifulSoup as soup
    from urllib.request import urlopen
    
    news_url = "https://news.google.com/rss?hl=de&gl=DE&ceid=DE:de"
    Client = urlopen(news_url)
    xml_page = Client.read()
    Client.close()
    
    soup_page = soup(xml_page, "xml")
    news_list = soup_page.findAll("item")
    
    for news in news_list[:1]:
        print(news.title.text)
        print(news.link.text)
        print(news.pubDate.text)
    

    Output:

    Corona-News-Ticker: Die meisten Ungeimpften wollen ungeimpft bleiben - NDR.de
    https://news.google.com/__i/rss/rd/articles/CBMigQFodHRwczovL3d3dy5uZHIuZGUvbmFjaHJpY2h0ZW4vaW5mby9Db3JvbmEtTmV3cy1UaWNrZXItRGllLW1laXN0ZW4tVW5nZWltcGZ0ZW4td29sbGVuLXVuZ2VpbXBmdC1ibGVpYmVuLGNvcm9uYWxpdmV0aWNrZXIxMzYyLmh0bWzSAQA?oc=5
    Thu, 28 Oct 2021 10:56:34 GMT
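
A minimal, offline sketch of the same "take only the first item" idea follows. It swaps BeautifulSoup for the standard library's xml.etree.ElementTree so it runs without extra packages, and it parses a small made-up feed instead of downloading the real one. The names SAMPLE_RSS and first_story are hypothetical and chosen only for illustration, and the feed layout (rss > channel > item) is an assumption based on the usual RSS 2.0 structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for the downloaded RSS text.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First story</title>
      <link>https://example.com/first</link>
      <pubDate>Thu, 28 Oct 2021 10:56:34 GMT</pubDate>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/second</link>
      <pubDate>Thu, 28 Oct 2021 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def first_story(rss_text):
    """Return (title, link, pubDate) of the first <item>, or None if there is none."""
    root = ET.fromstring(rss_text)
    # Element.find returns only the first match, like find in BeautifulSoup.
    item = root.find("./channel/item")
    if item is None:
        return None
    return (
        item.findtext("title"),
        item.findtext("link"),
        item.findtext("pubDate"),
    )


if __name__ == "__main__":
    story = first_story(SAMPLE_RSS)
    if story is not None:
        for field in story:
            print(field)
```

Checking for None before reading the fields avoids an AttributeError when the feed has no items, which the find-based version above would otherwise raise on an empty feed.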