Tags: python, url, web-scraping, python-newspaper

limiting the URL output from newspaper


I'm using newspaper3k to extract URLs from news.google.com, but the problem is that I keep getting every URL (I've disabled memoization because I need the full list). I'd like to print only the top 5 links, or 5 random links; it doesn't really matter which. I've tried setting a maximum, but that didn't work. Any ideas?

import newspaper

news = newspaper.build('https://news.google.com/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRGx6TVdZU0FtVnVHZ0pWVXlnQVAB?oc=3&ceid=US:en', memoize_articles=False)

for article in news.articles:
    print(article.url)

Solution

  • This code snippet should do exactly what you want. It doesn't use a newspaper function; instead it uses the random module to select a certain number of URLs. Note that business_news.articles holds Article objects rather than URL strings, so the URLs are first collected into their own list. Enjoy!

    import newspaper
    import random
    
    business_news = newspaper.build('https://news.google.com/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRGx6TVdZU0FtVnVHZ0pWVXlnQVAB?hl=en-US&gl=US&ceid=US%3Aen', language='en', memoize_articles=False)
    
    # Collect the URL strings from the Article objects into a list
    urls = []
    for article in business_news.articles:
        urls.append(str(article.url))
    print(urls)  # not necessary, just for display purposes
    
    # Pick 5 distinct URLs at random
    random_articles = random.sample(urls, 5)
    print(random_articles)
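
  • Since the question also mentions printing the top 5 links, a plain list slice covers that case without random. This sketch uses a placeholder list of URL strings standing in for the values collected from newspaper's articles, so it runs without a network call:

    ```python
    import random

    # Placeholder URLs standing in for str(article.url) values
    # gathered from newspaper.build(...).articles
    urls = [f"https://example.com/story/{i}" for i in range(20)]

    top_five = urls[:5]                    # first five links, in source order
    random_five = random.sample(urls, 5)   # five distinct links chosen at random

    print(top_five)
    print(random_five)
    ```

    `random.sample` raises ValueError if the site yields fewer than 5 URLs, so you may want `random.sample(urls, min(5, len(urls)))` to be safe.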