python web-scraping google-news

Scraping Google news search


I am trying to get the number of results from a Google News search for a specific day. In a browser this is easy: run a Google search, click the "News" tab, click "Tools", change the time period to the date you want, then click "Tools" again to see a count of how many stories it found.

The start and end dates appear in the URL, inside the tbs parameter (cd_min and cd_max). For example, here is a search for "stack overflow" over the past week: https://www.google.com/search?q=stack+overflow&source=lnt&tbs=cdr%3A1%2Ccd_min%3A1%2F3%2F2018%2Ccd_max%3A1%2F10%2F2018&tbm=nws
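To make the structure of that URL easier to see, here is a minimal sketch that builds it from a query and two dates. The helper names build_news_url and format_date are hypothetical, and the parameter layout (cdr:1, cd_min, cd_max, tbm=nws) is only inferred from the example URL above, not from any documented Google API.

```python
from datetime import date
from urllib.parse import urlencode


def format_date(d):
    # The example URL uses M/D/YYYY without leading zeros (e.g. 1/3/2018).
    return "{}/{}/{}".format(d.month, d.day, d.year)


def build_news_url(query, start, end):
    # Hypothetical helper: mirrors the parameters seen in the browser URL.
    tbs = "cdr:1,cd_min:{},cd_max:{}".format(format_date(start), format_date(end))
    params = {"q": query, "source": "lnt", "tbs": tbs, "tbm": "nws"}
    # urlencode percent-encodes ':', ',' and '/' and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(build_news_url("stack overflow", date(2018, 1, 3), date(2018, 1, 10)))
```

For the dates in the example, this should reproduce the URL shown above character for character.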

The problem is that when I request one of these URLs programmatically, the response contains the current results and ignores the date range I specified. If I change these parameters in my browser, the results change as expected; it only fails programmatically.

I have tried several approaches in both Python and C#, always with the same result.
For example:

import requests
response = requests.get('https://www.google.com/search?q=stack+overflow&source=lnt&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2018%2Ccd_max%3A1%2F10%2F2018&tbm=nws')
print(response.content)

Solution

  • I finally found a working method using a headless web browser and Selenium. I suspect the plain request fails because the page depends on JavaScript, which a simple HTTP request does not execute. I would still be interested in an explanation or in other ways to do this.
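The answer does not include its code, so the following is only a rough sketch of what a headless Selenium approach might look like, not the author's actual solution. It assumes Chrome and a matching driver are installed, that the count lives in an element with id "result-stats", and that the text looks like "About 1,230 results"; all three are assumptions that may not hold, and Google may also show a consent page or block automated traffic.

```python
import re


def parse_result_count(text):
    # Pull the first number that precedes the word "result",
    # e.g. "About 1,230 results (0.25 seconds)" gives 1230.
    match = re.search(r"(\d[\d,]*)\s+result", text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def fetch_result_count(url):
    # Imported here so the parsing helper works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the count is in an element with id "result-stats".
        # textContent is read because the element may be hidden until
        # "Tools" is clicked, in which case .text would be empty.
        stats = driver.find_element(By.ID, "result-stats")
        return parse_result_count(stats.get_attribute("textContent") or "")
    finally:
        driver.quit()
```

Calling fetch_result_count with one of the date-restricted URLs from the question would then return the count as an integer, or None if the text could not be parsed.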