I'm trying to collect the top five search queries for each trend for the past year by category on Google Trends.
I don't know which of two approaches to take.
The first is to use a Python library such as pytrends, which according to its docs requires a keyword to query GT. But I don't have any specific keyword; I want to fetch whatever search queries can be found in every category.
The second is to use a scraping library such as Selenium or BeautifulSoup4 to collect this information directly from the GT website.
The goal of this is to be able to retrieve the top 5 websites for each query later ...
Which direction should I take?
It is better to use one of the unofficial APIs.
These connect to the internal Google APIs that power the Trends UI and return structured data. Scraping, by contrast, returns mostly unstructured HTML, from which you would have to extract the structured data yourself, and the result will be less reliable and less complete.
It is the difference between talking to an API that is intended for "machine to machine" communication and a web UI that is intended for "machine to human" interaction.
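As a rough illustration of the API route, here is a minimal sketch using pytrends, the library mentioned in the question. It assumes the pytrends calls `TrendReq`, `categories()`, `build_payload()` and `related_queries()` behave as described in that project's README; the helper names `iter_categories`, `top_queries` and `main`, and the seed keyword "python", are made up for this example. It still passes a keyword, since whether a category can be queried without one is not something this sketch establishes.

```python
import time
from itertools import islice
from typing import Any, Dict, Iterator, Tuple


def iter_categories(node: Dict[str, Any]) -> Iterator[Tuple[int, str]]:
    """Yield (id, name) for a category node and all of its descendants.

    Assumes the nested {"id", "name", "children"} shape returned by
    pytrends' categories() call.
    """
    yield node["id"], node["name"]
    for child in node.get("children", []):
        yield from iter_categories(child)


def top_queries(client: Any, keyword: str, cat_id: int, n: int = 5):
    """Return the first n rows of the "top" related queries, or None.

    `client` is expected to behave like pytrends.request.TrendReq.
    """
    # "today 12-m" is the timeframe string for the past 12 months.
    client.build_payload([keyword], cat=cat_id, timeframe="today 12-m")
    top = client.related_queries().get(keyword, {}).get("top")
    return None if top is None else top.head(n)


def main() -> None:
    # Third-party dependency (pip install pytrends), imported lazily so the
    # helpers above can be used and tested without it.
    from pytrends.request import TrendReq

    client = TrendReq(hl="en-US", tz=360)
    # Only the first few categories are queried here, with a pause between
    # requests, because the unofficial endpoints are rate limited.
    for cat_id, name in islice(iter_categories(client.categories()), 3):
        print(cat_id, name)
        print(top_queries(client, "python", cat_id))
        time.sleep(5)
```

Calling `main()` runs the sketch against the live service. Because the result comes back as structured data rather than HTML, taking the top five rows is a single `head(n)` call instead of a parsing step.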