The following code
from pytrends.request import TrendReq
pytrends = TrendReq(hl='it', tz=0, timeout=None)
keywords = ['dichiarazione', 'redditi']
pytrends.build_payload(keywords, timeframe='2004-01-01 2017-01-01', geo='IT')
pytrends.interest_by_region(resolution='COUNTRY', inc_low_vol=True, inc_geo_code=False)
returns
I don't have a clear understanding of the results I'm obtaining. Firstly, shouldn't the values in each column be normalized between 0 and 100, as is usually done for Google Trends? What do they represent? Why, for example, does the last column have much lower numbers than the one next to it?
I wanted to focus on searches containing both the words 'dichiarazione' and 'redditi' in Italian, but I'm starting to suspect that this code actually returns results for each word separately (i.e., the second column covers searches containing the single word 'dichiarazione', and the third column covers searches containing the single word 'redditi'). Is that really the case?
Additionally, I would like to obtain monthly results within the specified time frame (that is, this dataframe repeated for each month of the time frame), but I don't know where to start. Any suggestions, please?
Here is code that should achieve what you want, if I understood your request correctly.
from pytrends.request import TrendReq
import pandas as pd
import datetime
# Set up pytrends
pytrends = TrendReq(hl='en-US', tz=360)
# Set keywords and location
keywords = ['dichiarazione', 'redditi']
geo_location = 'IT'
# Get timeframe (last 5 years)
end_date = datetime.datetime.now()
start_date = end_date - datetime.timedelta(days=5 * 365)
# Convert dates to string format
start_date_str = start_date.strftime('%Y-%m-%d')
end_date_str = end_date.strftime('%Y-%m-%d')
# Build the payload
pytrends.build_payload(keywords, cat=0, timeframe=f'{start_date_str} {end_date_str}', geo=geo_location, gprop='')
# Get interest over time
interest_over_time_df = pytrends.interest_over_time()
# Resample to monthly totals: each value is the sum of the index values
# returned for that month, not a raw search count
monthly_interest_df = interest_over_time_df.resample('M').sum()
# Display the results
print(monthly_interest_df)
# Plot the data
import matplotlib.pyplot as plt
monthly_interest_df.plot()
plt.show()
Here are the results:
~/Projects/Trends ❯ python3 trends.py
            dichiarazione  redditi  isPartial
date
2018-11-30             40       12          0
2018-12-31            169       43          0
2019-01-31            178       48          0
2019-02-28            170       44          0
2019-03-31            237       82          0
...                   ...      ...        ...
2023-07-31            268      132          0
2023-08-31            110       53          0
2023-09-30            203       79          0
2023-10-31            226       80          0
2023-11-30             94       38          1
[61 rows x 3 columns]
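One caveat about the table above: `resample('M').sum()` adds up the index samples that Google Trends returns (weekly, judging by the output for a window of this length), so a month containing five samples scores higher than one containing four, and the `isPartial` flag is summed along with the keywords. Below is a minimal sketch of an alternative, assuming a monthly average is what is wanted. `monthly_mean` is a hypothetical helper name, and the demo frame holds made-up numbers rather than real Trends data.

```python
import pandas as pd


def monthly_mean(interest_df: pd.DataFrame) -> pd.DataFrame:
    """Average the index per calendar month, ignoring the isPartial flag."""
    # isPartial is a flag, not a measurement, so it is left out of the average.
    values = interest_df.drop(columns=["isPartial"], errors="ignore")
    # Grouping by monthly period sidesteps the 'M' vs 'ME' resample alias,
    # which differs between pandas versions.
    return values.groupby(values.index.to_period("M")).mean()


# Made-up weekly samples, only to show the shape of the result.
demo = pd.DataFrame(
    {
        "dichiarazione": [40, 60, 50, 30],
        "redditi": [10, 20, 30, 20],
        "isPartial": [False, False, False, True],
    },
    index=pd.to_datetime(["2023-10-01", "2023-10-08", "2023-11-05", "2023-11-12"]),
)
demo.index.name = "date"

print(monthly_mean(demo))
```

In the original script this would replace the `resample('M').sum()` line, e.g. `monthly_interest_df = monthly_mean(interest_over_time_df)`. The averaged values stay on the same 0 to 100 scale as the data returned by `interest_over_time()`.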
And here is the graph:
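Two points from the question are not covered by the script above. First, every element of the keyword list is treated as its own search term, so `['dichiarazione', 'redditi']` compares the two words against each other; going by Google Trends' documented search-term rules, a single term such as `'dichiarazione redditi'` should instead match searches containing both words. Second, a per-month regional breakdown can be approximated by issuing one request per month. The sketch below assumes that approach; `monthly_timeframes` and `region_interest_by_month` are hypothetical helper names, the pause length is a guess, and each request is normalized by Google on its own, so values are comparable across regions within a month but not across months.

```python
import calendar
import time
from datetime import date


def monthly_timeframes(start: date, end: date) -> list[str]:
    """Return one 'YYYY-MM-DD YYYY-MM-DD' timeframe string per calendar month."""
    frames = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        last_day = calendar.monthrange(year, month)[1]
        frames.append(f"{year}-{month:02d}-01 {year}-{month:02d}-{last_day:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return frames


def region_interest_by_month(keywords, start, end, geo="IT", pause=2.0):
    """Call interest_by_region once per month (hypothetical helper)."""
    # Third-party imports are kept local so the date helper above can be
    # used without them; this function needs pytrends and network access.
    import pandas as pd
    from pytrends.request import TrendReq

    pytrends = TrendReq(hl="it", tz=0)
    frames = {}
    for timeframe in monthly_timeframes(start, end):
        pytrends.build_payload(keywords, timeframe=timeframe, geo=geo)
        # Same call as in the question, repeated for each month.
        frames[timeframe[:7]] = pytrends.interest_by_region(
            resolution="COUNTRY", inc_low_vol=True, inc_geo_code=False
        )
        time.sleep(pause)  # assumed delay to reduce the risk of rate limiting
    # Stack the monthly frames, using the 'YYYY-MM' key as the outer index level.
    return pd.concat(frames, names=["month"])


# The date helper runs offline:
print(monthly_timeframes(date(2016, 12, 1), date(2017, 2, 1)))
# ['2016-12-01 2016-12-31', '2017-01-01 2017-01-31', '2017-02-01 2017-02-28']
```

Calling `region_interest_by_month(['dichiarazione redditi'], date(2004, 1, 1), date(2017, 1, 1))` would cover the time frame from the question with the combined term. That is well over a hundred requests, so it may be worth testing on a shorter range first.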