Confused by Google Trends results in pytrends: column normalization, combined search terms, and monthly aggregation

The following code

from pytrends.request import TrendReq

pytrends = TrendReq(hl='it', tz=0, timeout=None)

keywords = ['dichiarazione', 'redditi']

pytrends.build_payload(keywords, timeframe='2004-01-01 2017-01-01', geo='IT')

pytrends.interest_by_region(resolution='COUNTRY', inc_low_vol=True, inc_geo_code=False)

returns

I don't fully understand the results I'm getting. First, shouldn't the values in each column be normalized between 0 and 100, as Google Trends usually does? What do they represent? Why, for example, does the last column have much lower numbers than the one next to it?

I wanted to focus on searches containing both words, 'dichiarazione' and 'redditi', but I'm starting to suspect that this code returns results for each word separately (i.e., the second column for searches containing the single word 'dichiarazione', and the third column for searches containing the single word 'redditi'). Is that really the case?

Finally, I would like to obtain monthly results within the specified time frame (that is, this dataframe repeated for each month of the time frame), but I don't know where to start. Any suggestions?
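A minimal sketch of the per-month loop described above, assuming the standard pytrends calls `build_payload(kw_list, timeframe=..., geo=...)` and `interest_by_region(inc_low_vol=...)`. Each string in the keyword list is queried as a separate search term, so the two words are passed here as one term, `'dichiarazione redditi'`; according to Google Trends' help on search terms, an unquoted multi-word term matches searches containing both words in any order. The helper names `monthly_timeframes` and `regional_interest_by_month` and the `pause` parameter are hypothetical, chosen for illustration, and the client is passed in so the logic can be tried without network access.

```python
import calendar
import time
from datetime import date


def monthly_timeframes(start, end):
    """Yield 'YYYY-MM-DD YYYY-MM-DD' strings, one per calendar month,
    for every month whose first day falls before `end` (hypothetical helper)."""
    year, month = start.year, start.month
    while date(year, month, 1) < end:
        last_day = calendar.monthrange(year, month)[1]
        yield f"{date(year, month, 1):%Y-%m-%d} {date(year, month, last_day):%Y-%m-%d}"
        # Advance to the next calendar month.
        month += 1
        if month == 13:
            year, month = year + 1, 1


def regional_interest_by_month(client, keywords, start, end, geo="IT", pause=1.0):
    """Run one interest_by_region request per month and collect the tables.

    `client` is assumed to behave like a pytrends TrendReq object
    (build_payload + interest_by_region); `pause` is a hypothetical
    delay between requests to reduce the risk of rate limiting.
    """
    results = {}
    for timeframe in monthly_timeframes(start, end):
        client.build_payload(keywords, timeframe=timeframe, geo=geo)
        # Each month's table is scaled 0-100 within its own request.
        results[timeframe] = client.interest_by_region(inc_low_vol=True)
        time.sleep(pause)
    return results


# One combined term (searches containing both words), not one column per word.
KEYWORDS = ["dichiarazione redditi"]
```

With a real client, usage might look like `regional_interest_by_month(TrendReq(hl='it', tz=0), KEYWORDS, date(2004, 1, 1), date(2017, 1, 1))`, after which `pd.concat(results)` stacks the monthly tables into one dataframe. Keep in mind that each request is normalized on its own, so values from different months are not directly comparable, and 156 consecutive requests may run into Google's rate limiting.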

Solution

Here is code that does what you want, if I understood your request correctly.

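from pytrends.request import TrendReq
import pandas as pd
import datetime
import matplotlib.pyplot as plt

# Set up pytrends (hl = interface language, tz = timezone offset in minutes)
pytrends = TrendReq(hl='en-US', tz=360)

# Set keywords and location.
# Each string in the list is queried as a separate search term,
# so this returns one column per word, not searches containing both words.
keywords = ['dichiarazione', 'redditi']
geo_location = 'IT'

# Get timeframe (last 5 years)
end_date = datetime.datetime.now()
start_date = end_date - datetime.timedelta(days=5 * 365)

# Convert dates to 'YYYY-MM-DD' strings; pytrends expects the timeframe as 'start end'
start_date_str = start_date.strftime('%Y-%m-%d')
end_date_str = end_date.strftime('%Y-%m-%d')

# Build the payload
pytrends.build_payload(keywords, cat=0, timeframe=f'{start_date_str} {end_date_str}', geo=geo_location, gprop='')

# Get interest over time (for a window this long, Google Trends typically returns weekly points)
interest_over_time_df = pytrends.interest_over_time()

# Resample to calendar months by summing the points in each month.
# Note: this also sums the isPartial flag, and monthly totals can exceed 100.
# ('M' means month end; pandas >= 2.2 prefers the alias 'ME'.)
monthly_interest_df = interest_over_time_df.resample('M').sum()

# Display the results
print(monthly_interest_df)

# Plot the data
monthly_interest_df.plot()
plt.show()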

Here are the results:

~/Projects/Trends ❯ python3 trends.py
            dichiarazione  redditi  isPartial
date
2018-11-30             40       12          0
2018-12-31            169       43          0
2019-01-31            178       48          0
2019-02-28            170       44          0
2019-03-31            237       82          0
...                   ...      ...        ...
2023-07-31            268      132          0
2023-08-31            110       53          0
2023-09-30            203       79          0
2023-10-31            226       80          0
2023-11-30             94       38          1

[61 rows x 3 columns]
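In the script above, each monthly figure is the sum of the (presumably weekly) index values Google Trends returns for that month, which is why values such as 237 or 268 exceed 100 and why months containing five weekly points come out higher than months with four. One hedged alternative is to average the points instead of summing them. The sketch below uses made-up weekly values (not real Trends data) and a hypothetical helper, `monthly_mean`; it drops the `isPartial` flag and resamples with the month-start alias `'MS'`, which sidesteps the `'M'`/`'ME'` renaming in recent pandas.

```python
import pandas as pd


def monthly_mean(weekly: pd.DataFrame) -> pd.DataFrame:
    """Average weekly Trends points per calendar month (hypothetical helper)."""
    # Drop the bookkeeping flag so it is not averaged with the index values.
    values = weekly.drop(columns="isPartial", errors="ignore")
    # 'MS' labels each month by its first day; mean() gives every week equal weight.
    return values.resample("MS").mean()


# Made-up weekly data (NOT real Google Trends values): a constant index of 50.
# January 2023 contains five Sundays and February 2023 contains four.
weeks = pd.date_range("2023-01-01", periods=9, freq="W-SUN")
weekly = pd.DataFrame({"dichiarazione": [50] * 9, "isPartial": [False] * 9}, index=weeks)

summed = weekly.drop(columns="isPartial").resample("MS").sum()
averaged = monthly_mean(weekly)

print(summed)    # January 250, February 200: the sum depends on the number of weeks
print(averaged)  # 50.0 for both months
```

With a constant weekly index, the summed series still varies from month to month, while the averaged series stays on the familiar 0-100 scale.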

And here is the graph: