
Confused by Google Trends results from pytrends: column normalization, multi-word searches, and monthly aggregation


The following code

    from pytrends.request import TrendReq

    pytrends = TrendReq(hl='it', tz=0, timeout=None)
    keywords = ['dichiarazione', 'redditi']
    pytrends.build_payload(keywords, timeframe='2004-01-01 2017-01-01', geo='IT')

    regions = pytrends.interest_by_region(resolution='COUNTRY', inc_low_vol=True, inc_geo_code=False)
    print(regions)

returns

[Screenshot: the returned DataFrame, with a region-name column followed by one column per keyword]

I don't fully understand these results. First, shouldn't the numbers be normalized between 0 and 100, as is usual for Google Trends? What do they represent? Why, for example, does the last column have much lower numbers than the one next to it?

I want to focus on searches containing both words, 'dichiarazione' and 'redditi', but I suspect this code actually returns results for each word separately, i.e. the second column for searches containing 'dichiarazione' alone and the third column for searches containing 'redditi' alone. Is that the case?

Finally, I would like monthly results within the specified time frame (this same DataFrame repeated for each month), but I don't know where to start. Any suggestions?
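A minimal sketch related to the scaling question, under stated assumptions. My understanding (not verified against pytrends internals) is that Google Trends scales keywords requested together relative to each other, so a smaller 'redditi' column would mean that term is searched less than 'dichiarazione', not that something went wrong. If you want each keyword on its own 0-100 scale, one option is to request each keyword in a separate build_payload call; another, shown below, is to rescale each column of the DataFrame you already have by its own maximum (after which the columns are no longer comparable with each other). The helper name rescale_each_column and the numbers are made up for illustration. Separately, according to Google's Trends help, a single term such as 'dichiarazione redditi' (one string in the keyword list) covers searches containing both words in any order, whereas ['dichiarazione', 'redditi'] compares two separate terms.

```python
import pandas as pd


def rescale_each_column(df, columns=None):
    """Rescale each keyword column independently so that its own maximum becomes 100.

    Hypothetical helper. Columns whose maximum is 0 end up as NaN (division by zero).
    """
    if columns is None:
        # Skip the boolean flag pytrends adds to time-series results, if present
        columns = [c for c in df.columns if c != "isPartial"]
    scaled = df[columns].astype(float)
    # df.max() gives one maximum per column; div() divides each column by its own maximum
    return scaled.div(scaled.max()).mul(100).round(1)


# Made-up values shaped like an interest_by_region result (one column per keyword)
regions = pd.DataFrame(
    {"dichiarazione": [100, 80, 50], "redditi": [30, 25, 10]},
    index=pd.Index(["Region A", "Region B", "Region C"], name="geoName"),
)
print(rescale_each_column(regions))
```

After rescaling, 'redditi' reaches 100 in its strongest region, just like 'dichiarazione', so the table shows where each term is relatively popular rather than which term is searched more.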


Solution

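  • If I understood your request correctly, the code below does what you want. Note that it queries national interest over time for roughly the last five years rather than your 2004-2017 regional breakdown; change the timeframe string to use your original period.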

    from pytrends.request import TrendReq
    import pandas as pd
    import datetime
    import matplotlib.pyplot as plt
    
    # Set up pytrends
    pytrends = TrendReq(hl='en-US', tz=360)
    
    # Set keywords and location
    keywords = ['dichiarazione', 'redditi']
    geo_location = 'IT'
    
    # Timeframe: roughly the last 5 years (5 * 365 days) up to today
    end_date = datetime.datetime.now()
    start_date = end_date - datetime.timedelta(days=5 * 365)
    
    # Convert dates to the 'YYYY-MM-DD YYYY-MM-DD' format used by build_payload
    start_date_str = start_date.strftime('%Y-%m-%d')
    end_date_str = end_date.strftime('%Y-%m-%d')
    
    # Build the payload (cat=0: no category filter, gprop='': web search)
    pytrends.build_payload(keywords, cat=0, timeframe=f'{start_date_str} {end_date_str}', geo=geo_location, gprop='')
    
    # Get interest over time (one row per time point, plus an isPartial column)
    interest_over_time_df = pytrends.interest_over_time()
    
    # Resample to monthly values: .sum() adds up all points in each month,
    # including the isPartial flag column
    monthly_interest_df = interest_over_time_df.resample('M').sum()
    
    # Display the results
    print(monthly_interest_df)
    
    # Plot the data
    monthly_interest_df.plot()
    plt.show()
    

    Here are the results:

    ~/Projects/Trends ❯ python3 trends.py
                dichiarazione  redditi  isPartial
    date
    2018-11-30             40       12          0
    2018-12-31            169       43          0
    2019-01-31            178       48          0
    2019-02-28            170       44          0
    2019-03-31            237       82          0
    ...                   ...      ...        ...
    2023-07-31            268      132          0
    2023-08-31            110       53          0
    2023-09-30            203       79          0
    2023-10-31            226       80          0
    2023-11-30             94       38          1
    
    [61 rows x 3 columns]
    

    And here is the graph:

    [Image: line plot of monthly_interest_df over time]
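A note on the resample('M').sum() step. For a five-year range Google Trends typically returns weekly points, so summing them per month makes each monthly value depend on how many weeks fall in that month (four or five), and it also sums the boolean isPartial flag; this is why the monthly values above exceed 100. Averaging the points within each month keeps the 0-100 scale. The sketch below compares the two on a made-up, constant weekly series. monthly_mean is a hypothetical helper name, and "MS" (month start) is used instead of "M", which pandas 2.2 deprecates in favour of "ME". For the question's original 2004-2017 range, Trends usually returns monthly points already, so no resampling should be needed there.

```python
import pandas as pd


def monthly_mean(interest_over_time_df):
    """Average the points that fall inside each calendar month.

    Drops the boolean isPartial flag first so it is not aggregated with the keyword columns.
    """
    values = interest_over_time_df.drop(columns="isPartial", errors="ignore")
    # "MS" labels each month by its first day
    return values.resample("MS").mean()


# Made-up weekly series: a constant 50 every Sunday.
# February 2019 has four Sundays, March 2019 has five.
weekly = pd.DataFrame(
    {"dichiarazione": [50] * 9, "isPartial": [False] * 9},
    index=pd.date_range("2019-02-03", periods=9, freq="W", name="date"),
)

# Monthly sums differ only because of the number of weeks: 200 vs 250
print(weekly.drop(columns="isPartial").resample("MS").sum())
# Monthly means stay on the original scale: 50.0 in both months
print(monthly_mean(weekly))
```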
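The code above gives one national series, while the question also asked for the regional table repeated for each month. A minimal sketch, assuming one Google Trends request per month is acceptable (156 requests for January 2004 to December 2016): it builds a 'YYYY-MM-DD YYYY-MM-DD' timeframe for every calendar month, reuses the question's interest_by_region arguments, and stacks the results under a (region, month) index. The function names, the pause between requests, and the combined term in the usage comment are my own choices, not part of pytrends. As far as I know, each request is normalized within its own timeframe, so 100 in one month does not represent the same search volume as 100 in another month.

```python
import time

import pandas as pd


def monthly_timeframes(first_month, last_month):
    """Return (month label, 'YYYY-MM-DD YYYY-MM-DD' timeframe) for every calendar month."""
    periods = pd.period_range(start=first_month, end=last_month, freq="M")
    return [
        (str(p), f"{p.start_time.date().isoformat()} {p.end_time.date().isoformat()}")
        for p in periods
    ]


def interest_by_region_per_month(client, keywords, geo, first_month, last_month, pause_seconds=1.0):
    """Run one interest_by_region request per month and stack the results.

    `client` is expected to be a pytrends TrendReq instance (or any object with the
    same build_payload / interest_by_region methods).
    """
    frames = []
    for label, timeframe in monthly_timeframes(first_month, last_month):
        client.build_payload(keywords, timeframe=timeframe, geo=geo)
        regions = client.interest_by_region(resolution="COUNTRY", inc_low_vol=True, inc_geo_code=False)
        frames.append(regions.assign(month=label))
        # Hypothetical pause: rapid consecutive requests may be rate-limited by Google
        time.sleep(pause_seconds)
    # Result: index (region name, month), one column per keyword
    return pd.concat(frames).set_index("month", append=True)


# Usage (needs network access and pytrends installed):
# from pytrends.request import TrendReq
# pytrends = TrendReq(hl='it', tz=0)
# per_month = interest_by_region_per_month(pytrends, ['dichiarazione redditi'], 'IT', '2004-01', '2016-12')
# print(per_month.xs('2010-05', level='month'))
```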