python-3.x pandas group-by series

Why does VS Code raise AttributeError: 'int' object has no attribute 'where' when the same code runs without any issue on Google Colab?


The following code keeps raising an AttributeError in VS Code, but the same code produces no such error when run on Google Colab:

The code:

import numpy as np
import pandas as pd

url = 'https://github.com//mattharrison/datasets/raw/master/data/alta-noaa-1980-2019.csv'

alta_df = pd.read_csv(url)

dates = pd.to_datetime(alta_df.DATE)

snow = alta_df.SNOW.rename(dates)

def season(idx):
    year = idx.year
    month = idx.month
    return year.where((month<10), year+1)

snow.groupby(season).sum()

The Error:

AttributeError                            Traceback (most recent call last)
File

    388 year = idx.year
    389 month = idx.month
--> 390 return year.where((month<10), year+1)

AttributeError: 'int' object has no attribute 'where'

My understanding is that, since I am passing the season() function as the argument to the chained groupby call, where() should be able to get the year from the index of the snow object. But somehow that is not happening.

To make sure that there is no syntax error in my code, I ran it on Google Colab, where I did not face any such issue. I have attached a screenshot of the Google Colab output:

[Screenshot of the Google Colab output]

I have also gone through the existing answers about AttributeError on this platform, but could not find one where the error appeared only in VS Code and not in Google Colab or a Jupyter Notebook.


Solution

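  • When groupby takes a function, it calls it on each index value separately; the call is not vectorized. Here each value is a single Timestamp, so idx.year is a plain int, and an int has no where method. The two environments likely have different pandas versions installed, which would explain the different behaviour; comparing pd.__version__ in both is a quick check. The documentation for the by parameter says: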

    by: mapping, function, label, pd.Grouper or list of such

    Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
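To see the difference concretely, here is a minimal sketch that calls the original season function once on a whole DatetimeIndex and once on a single label. The three dates are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical dates, chosen to fall on both sides of the October cutoff.
index = pd.to_datetime(["2018-09-30", "2018-10-01", "2019-01-15"])

def season(idx):
    year = idx.year
    month = idx.month
    return year.where(month < 10, year + 1)

# Whole index: idx.year is a pandas Index, which has a .where method.
vectorized = season(index)
print(vectorized.tolist())

# Single label: idx.year is a plain Python int, so .where does not exist.
try:
    season(index[1])
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

The second call reproduces the message from the traceback, which is what happens when groupby hands the function one index label at a time.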

    Instead, you can call the function on the whole index yourself and pass the result as the grouping key:

    snow.groupby(season(snow.index)).sum()
    

    Or make your function non-vectorized, so that it works on a single Timestamp:

    
    def season(idx):
        year = idx.year
        month = idx.month
        return year if month<10 else year+1
    
    snow.groupby(season).sum()
    

    Output:

    1980    457.5
    1981    503.0
    1982    842.5
    1983    807.5
    ...
    2017    524.0
    2018    308.8
    2019    504.5
    Name: SNOW, dtype: float64
    

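    Alternatively, resample with a yearly frequency that ends in September. The result is labelled by the season's end date instead of an integer year. On pandas 2.2 and later the preferred alias is 'YE-SEP', and 'Y-SEP' is deprecated there: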

    snow.resample('Y-SEP').sum()
    
    1980-09-30    457.5
    1981-09-30    503.0
    1982-09-30    842.5
    ...
    2017-09-30    524.0
    2018-09-30    308.8
    2019-09-30    504.5
    Freq: YE-SEP, Name: SNOW, dtype: float64
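
As a rough cross-check, the sketch below runs all three approaches on a tiny made-up Series (the dates and values are hypothetical, not from the dataset) and compares the results. To avoid depending on which frequency alias the installed pandas accepts, it passes a pd.offsets.YearEnd(month=9) offset object to resample instead of the 'Y-SEP' string; that substitution is an assumption of this sketch, not part of the answer above.

```python
import pandas as pd

# Hypothetical data: five observations spanning three seasons.
index = pd.to_datetime(
    ["2018-09-30", "2018-10-01", "2018-12-31", "2019-09-30", "2019-10-01"]
)
snow = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=index, name="SNOW")

def season_scalar(ts):
    # Works on one Timestamp at a time, as groupby calls it.
    return ts.year if ts.month < 10 else ts.year + 1

# 1) Precomputed, vectorized grouping key.
year, month = snow.index.year, snow.index.month
by_key = snow.groupby(year.where(month < 10, year + 1)).sum()

# 2) Scalar function applied by groupby to each index label.
by_func = snow.groupby(season_scalar).sum()

# 3) Resample with a year-end offset anchored on September.
by_resample = snow.resample(pd.offsets.YearEnd(month=9)).sum()

print(by_key.tolist(), by_func.tolist(), by_resample.tolist())
print(by_resample.index.year.tolist())
```

All three produce the same per-season totals here; only the labels differ, with resample using the season's end date.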