TypeError: category type does not support sum operations (in pandas)


I am working with seaborn's flights dataset.

import seaborn as sns

flights = sns.load_dataset('flights') 
flights.groupby(['year']).sum()

When I run this, I get the following error: TypeError: category type does not support sum operations

I run into the same issue with clustermap and lineplot.

Your assistance would be appreciated!


Solution

  • This snippet works in pandas 1.* but raises the TypeError in pandas 2.

    import seaborn as sns
    
    flights = sns.load_dataset('flights') 
    flights.groupby(['year']).sum()  # raises TypeError in pandas 2
    

    The issue is that the month column has type category:

    flights.info(True)
    
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 144 entries, 0 to 143
    Data columns (total 3 columns):
     #   Column      Non-Null Count  Dtype   
    ---  ------      --------------  -----   
     0   year        144 non-null    int64   
     1   month       144 non-null    category
     2   passengers  144 non-null    int64   
    dtypes: category(1), int64(2)
    memory usage: 2.9 KB
    

    In pandas 1.*, the month column is silently dropped because its dtype does not support sum. pandas 2 no longer drops such columns, so it tries to sum month as well and raises the TypeError.

    To get the same result in pandas 2, explicitly select the passengers column (and any other columns of interest):

    flights.groupby('year')[['passengers']].sum()
    

    yields:

          passengers
    year            
    1949        1520
    1950        1676
    1951        2042
    1952        2364
    1953        2700
    1954        2867
    1955        3408
    1956        3939
    1957        4421
    1958        4572
    1959        5140
    1960        5714
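
As a complementary sketch (not part of the original answer): the groupby sum method also accepts numeric_only=True, which skips non-numeric columns such as month, and select_dtypes can list which columns would be skipped. The small DataFrame below is a hypothetical stand-in for flights, with the same column names and dtypes but only four illustrative rows, so the example runs without downloading the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the flights dataset: same column names and
# dtypes, but only four illustrative rows.
toy = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": pd.Categorical(["Jan", "Feb", "Jan", "Feb"]),
    "passengers": [112, 118, 115, 126],
})

# List the columns that are not numeric; these are the ones that
# a plain .sum() cannot aggregate.
non_numeric = list(toy.select_dtypes(exclude="number").columns)
print(non_numeric)  # ['month']

# Option 1: select the numeric column(s) explicitly, as in the answer above.
by_selection = toy.groupby("year")[["passengers"]].sum()

# Option 2: let pandas skip the non-numeric columns.
by_flag = toy.groupby("year").sum(numeric_only=True)

# Both options give one row per year with the summed passengers
# (230 for 1949 and 241 for 1950 in this toy frame).
print(by_flag)
```

Explicit selection makes the intended columns obvious to a reader, while numeric_only=True is convenient when a frame has many numeric columns and only a few categorical ones.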