pythonpandaspivot-tableint64

Why does Pivot table return Int64 Type Error?


I'm trying to pivot a dataframe but it keeps returning an Int64 Error. A similar question was not actually answered - What causes these Int64 columns to cause a TypeError?

Here's the type of my dataframe:

#   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   price         30159 non-null  Int64 
 1   type          30159 non-null  object
 2   size          30155 non-null  Int64 
 3   location      30159 non-null  object
 4   neighborhood  30159 non-null  object
dtypes: Int64(2), object(3)

the pivot table code:

pfraw = pd.pivot_table(pfraw, values = 'price', index = 'neighborhood', columns = 'type')

and the lat bit of the error message:

    273     dtype = np.dtype(dtype)
    275 if not isinstance(dtype, np.dtype):
    276     # enforce our signature annotation
--> 277     raise TypeError(dtype)  # pragma: no cover
    279 converted = maybe_downcast_numeric(result, dtype, do_round)
    280 if converted is not result:

TypeError: Int64

I dont understand why would it return an error with Int64.


Solution

  • First of all, let's create a df similar to the one OP has

    import pandas as pd
    
    df = pd.DataFrame( {'price': [10, 12, 18, 10, 12], 'type': ['A', 'A', 'A', 'B', 'B'], 'size': [10, 12, 18, 10, 12], 'location': ['A', 'A', 'A', 'B', 'B'], 'neighborhood': ['A', 'A', 'A', 'B', 'B']})
    

    If one prints the df one will see that this one has int64 and not Int64 (as opposed to OP's). Note: On my answer here one finds the difference between the two dtypes.

    print(df.info(verbose=True))
    
    [Out]:
    
     #   Column        Non-Null Count  Dtype 
    ---  ------        --------------  ----- 
     0   price         5 non-null      int64 
     1   type          5 non-null      object
     2   size          5 non-null      int64 
     3   location      5 non-null      object
     4   neighborhood  5 non-null      object
    

    And, with an int64 one will be able to create the pivot table with index "neighborhood", columns "type", and values "price", with the following

    df_pivot = df.pivot_table(index='neighborhood', columns='type', values='price')
    

    This is the output

    type                  A     B
    neighborhood                 
    A             13.333333   NaN
    B                   NaN  11.0
    

    However, with Int64 the Pivot Table can generate an error.

    In order to handle that, one will need convert the type to int64

    df[['price', 'size']] = df[['price', 'size']].astype('int64')  
    

    or

    import numpy as np
    
    df[['price', 'size']] = df[['price', 'size']].astype(np.int64)
    

    Also, most likely, OP has missing values. The fastest way to handle that is to remove the rows with missing values. In order to find and remove the missing values, my answer here may be of help.

    For the reference, this is a direct link to the module maybe_downcast_to_dtype that is raising the error that OP is having.