pythonpandasdataframebinning

Binning error for a dataframe column - KeyError: "None of [Float64Index([61.5, 59.8, 56.8.... dtype='float64', length=53940)] are in the [columns]"


I wrote a function for binning the numerical values of a dataframe column, i.e., dividing the column values into the specified number of categories.

def binning_fun(df, col_name, num_of_bins):
    lt=[]
    for i in range(0,num_of_bins):
        lt.append(i)
        df[col_name]=pd.cut(df[col_name],bins=i+1, labels=lt)
    return df

df="C:/Users/shootings.csv"
binning_fun(df, df['depth'], 4)

This gives the following error:

"None of [Float64Index([61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4,\n ...\n 60.5, 59.8, 60.5, 61.2, 62.7, 60.8, 63.1, 62.8, 61.0, 62.2],\n dtype='float64', length=53940)] are in the [columns]"

These values do exist in the column 'depth'. Why are they being called inexistent?

My dataset:

        carat   cut     clarity  depth  table   
0       0.23    Ideal       SI2  61.5   55.0    
1       0.21    Premium     SI1  59.8   61.0    
2       0.23    Good        VS1  56.9   65.0    
3       0.29    Premium     VS2  62.4   58.0    
4       0.31    Good        SI2  63.3   58.0    
5       0.24    Good        VVS2 90.7   62.8    

Expected output:

depth
1
0
0
1
1
2

Solution

  • You can use cut for fixed bin sizes:

    def binning_fun(df, col_name, num_of_bins):
        df[col_name]=pd.cut(df[col_name], bins=num_of_bins, labels=range(num_of_bins))
        return df
    
    df = pd.read_csv("C:/Users/shootings.csv")
    binning_fun(df, 'depth', 4)
    

    Output:

        carat   cut     clarity  depth   table
    0   0.23    Ideal   SI2      0       55.00
    1   0.21    Premium SI1      0       61.00
    2   0.23    Good    VS1      0       65.00
    3   0.29    Premium VS2      0       58.00
    4   0.31    Good    SI2      0       58.00
    5   0.24    Good    VVS2     3       62.80
    

    Or use qcut for equal-sized buckets:

    def binning_fun(df, col_name, num_of_bins):
        df[col_name]=pd.qcut(df[col_name], q=num_of_bins, labels=range(num_of_bins))
        return df
    
    df=pd.read_csv("C:/Users/shootings.csv")
    binning_fun(df, 'depth', 4)
    

    Output:

        carat   cut     clarity depth   table
    0   0.23    Ideal   SI2     1       55.00
    1   0.21    Premium SI1     0       61.00
    2   0.23    Good    VS1     0       65.00
    3   0.29    Premium VS2     2       58.00
    4   0.31    Good    SI2     3       58.00
    5   0.24    Good    VVS2    3       62.80
    

    I hope this helps.