label_binarize not outputting the correct number of classes


When I use label_binarize I do not get the correct number of classes even though I specify it. This is my simple code:

import numpy as np
from sklearn.preprocessing import label_binarize

y = ['tap', 'not_tap', 'tap', 'tap', 'not_tap', 'tap', 'not_tap','not_tap']

y = label_binarize(y, classes=[0, 1])
n_classes = y.shape[1]

I get n_classes = 1. Running this code also produces the warning:

FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  mask |= (ar1 == a)

Can you tell me how to correctly get n_classes = 2 as in this example?

Thank you!


Solution

  • label_binarize binarizes the values in a one-vs-all fashion.

    Consider this example:

    from sklearn.preprocessing import label_binarize
    print(label_binarize([1, 6], classes=[1, 2, 4, 6]))
    
    [[1 0 0 0]
     [0 0 0 1]]
    

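    Each column corresponds to one of the classes [1, 2, 4, 6]. A 1 marks the column whose class matches the value; every other entry is 0.

    The way you're invoking it now (label_binarize(y, classes=[0, 1])), none of the values ('tap', 'not_tap') match any of the classes (0, 1), so every entry is 0. In addition, when there are only two classes the output is a single column rather than two, which is why y.shape[1] is 1.

    What you're looking for is LabelBinarizer, which learns the classes from the data: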

    from sklearn.preprocessing import LabelBinarizer
    
    y = ['tap', 'not_tap', 'tap', 'tap', 'not_tap', 'tap', 'not_tap','not_tap']
    lb = LabelBinarizer()
    
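    label = lb.fit_transform(y)
    print(label)
    # [[1]
    #  [0]
    #  [1]
    #  [1]
    #  [0]
    #  [1]
    #  [0]
    #  [0]]

    # The class count comes from the fitted binarizer, not from label.shape[1]
    n_classes = len(lb.classes_)
    print(n_classes)  # 2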
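
If you want to keep using label_binarize, or you need one column per class (so that shape[1] is 2), something along these lines may work. This is a minimal sketch rather than a built-in option: it assumes the label strings themselves are passed as classes, and it builds the second column by hand with np.hstack. The names single and two_col are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

y = ['tap', 'not_tap', 'tap', 'tap', 'not_tap', 'tap', 'not_tap', 'not_tap']

# Pass the actual label values as classes.
# With two classes the last one listed ('tap') is encoded as 1.
classes = ['not_tap', 'tap']
single = label_binarize(y, classes=classes)
print(single.shape)  # (8, 1): the binary case yields a single column

# Assumption: a two-column indicator matrix is wanted, one column per class.
# Column 0 is the complement (not_tap), column 1 is the original column (tap).
two_col = np.hstack((1 - single, single))
n_classes = two_col.shape[1]
print(two_col[:2])
print(n_classes)
```

If all you need is the count, len(classes) (or len(lb.classes_) with LabelBinarizer) is simpler than inspecting the shape of the output.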