Tags: python-3.x, machine-learning, scikit-learn, data-manipulation

Getting counts in MultiLabelBinarizer


How can I get counts of items in MultiLabelBinarizer?

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()

pd.DataFrame(mlb.fit_transform([(1,1,2), (3,3,2,5)]),columns=mlb.classes_)

Out[0]: 
   1  2  3  5
0  1  1  0  0
1  0  1  1  1

Instead of this, I want to get

Out[0]: 
   1  2  3  5
0  2  1  0  0
1  0  1  2  1

because 1 appears twice in the first row (index 0) and 3 appears twice in the second row (index 1).


Solution

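  • Use collections.Counter to count the items in each row, then build the DataFrame from the resulting dicts:

    from collections import Counter
    import pandas as pd

    data = [(1,1,2), (3,3,2,5)]
    # One Counter per row; labels missing from a row become NaN, so fill them with 0
    pd.DataFrame([Counter(x) for x in data]).fillna(0)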
    

    Output:

        1       2   3       5
    0   2.0     1   0.0     0.0
    1   0.0     1   2.0     1.0
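
The output above has float columns because the missing labels are NaN before fillna(0) runs, and NaN forces a float dtype. If integer counts matching the table in the question are wanted, one possible variation (a sketch, not part of the original answer; the variable name `counts` is only illustrative) is to cast to int after filling and sort the columns so they follow the same order as mlb.classes_:

```python
from collections import Counter

import pandas as pd

data = [(1, 1, 2), (3, 3, 2, 5)]

counts = (
    pd.DataFrame([Counter(row) for row in data])  # one dict of counts per row
    .fillna(0)            # labels absent from a row were NaN
    .astype(int)          # NaN made the columns float; cast back to int
    .sort_index(axis=1)   # order columns by label, like mlb.classes_
)
print(counts)
```

This assumes every label is sortable against the others (for example all ints or all strings); with mixed label types the sort_index step would need to be dropped or replaced.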