python classification voting

Voting between classifiers with different training data samples


I have a dataset with 25 features, 36,000 samples, and 11 labels. In Python, I want to train several classifiers (of the same type or not), each on a different sample of this dataset. In other words, each classifier's training samples differ from those of the other models; the labels and features are the same, and only the samples differ. I then want to take a vote among the trained classifiers. The existing voting functions in Python fit every base classifier on the same training data. I would really appreciate help solving this in Python.

I have tried Python's voting functions, but they accept only a single training set, which is shared by all base classifiers.


Solution

  • Fit each classifier separately on its own training subset, then collect the predictions of all classifiers in a list and take the most frequent label for each sample:

    import numpy as np

    # models: list of already-fitted classifiers; x_test: the samples to predict.
    one_hot_encoded = True  # Change this to False if your models do not produce one-hot encoded predictions.
    outputs = []
    for model in models:
        preds = model.predict(x_test)
        # If the predictions are one-hot encoded, convert them to class indices.
        if one_hot_encoded:
            preds = np.argmax(preds, axis=1)
        outputs.append(preds)

    # Shape (n_samples, n_models): one row of votes per sample.
    outputs = np.array(outputs).T
    ultimate = [max(set(list(row)), key=list(row).count) for row in outputs]
    

    For your example:

    import numpy as np

    # Predictions of three classifiers for the same nine samples.
    y1 = [1, 2, 3, 0, 1, 2, 1, 1, 0]
    y2 = [2, 1, 3, 1, 0, 1, 2, 1, 0]
    y3 = [1, 2, 0, 1, 0, 2, 1, 1, 3]

    # Shape (9, 3): one row of votes per sample.
    y = np.array([y1, y2, y3]).T
    ultimate = [max(set(list(row)), key=list(row).count) for row in y]
    print(ultimate)
    

    The output:

    [1, 2, 3, 1, 0, 2, 1, 1, 0]

    (NumPy 2.x prints each value wrapped as np.int64(...); the labels are the same.)
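
To tie this back to the question, here is a minimal end-to-end sketch of training each classifier on a different subset and then voting. It is only an illustration under several assumptions: it uses scikit-learn, a synthetic dataset from make_classification as a stand-in for the real data, three arbitrarily chosen model types, disjoint random subsets (overlapping or bootstrap samples would work the same way), and integer labels starting at 0. The helper majority_vote is a hypothetical name, not a library function.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (illustrative sizes, not the asker's dataset).
X, y = make_classification(
    n_samples=3000, n_features=25, n_informative=10,
    n_classes=4, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Any mix of classifier types works; each one gets its own training subset.
models = [
    DecisionTreeClassifier(random_state=0),
    KNeighborsClassifier(n_neighbors=5),
    LogisticRegression(max_iter=1000),
]

# Shuffle the training indices and split them into disjoint chunks, one per model.
rng = np.random.default_rng(0)
chunks = np.array_split(rng.permutation(len(X_train)), len(models))
for model, idx in zip(models, chunks):
    model.fit(X_train[idx], y_train[idx])

# Stack the predictions into shape (n_models, n_test_samples).
all_preds = np.stack([model.predict(X_test) for model in models])


def majority_vote(preds, n_classes):
    """Return the most frequent label per column; ties go to the smallest label.

    Assumes labels are integers in the range 0 .. n_classes - 1.
    """
    counts = np.apply_along_axis(np.bincount, 0, preds, minlength=n_classes)
    return counts.argmax(axis=0)


final = majority_vote(all_preds, n_classes=4)
print("ensemble accuracy:", (final == y_test).mean())
```

Because every model is fitted in its own loop iteration, nothing forces the subsets to be the same size or even to come from the same split. The bincount-based vote does the same job as the list comprehension above, but breaks ties deterministically in favour of the smallest label.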