python machine-learning scikit-learn classification

How to make an ensemble for two binary classifiers?


I have two classifiers for a multimedia dataset: one for the visual material and one for the textual material. I want to combine their predictions into a final prediction. I have been reading about bagging, boosting and stacking ensembles; all of them seem useful and I would like to try them. However, I can only find rather theoretical examples, nothing concrete enough for me to understand how to actually implement this for my specific problem (in Python with scikit-learn). Both classifiers are SVMs evaluated with 10-fold cross-validation, and each outputs a list of predictions (1s or 0s) for n_samples = 1000. I also made both of them output the probabilities on which the predictions are based, which look like this:

 [[ 0.96761819  0.03238181]
 [ 0.96761819  0.03238181]
  ....
 [ 0.96761819  0.03238181]
 [ 0.96761819  0.03238181]]

How would I go about combining these in an ensemble? What should I use as input? I've tried concatenating the label predictions horizontally and feeding them in as features, but with no luck (same for the probabilities).


Solution

  • If you're strictly looking to combine existing classifiers, I recommend brew. It is built on top of sklearn (so you can use your sklearn classifiers), and, last time I checked, sklearn was good for creating ensembles (Bagging, AdaBoost, RandomForest ...) but did not provide many combining rules for your own custom ensemble (such as hybrid ensembles).

    https://github.com/viisar/brew

    from brew.base import Ensemble
    from brew.base import EnsembleClassifier
    from brew.combination.combiner import Combiner
    
    # create your Ensemble
    clfs = your_list_of_classifiers  # e.g. [clf1, clf2]
    ens = Ensemble(classifiers=clfs)
    
    # create your Combiner
    # the rules can be 'majority_vote', 'max', 'min', 'mean' or 'median'
    comb = Combiner(rule='mean')
    
    # now create your ensemble classifier
    ensemble_clf = EnsembleClassifier(ensemble=ens, combiner=comb)
    ensemble_clf.predict(X)
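
Since the two classifiers in the question work on different feature sets (visual and textual), it may help to see what a 'mean' rule amounts to when applied directly to the two probability arrays, using only numpy and scikit-learn. The following is a minimal sketch, not brew's implementation: the data is synthetic, and the names `X_visual`, `X_text` and `oof_proba` are hypothetical stand-ins. It assumes both arrays are aligned row by row (same samples, same fold splits) and that the labels are 0 and 1. The last lines also sketch a simple stacking variant, where the class-1 probabilities become the input features of a second-level classifier.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the two modalities (assumption: not the asker's data).
X, y = make_classification(n_samples=200, n_features=20, n_informative=8,
                           random_state=0)
X_visual, X_text = X[:, :10], X[:, 10:]

# Fixed splitter so both classifiers are evaluated on identical folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)


def oof_proba(X_view, y):
    """Out-of-fold class probabilities for one feature view."""
    clf = SVC(probability=True, random_state=0)
    return cross_val_predict(clf, X_view, y, cv=cv, method="predict_proba")


proba_visual = oof_proba(X_visual, y)  # shape (n_samples, 2)
proba_text = oof_proba(X_text, y)      # shape (n_samples, 2)

# 'mean' rule: average the two probability arrays, then pick the likelier class.
# With labels 0 and 1, the column index equals the class label.
proba_mean = (proba_visual + proba_text) / 2.0
pred_mean = proba_mean.argmax(axis=1)
print("mean-rule accuracy:", (pred_mean == y).mean())

# Simple stacking: class-1 probabilities from each view become meta-features.
meta_X = np.column_stack([proba_visual[:, 1], proba_text[:, 1]])
meta_scores = cross_val_score(LogisticRegression(), meta_X, y, cv=cv)
print("stacking accuracy:", meta_scores.mean())
```

The meta-features are built from out-of-fold probabilities so that the second-level classifier never sees a probability produced by a model that was trained on that same sample; feeding in-sample probabilities instead tends to give over-optimistic results.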